OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Alexey Kardashevskiy	46d3e1e162	vfio: powerpc/spapr: powerpc/powernv/ioda2: Use DMA windows API in ownership control Before the IOMMU user (VFIO) would take control over the IOMMU table belonging to a specific IOMMU group. This approach did not allow sharing tables between IOMMU groups attached to the same container. This introduces a new IOMMU ownership flavour when the user can not just control the existing IOMMU table but remove/create tables on demand. If an IOMMU implements take/release_ownership() callbacks, this lets the user have full control over the IOMMU group. When the ownership is taken, the platform code removes all the windows so the caller must create them. Before returning the ownership back to the platform code, VFIO unprograms and removes all the tables it created. This changes IODA2's onwership handler to remove the existing table rather than manipulating with the existing one. From now on, iommu_take_ownership() and iommu_release_ownership() are only called from the vfio_iommu_spapr_tce driver. Old-style ownership is still supported allowing VFIO to run on older P5IOC2 and IODA IO controllers. No change in userspace-visible behaviour is expected. Since it recreates TCE tables on each ownership change, related kernel traces will appear more often. This adds a pnv_pci_ioda2_setup_default_config() which is called when PE is being configured at boot time and when the ownership is passed from VFIO to the platform code. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:16:54 +10:00
Alexey Kardashevskiy	4793d65d1a	vfio: powerpc/spapr: powerpc/powernv/ioda: Define and implement DMA windows API This extends iommu_table_group_ops by a set of callbacks to support dynamic DMA windows management. create_table() creates a TCE table with specific parameters. it receives iommu_table_group to know nodeid in order to allocate TCE table memory closer to the PHB. The exact format of allocated multi-level table might be also specific to the PHB model (not the case now though). This callback calculated the DMA window offset on a PCI bus from @num and stores it in a just created table. set_window() sets the window at specified TVT index + @num on PHB. unset_window() unsets the window from specified TVT. This adds a free() callback to iommu_table_ops to free the memory (potentially a tree of tables) allocated for the TCE table. create_table() and free() are supposed to be called once per VFIO container and set_window()/unset_window() are supposed to be called for every group in a container. This adds IOMMU capabilities to iommu_table_group such as default 32bit window parameters and others. This makes use of new values in vfio_iommu_spapr_tce. IODA1/P5IOC2 do not support DDW so they do not advertise pagemasks to the userspace. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:16:52 +10:00
Alexey Kardashevskiy	05c6cfb9dc	powerpc/iommu/powernv: Release replaced TCE At the moment writing new TCE value to the IOMMU table fails with EBUSY if there is a valid entry already. However PAPR specification allows the guest to write new TCE value without clearing it first. Another problem this patch is addressing is the use of pool locks for external IOMMU users such as VFIO. The pool locks are to protect DMA page allocator rather than entries and since the host kernel does not control what pages are in use, there is no point in pool locks and exchange()+put_page(oldtce) is sufficient to avoid possible races. This adds an exchange() callback to iommu_table_ops which does the same thing as set() plus it returns replaced TCE and DMA direction so the caller can release the pages afterwards. The exchange() receives a physical address unlike set() which receives linear mapping address; and returns a physical address as the clear() does. This implements exchange() for P5IOC2/IODA/IODA2. This adds a requirement for a platform to have exchange() implemented in order to support VFIO. This replaces iommu_tce_build() and iommu_clear_tce() with a single iommu_tce_xchg(). This makes sure that TCE permission bits are not set in TCE passed to IOMMU API as those are to be calculated by platform code from DMA direction. This moves SetPageDirty() to the IOMMU code to make it work for both VFIO ioctl interface in in-kernel TCE acceleration (when it becomes available later). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:16:49 +10:00
Alexey Kardashevskiy	f87a88642e	vfio: powerpc/spapr/iommu/powernv/ioda2: Rework IOMMU ownership control This adds tce_iommu_take_ownership() and tce_iommu_release_ownership which call in a loop iommu_take_ownership()/iommu_release_ownership() for every table on the group. As there is just one now, no change in behaviour is expected. At the moment the iommu_table struct has a set_bypass() which enables/ disables DMA bypass on IODA2 PHB. This is exposed to POWERPC IOMMU code which calls this callback when external IOMMU users such as VFIO are about to get over a PHB. The set_bypass() callback is not really an iommu_table function but IOMMU/PE function. This introduces a iommu_table_group_ops struct and adds take_ownership()/release_ownership() callbacks to it which are called when an external user takes/releases control over the IOMMU. This replaces set_bypass() with ownership callbacks as it is not necessarily just bypass enabling, it can be something else/more so let's give it more generic name. The callbacks is implemented for IODA2 only. Other platforms (P5IOC2, IODA1) will use the old iommu_take_ownership/iommu_release_ownership API. The following patches will replace iommu_take_ownership/ iommu_release_ownership calls in IODA2 with full IOMMU table release/ create. As we here and touching bypass control, this removes pnv_pci_ioda2_setup_bypass_pe() as it does not do much more compared to pnv_pci_ioda2_set_bypass. This moves tce_bypass_base initialization to pnv_pci_ioda2_setup_dma_pe. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:16:47 +10:00
Alexey Kardashevskiy	0eaf4defc7	powerpc/spapr: vfio: Switch from iommu_table to new iommu_table_group So far one TCE table could only be used by one IOMMU group. However IODA2 hardware allows programming the same TCE table address to multiple PE allowing sharing tables. This replaces a single pointer to a group in a iommu_table struct with a linked list of groups which provides the way of invalidating TCE cache for every PE when an actual TCE table is updated. This adds pnv_pci_link_table_and_group() and pnv_pci_unlink_table_and_group() helpers to manage the list. However without VFIO, it is still going to be a single IOMMU group per iommu_table. This changes iommu_add_device() to add a device to a first group from the group list of a table as it is only called from the platform init code or PCI bus notifier and at these moments there is only one group per table. This does not change TCE invalidation code to loop through all attached groups in order to simplify this patch and because it is not really needed in most cases. IODA2 is fixed in a later patch. This should cause no behavioural change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:16:15 +10:00
Alexey Kardashevskiy	b348aa6529	powerpc/spapr: vfio: Replace iommu_table with iommu_table_group Modern IBM POWERPC systems support multiple (currently two) TCE tables per IOMMU group (a.k.a. PE). This adds a iommu_table_group container for TCE tables. Right now just one table is supported. This defines iommu_table_group struct which stores pointers to iommu_group and iommu_table(s). This replaces iommu_table with iommu_table_group where iommu_table was used to identify a group: - iommu_register_group(); - iommudata of generic iommu_group; This removes @data from iommu_table as it_table_group provides same access to pnv_ioda_pe. For IODA, instead of embedding iommu_table, the new iommu_table_group keeps pointers to those. The iommu_table structs are allocated dynamically. For P5IOC2, both iommu_table_group and iommu_table are embedded into PE struct. As there is no EEH and SRIOV support for P5IOC2, iommu_free_table() should not be called on iommu_table struct pointers so we can keep it embedded in pnv_phb::p5ioc2. For pSeries, this replaces multiple calls of kzalloc_node() with a new iommu_pseries_alloc_group() helper and stores the table group struct pointer into the pci_dn struct. For release, a iommu_table_free_group() helper is added. This moves iommu_table struct allocation from SR-IOV code to the generic DMA initialization code in pnv_pci_ioda_setup_dma_pe and pnv_pci_ioda2_setup_dma_pe as this is where DMA is actually initialized. This change is here because those lines had to be changed anyway. This should cause no behavioural change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:14:57 +10:00
Alexey Kardashevskiy	22af48596e	vfio: powerpc/spapr: Rework groups attaching This is to make extended ownership and multiple groups support patches simpler for review. This should cause no behavioural change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:14:56 +10:00
Alexey Kardashevskiy	649354b75d	vfio: powerpc/spapr: Moving pinning/unpinning to helpers This is a pretty mechanical patch to make next patches simpler. New tce_iommu_unuse_page() helper does put_page() now but it might skip that after the memory registering patch applied. As we are here, this removes unnecessary checks for a value returned by pfn_to_page() as it cannot possibly return NULL. This moves tce_iommu_disable() later to let tce_iommu_clear() know if the container has been enabled because if it has not been, then put_page() must not be called on TCEs from the TCE table. This situation is not yet possible but it will after KVM acceleration patchset is applied. This changes code to work with physical addresses rather than linear mapping addresses for better code readability. Following patches will add an xchg() callback for an IOMMU table which will accept/return physical addresses (unlike current tce_build()) which will eliminate redundant conversions. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:14:56 +10:00
Alexey Kardashevskiy	3c56e822f8	vfio: powerpc/spapr: Disable DMA mappings on disabled container At the moment DMA map/unmap requests are handled irrespective to the container's state. This allows the user space to pin memory which it might not be allowed to pin. This adds checks to MAP/UNMAP that the container is enabled, otherwise -EPERM is returned. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:14:56 +10:00
Alexey Kardashevskiy	2d270df8f7	vfio: powerpc/spapr: Move locked_vm accounting to helpers There moves locked pages accounting to helpers. Later they will be reused for Dynamic DMA windows (DDW). This reworks debug messages to show the current value and the limit. This stores the locked pages number in the container so when unlocking the iommu table pointer won't be needed. This does not have an effect now but it will with the multiple tables per container as then we will allow attaching/detaching groups on fly and we may end up having a container with no group attached but with the counter incremented. While we are here, update the comment explaining why RLIMIT_MEMLOCK might be required to be bigger than the guest RAM. This also prints pid of the current process in pr_warn/pr_debug. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:14:55 +10:00
Alexey Kardashevskiy	00663d4ee0	vfio: powerpc/spapr: Use it_page_size This makes use of the it_page_size from the iommu_table struct as page size can differ. This replaces missing IOMMU_PAGE_SHIFT macro in commented debug code as recently introduced IOMMU_PAGE_XXX macros do not include IOMMU_PAGE_SHIFT. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:14:55 +10:00
Alexey Kardashevskiy	e432bc7e15	vfio: powerpc/spapr: Check that IOMMU page is fully contained by system page This checks that the TCE table page size is not bigger that the size of a page we just pinned and going to put its physical address to the table. Otherwise the hardware gets unwanted access to physical memory between the end of the actual page and the end of the aligned up TCE page. Since compound_order() and compound_head() work correctly on non-huge pages, there is no need for additional check whether the page is huge. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:14:55 +10:00
Alexey Kardashevskiy	9b14a1ff86	vfio: powerpc/spapr: Move page pinning from arch code to VFIO IOMMU driver This moves page pinning (get_user_pages_fast()/put_page()) code out of the platform IOMMU code and puts it to VFIO IOMMU driver where it belongs to as the platform code does not deal with page pinning. This makes iommu_take_ownership()/iommu_release_ownership() deal with the IOMMU table bitmap only. This removes page unpinning from iommu_take_ownership() as the actual TCE table might contain garbage and doing put_page() on it is undefined behaviour. Besides the last part, the rest of the patch is mechanical. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:14:55 +10:00
Alex Williamson	20f300175a	vfio/pci: Fix racy vfio_device_get_from_dev() call Testing the driver for a PCI device is racy, it can be all but complete in the release path and still report the driver as ours. Therefore we can't trust drvdata to be valid. This race can sometimes be seen when one port of a multifunction device is being unbound from the vfio-pci driver while another function is being released by the user and attempting a bus reset. The device in the remove path is found as a dependent device for the bus reset of the release path device, the driver is still set to vfio-pci, but the drvdata has already been cleared, resulting in a null pointer dereference. To resolve this, fix vfio_device_get_from_dev() to not take the dev_get_drvdata() shortcut and instead traverse through the iommu_group, vfio_group, vfio_device path to get a reference we can trust. Once we have that reference, we know the device isn't in transition and we can test to make sure the driver is still what we expect, so that we don't interfere with devices we don't own. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-06-09 10:08:57 -06:00
Will Deacon	8a0a01bff8	drivers/vfio: Allow type-1 IOMMU instantiation on top of an ARM SMMUv3 The ARM SMMUv3 driver is compatible with the notion of a type-1 IOMMU in VFIO. This patch allows VFIO_IOMMU_TYPE1 to be selected if ARM_SMMU_V3=y. Signed-off-by: Will Deacon <will.deacon@arm.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2015-05-29 11:12:40 +02:00
Gavin Shan	68cbbc3a9d	drivers/vfio: Support EEH error injection The patch adds one more EEH sub-command (VFIO_EEH_PE_INJECT_ERR) to inject the specified EEH error, which is represented by (struct vfio_eeh_pe_err), to the indicated PE for testing purpose. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-05-12 20:33:35 +10:00
Alex Williamson	db7d4d7f40	vfio: Fix runaway interruptible timeout Commit `13060b64b8` ("vfio: Add and use device request op for vfio bus drivers") incorrectly makes use of an interruptible timeout. When interrupted, the signal remains pending resulting in subsequent timeouts occurring instantly. This makes the loop spin at a much higher rate than intended. Instead of making this completely non-interruptible, we can change this into a sort of interruptible-once behavior and use the "once" to log debug information. The driver API doesn't allow us to abort and return an error code. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Fixes: `13060b64b8` Cc: stable@vger.kernel.org # v4.0	2015-05-01 16:31:41 -06:00
Alex Williamson	5f55d2ae69	vfio-pci: Log device requests more verbosely Log some clues indicating whether the user is receiving device request interfaces or not listening. This can help indicate why a driver unbind is blocked or explain why QEMU automatically unplugged a device from the VM. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-05-01 14:00:53 -06:00
Alex Williamson	5a0ff17741	vfio-pci: Fix use after free Reported by 0-day test infrastructure. Fixes: `ecaa1f6a01` ("vfio-pci: Add VGA arbiter client") Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-04-08 08:11:51 -06:00
Alex Williamson	6eb7018705	vfio-pci: Move idle devices to D3hot power state We can save some power by putting devices that are bound to vfio-pci but not in use by the user in the D3hot power state. Devices get woken into D0 when opened by the user. Resets return the device to D0, so we need to re-apply the low power state after a bus reset. It's tempting to try to use D3cold, but we have no reason to inhibit hotplug of idle devices and we might get into a loop of having the device disappear before we have a chance to try to use it. A new module parameter allows this feature to be disabled if there are devices that misbehave as a result of this change. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-04-07 11:14:46 -06:00
Alex Williamson	561d72ddbb	vfio-pci: Remove warning if try-reset fails As indicated in the comment, this is not entirely uncommon and causes user concern for no reason. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-04-07 11:14:44 -06:00
Alex Williamson	80c7e8cc2a	vfio-pci: Allow PCI IDs to be specified as module options This copies the same support from pci-stub for exactly the same purpose, enabling a set of PCI IDs to be automatically added to the driver's dynamic ID table at module load time. The code here is pretty simple and both vfio-pci and pci-stub are fairly unique in being meta drivers, capable of attaching to any device, so there's no attempt made to generalize the code into pci-core. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-04-07 11:14:43 -06:00
Alex Williamson	ecaa1f6a01	vfio-pci: Add VGA arbiter client If VFIO VGA access is disabled for the user, either by CONFIG option or module parameter, we can often opt-out of VGA arbitration. We can do this when PCI bridge control of VGA routing is possible. This means that we must have a parent bridge and there must only be a single VGA device below that bridge. Fortunately this is the typical case for discrete GPUs. Doing this allows us to minimize the impact of additional GPUs, in terms of VGA arbitration, when they are only used via vfio-pci for non-VGA applications. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-04-07 11:14:41 -06:00
Alex Williamson	88c0dead9f	vfio-pci: Add module option to disable VGA region access Add a module option so that we don't require a CONFIG change and kernel rebuild to disable VGA support. Not only can VGA support be troublesome in itself, but by disabling it we can reduce the impact to host devices by doing a VGA arbitration opt-out. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-04-07 11:14:40 -06:00
Alex Williamson	71be3423a6	vfio: Split virqfd into a separate module for vfio bus drivers An unintended consequence of commit `42ac9bd18d` ("vfio: initialize the virqfd workqueue in VFIO generic code") is that the vfio module is renamed to vfio_core so that it can include both vfio and virqfd. That's a user visible change that may break module loading scritps and it imposes eventfd support as a dependency on the core vfio code, which it's really not. virqfd is intended to be provided as a service to vfio bus drivers, so instead of wrapping it into vfio.ko, we can make it a stand-alone module toggled by vfio bus drivers. This has the additional benefit of removing initialization and exit from the core vfio code. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-17 08:33:38 -06:00
kbuild test robot	66fdc052d7	vfio: virqfd_lock can be static Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-17 08:20:31 -06:00
Zhen Lei	2f51bf4be9	vfio: put off the allocation of "minor" in vfio_create_group The next code fragment "list_for_each_entry" is not depend on "minor". With this patch, the free of "minor" in "list_for_each_entry" can be reduced, and there is no functional change. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:56 -06:00
Antonios Motakis	a7fa7c77cf	vfio/platform: implement IRQ masking/unmasking via an eventfd With this patch the VFIO user will be able to set an eventfd that can be used in order to mask and unmask IRQs of platform devices. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Tested-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:55 -06:00
Antonios Motakis	42ac9bd18d	vfio: initialize the virqfd workqueue in VFIO generic code Now we have finally completely decoupled virqfd from VFIO_PCI. We can initialize it from the VFIO generic code, in order to safely use it from multiple independent VFIO bus drivers. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Tested-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:54 -06:00
Antonios Motakis	7e992d6927	vfio: move eventfd support code for VFIO_PCI to a separate file The virqfd functionality that is used by VFIO_PCI to implement interrupt masking and unmasking via an eventfd, is generic enough and can be reused by another driver. Move it to a separate file in order to allow the code to be shared. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Tested-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:54 -06:00
Antonios Motakis	09bbcb8810	vfio: pass an opaque pointer on virqfd initialization VFIO_PCI passes the VFIO device structure *vdev via eventfd to the handler that implements masking/unmasking of IRQs via an eventfd. We can replace it in the virqfd infrastructure with an opaque type so we can make use of the mechanism from other VFIO bus drivers. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Tested-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:53 -06:00
Antonios Motakis	9269c393e7	vfio: add local lock for virqfd instead of depending on VFIO PCI The Virqfd code needs to keep accesses to any struct *virqfd safe, but this comes into play only when creating or destroying eventfds, so sharing the same spinlock with the VFIO bus driver is not necessary. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Tested-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:52 -06:00
Antonios Motakis	bb78e9eaab	vfio: virqfd: rename vfio_pci_virqfd_init and vfio_pci_virqfd_exit The functions vfio_pci_virqfd_init and vfio_pci_virqfd_exit are not really PCI specific, since we plan to reuse the virqfd code with more VFIO drivers in addition to VFIO_PCI. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> [Baptiste Reynal: Move rename vfio_pci_virqfd_init and vfio_pci_virqfd_exit from "vfio: add a vfio_ prefix to virqfd_enable and virqfd_disable and export"] Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Tested-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:52 -06:00
Antonios Motakis	bdc5e1021b	vfio: add a vfio_ prefix to virqfd_enable and virqfd_disable and export We want to reuse virqfd functionality in multiple VFIO drivers; before moving these functions to core VFIO, add the vfio_ prefix to the virqfd_enable and virqfd_disable functions, and export them so they can be used from other modules. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Tested-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:51 -06:00
Antonios Motakis	06211b40ce	vfio/platform: support for level sensitive interrupts Level sensitive interrupts are exposed as maskable and automasked interrupts and are masked and disabled automatically when they fire. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> [Baptiste Reynal: Move masked interrupt initialization from "vfio/platform: trigger an interrupt via eventfd"] Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Tested-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:50 -06:00
Antonios Motakis	57f972e2b3	vfio/platform: trigger an interrupt via eventfd This patch allows to set an eventfd for a platform device's interrupt, and also to trigger the interrupt eventfd from userspace for testing. Level sensitive interrupts are marked as maskable and are handled in a later patch. Edge triggered interrupts are not advertised as maskable and are implemented here using a simple and efficient IRQ handler. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> [Baptiste Reynal: fix masked interrupt initialization] Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Tested-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:50 -06:00
Antonios Motakis	9a36321c8d	vfio/platform: initial interrupts support code This patch is a skeleton for the VFIO_DEVICE_SET_IRQS IOCTL, around which most IRQ functionality is implemented in VFIO. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Tested-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:49 -06:00
Antonios Motakis	682704c41e	vfio/platform: return IRQ info Return information for the interrupts exposed by the device. This patch extends VFIO_DEVICE_GET_INFO with the number of IRQs and enables VFIO_DEVICE_GET_IRQ_INFO. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Tested-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:48 -06:00
Antonios Motakis	fad4d5b1f0	vfio/platform: support MMAP of MMIO regions Allow to memory map the MMIO regions of the device so userspace can directly access them. PIO regions are not being handled at this point. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Tested-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:48 -06:00
Antonios Motakis	6e3f264560	vfio/platform: read and write support for the device fd VFIO returns a file descriptor which we can use to manipulate the memory regions of the device. Usually, the user will mmap memory regions that are addressable on page boundaries, however for memory regions where this is not the case we cannot provide mmap functionality due to security concerns. For this reason we also allow to use read and write functions to the file descriptor pointing to the memory regions. We implement this functionality only for MMIO regions of platform devices; PIO regions are not being handled at this point. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Tested-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:47 -06:00
Antonios Motakis	e8909e67ca	vfio/platform: return info for device memory mapped IO regions This patch enables the IOCTLs VFIO_DEVICE_GET_REGION_INFO ioctl call, which allows the user to learn about the available MMIO resources of a device. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Tested-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:46 -06:00
Antonios Motakis	2e8567bbb5	vfio/platform: return info for bound device A VFIO userspace driver will start by opening the VFIO device that corresponds to an IOMMU group, and will use the ioctl interface to get the basic device info, such as number of memory regions and interrupts, and their properties. This patch enables the VFIO_DEVICE_GET_INFO ioctl call. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> [Baptiste Reynal: added include in vfio_platform_common.c] Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Tested-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:46 -06:00
Antonios Motakis	b13329adc2	vfio: amba: add the VFIO for AMBA devices module to Kconfig Enable building the VFIO AMBA driver. VFIO_AMBA depends on VFIO_PLATFORM, since it is sharing a portion of the code, and it is essentially implemented as a platform device whose resources are discovered via AMBA specific APIs in the kernel. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:45 -06:00
Antonios Motakis	36fe431f28	vfio: amba: VFIO support for AMBA devices Add support for discovering AMBA devices with VFIO and handle them similarly to Linux platform devices. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:44 -06:00
Antonios Motakis	5316153239	vfio: platform: add the VFIO PLATFORM module to Kconfig Enable building the VFIO PLATFORM driver that allows to use Linux platform devices with VFIO. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Tested-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:44 -06:00
Antonios Motakis	9df85aaa43	vfio: platform: probe to devices on the platform bus Driver to bind to Linux platform devices, and callbacks to discover their resources to be used by the main VFIO PLATFORM code. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Tested-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:43 -06:00
Antonios Motakis	de49fc0d99	vfio/platform: initial skeleton of VFIO support for platform devices This patch forms the common skeleton code for platform devices support with VFIO. This will include the core functionality of VFIO_PLATFORM, however binding to the device and discovering the device resources will be done with the help of a separate file where any Linux platform bus specific code will reside. This will allow us to implement support for also discovering AMBA devices and their resources, but still reuse a large part of the VFIO_PLATFORM implementation. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> [Baptiste Reynal: added includes in vfio_platform_private.h] Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Tested-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-16 14:08:42 -06:00
Alexey Kardashevskiy	ec76f40070	vfio-pci: Add missing break to enable VFIO_PCI_ERR_IRQ_INDEX This adds a missing break statement to VFIO_DEVICE_SET_IRQS handler without which vfio_pci_set_err_trigger() would never be called. While we are here, add another "break" to VFIO_PCI_REQ_IRQ_INDEX case so if we add more indexes later, we won't miss it. Fixes: `6140a8f562` ("vfio-pci: Add device request interface") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-12 09:51:38 -06:00
Alex Williamson	6140a8f562	vfio-pci: Add device request interface Userspace can opt to receive a device request notification, indicating that the device should be released. This is setup the same way as the error IRQ and also supports eventfd signaling. Future support may forcefully remove the device from the user if the request is ignored. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-02-10 12:38:14 -07:00
Alex Williamson	cac80d6e38	vfio-pci: Generalize setup of simple eventfds We want another single vector IRQ index to support signaling of the device request to userspace. Generalize the error reporting IRQ index to avoid code duplication. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-02-10 12:37:57 -07:00
Alex Williamson	13060b64b8	vfio: Add and use device request op for vfio bus drivers When a request is made to unbind a device from a vfio bus driver, we need to wait for the device to become unused, ie. for userspace to release the device. However, we have a long standing TODO in the code to do something proactive to make that happen. To enable this, we add a request callback on the vfio bus driver struct, which is intended to signal the user through the vfio device interface to release the device. Instead of passively waiting for the device to become unused, we can now pester the user to give it up. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-02-10 12:37:47 -07:00
Alex Williamson	4a68810dbb	vfio: Tie IOMMU group reference to vfio group Move the iommu_group reference from the device to the vfio_group. This ensures that the iommu_group persists as long as the vfio_group remains. This can be important if all of the device from an iommu_group are removed, but we still have an outstanding vfio_group reference; we can still walk the empty list of devices. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-02-06 15:05:06 -07:00
Alex Williamson	60720a0fc6	vfio: Add device tracking during unbind There's a small window between the vfio bus driver calling vfio_del_group_dev() and the device being completely unbound where the vfio group appears to be non-viable. This creates a race for users like QEMU/KVM where the kvm-vfio module tries to get an external reference to the group in order to match and release an existing reference, while the device is potentially being removed from the vfio bus driver. If the group is momentarily non-viable, kvm-vfio may not be able to release the group reference until VM shutdown, making the group unusable until that point. Bridge the gap between device removal from the group and completion of the driver unbind by tracking it in a list. The device is added to the list before the bus driver reference is released and removed using the existing unbind notifier. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-02-06 15:05:06 -07:00
Alex Williamson	c5e6688752	vfio/type1: Add conditional rescheduling IOMMU operations can be expensive and it's not very difficult for a user to give us a lot of work to do for a map or unmap operation. Killing a large VM will vfio assigned devices can result in soft lockups and IOMMU tracing shows that we can easily spend 80% of our time with need-resched set. A sprinkling of conf_resched() calls after map and unmap calls has a very tiny affect on performance while resulting in traces with <1% of calls overflowing into needs- resched. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-02-06 14:19:12 -07:00
Alex Williamson	babbf17609	vfio/type1: Chunk contiguous reserved/invalid page mappings We currently map invalid and reserved pages, such as often occur from mapping MMIO regions of a VM through the IOMMU, using single pages. There's really no reason we can't instead follow the methodology we use for normal pages and find the largest possible physically contiguous chunk for mapping. The only difference is that we don't do locked memory accounting for these since they're not back by RAM. In most applications this will be a very minor improvement, but when graphics and GPGPU devices are in play, MMIO BARs become non-trivial. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-02-06 10:59:16 -07:00
Alex Williamson	6fe1010d6d	vfio/type1: DMA unmap chunking When unmapping DMA entries we try to rely on the IOMMU API behavior that allows the IOMMU to unmap a larger area than requested, up to the size of the original mapping. This works great when the IOMMU supports superpages and they're in use. Otherwise, each PAGE_SIZE increment is unmapped separately, resulting in poor performance. Instead we can use the IOVA-to-physical-address translation provided by the IOMMU API and unmap using the largest contiguous physical memory chunk available, which is also how vfio/type1 would have mapped the region. For a synthetic 1TB guest VM mapping and shutdown test on Intel VT-d (2M IOMMU pagesize support), this achieves about a 30% overall improvement mapping standard 4K pages, regardless of IOMMU superpage enabling, and about a 40% improvement mapping 2M hugetlbfs pages when IOMMU superpages are not available. Hugetlbfs with IOMMU superpages enabled is effectively unchanged. Unfortunately the same algorithm does not work well on IOMMUs with fine-grained superpages, like AMD-Vi, costing about 25% extra since the IOMMU will automatically unmap any power-of-two contiguous mapping we've provided it. We add a routine and a domain flag to detect this feature, leaving AMD-Vi unaffected by this unmap optimization. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-02-06 10:58:56 -07:00
Wei Yang	7c2e211f3c	vfio-pci: Fix the check on pci device type in vfio_pci_probe() Current vfio-pci just supports normal pci device, so vfio_pci_probe() will return if the pci device is not a normal device. While current code makes a mistake. PCI_HEADER_TYPE is the offset in configuration space of the device type, but we use this value to mask the type value. This patch fixs this by do the check directly on the pci_dev->hdr_type. Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Cc: stable@vger.kernel.org # v3.6+	2015-01-07 10:29:11 -07:00
Linus Torvalds	cc669743a3	VFIO updates for v3.19-rc1 - s390 support (Frank Blaschka) - Enable iommu-type1 for ARM SMMU (Will Deacon) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJUka+sAAoJECObm247sIsiiy0P/iqrQpv94Z7rUKRlV3K8YtJj 8Oi5fLLnT2by9v5mS+KMElnQ5gLU/C5B/QGLMNrF2uQl8lguWSXJw37r7MkbIkpN RDLx1NhetLqbJ4CYLjyv/Jx3vl+Wr/2nNWRVIS5ajBmMjEgKVLvjYs4SaXELc3a8 a3YzcGW10BrVFlCJgUYqYIFGS1BmKjf7fbD5YBocj8tPv6NAlCiNNYYr+0pzW8Lf GTi39JlZ2t06hDq33eiUkrySWNjrIBn4g4PfAl7HBAscsZKS1w18MD1qVw4UXXa2 15+CBbsHU7ZLVo6G7vuZeJNCX9tdQ0WIZWQzHstQa914l86WYImTJ2tyH7Rn0ZcQ 3Mu9fzef9JgjkI56ol2zDwuOs+qttOYaLWjhhHiW4jkIxdnljnesIFlvmM3XeDGz 3Zowg09HzE3K+dt8265jVKkcNJbPLzspLvF27nPMudZNHBozcoPE0jKxU7QC3eMT Ij36+puQq+jccUic3Np6rxk5tzTHEat1a7w3IUwXCCUP5P5QW+kuuIjbd4hqQkHn VDRjnT6MWC3GguUCXR5VyO0zezpI20pTbWwE8u2qwnE349m0Eq/vxytj2lCLYLPR Jjtdduf1/Ppam7tATd3PwTu6KljY3dJiUUikyOc1J0KmkgkSMw+BtR6G7qytyW4q /fhClcsaNtxSheVh0N+b =1L2V -----END PGP SIGNATURE----- Merge tag 'vfio-v3.19-rc1' of git://github.com/awilliam/linux-vfio Pull VFIO updates from Alex Williamson: - s390 support (Frank Blaschka) - Enable iommu-type1 for ARM SMMU (Will Deacon) * tag 'vfio-v3.19-rc1' of git://github.com/awilliam/linux-vfio: drivers/vfio: allow type-1 IOMMU instantiation on top of an ARM SMMU vfio: make vfio run on s390	2014-12-17 10:44:22 -08:00
Jiang Liu	83a18912b0	PCI/MSI: Rename write_msi_msg() to pci_write_msi_msg() Rename write_msi_msg() to pci_write_msi_msg() to mark it as PCI specific. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Grant Likely <grant.likely@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Yingjoe Chen <yingjoe.chen@mediatek.com> Cc: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-23 13:01:45 +01:00
Will Deacon	5e9f36c59a	drivers/vfio: allow type-1 IOMMU instantiation on top of an ARM SMMU The ARM SMMU driver is compatible with the notion of a type-1 IOMMU in VFIO. This patch allows VFIO_IOMMU_TYPE1 to be selected if ARM_SMMU=y. Signed-off-by: Will Deacon <will.deacon@arm.com> [aw: update for existing S390 patch] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-11-14 09:10:59 -07:00
Frank Blaschka	1d53a3a7d3	vfio: make vfio run on s390 add Kconfig switch to hide INTx add Kconfig switch to let vfio announce PCI BARs are not mapable Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-11-07 09:52:22 -07:00
Linus Torvalds	23971bdfff	IOMMU Updates for Linux v3.18 This pull-request includes: * Change in the IOMMU-API to convert the former iommu_domain_capable function to just iommu_capable * Various fixes in handling RMRR ranges for the VT-d driver (one fix requires a device driver core change which was acked by Greg KH) * The AMD IOMMU driver now assigns and deassigns complete alias groups to fix issues with devices using the wrong PCI request-id * MMU-401 support for the ARM SMMU driver * Multi-master IOMMU group support for the ARM SMMU driver * Various other small fixes all over the place -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJUPNxYAAoJECvwRC2XARrjMwMP/RLSr+oA31rGVjLXcmcCHl7Q Uj7xpcnG19qB0aqNR1JeJuZNkK/tw44pE353MQPbz4N9UVUiogklGIVD1iJvFV53 0qm84bvpDJIof4aP35B3H3Umft2USTn/lmsQg/RklQcNTW8DzNj63b8BTNR7k/GL G7bLg7F1BUCl0shZCCsFspOIulQPAJYN2OvHlfYBav/bfDvfouQ3lrV+loGrK44r F2Hmp+imXlIhUCjfbiWz6wKFxvPrxZx482vm2pXBCSnXEdW4/fz6nf9VHUK/Cfsq JAimY1CfiDo1aqH9/yVHUOw5SD/NYOXq6E5bFPg/WENbipbbae5cK2u6PX5MMBAn CG4BM8l9xicfGPqgn5YFSRY/6qC6K7NlxMnt9U8l18QIkDVDqEtUgJQISJuce7wx FWx6eSWaxpIe5yhq19/h2ELalUUyR/fPq+UXXjYDL1kLV/vcvC/lC3mbNAQU93zU WK0bG2tDg88JHavc25Ewa2aOn4BVM2BpwuLbYlgQReaEmsQRnEPgtmRNyLJHqbFE wwpCj8pBWdufsJWRyvpnXQ+CfA7oSz4e7hz1G+0/5uiDmagfvg16Ql5JtPmmuLUm Kc3dVIiG0s1ewohZIIJETGCqprQbCSqs8CCQqB6p2zDBWFKpNT7F38lm/KlehkCz JpAiI7Y2K9Jejp0VIPrt =OMOt -----END PGP SIGNATURE----- Merge tag 'iommu-updates-v3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU updates from Joerg Roedel: "This pull-request includes: - change in the IOMMU-API to convert the former iommu_domain_capable function to just iommu_capable - various fixes in handling RMRR ranges for the VT-d driver (one fix requires a device driver core change which was acked by Greg KH) - the AMD IOMMU driver now assigns and deassigns complete alias groups to fix issues with devices using the wrong PCI request-id - MMU-401 support for the ARM SMMU driver - multi-master IOMMU group support for the ARM SMMU driver - various other small fixes all over the place" * tag 'iommu-updates-v3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (41 commits) iommu/vt-d: Work around broken RMRR firmware entries iommu/vt-d: Store bus information in RMRR PCI device path iommu/vt-d: Only remove domain when device is removed driver core: Add BUS_NOTIFY_REMOVED_DEVICE event iommu/amd: Fix devid mapping for ivrs_ioapic override iommu/irq_remapping: Fix the regression of hpet irq remapping iommu: Fix bus notifier breakage iommu/amd: Split init_iommu_group() from iommu_init_device() iommu: Rework iommu_group_get_for_pci_dev() iommu: Make of_device_id array const amd_iommu: do not dereference a NULL pointer address. iommu/omap: Remove omap_iommu unused owner field iommu: Remove iommu_domain_has_cap() API function IB/usnic: Convert to use new iommu_capable() API function vfio: Convert to use new iommu_capable() API function kvm: iommu: Convert to use new iommu_capable() API function iommu/tegra: Convert to iommu_capable() API function iommu/msm: Convert to iommu_capable() API function iommu/vt-d: Convert to iommu_capable() API function iommu/fsl: Convert to iommu_capable() API function ...	2014-10-15 07:23:49 +02:00
Linus Torvalds	27a9716bc8	VFIO updates for v3.18-rc1 - Nested IOMMU extension to type1 (Will Deacon) - Restore MSIx message before enabling (Gavin Shan) - Fix remove path locking (Alex Williamson) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJUOOETAAoJECObm247sIsihDQP/jADEe9KFu4ymWu7rqi24w1L 81hGNLXlfx2PPomluN3jENpyueo7vWdP5yZ8q/bi6oF6UbShL8Po01UKHOJzJJwW 8GW86YcNsmPz/jl8Jcdbkex3dKvT1OzrDjFjCiKTJBHxE9nEdtWlRV8mO1pwd00t YFiXF8xFbkpHExMiQNU36rq/fzZCTOu4ZpCK9kDT7Sy+lsKAnGoXuM1IZK+7DGJo jcsMF32DVDmji6riy3uHHPc0qprP24QNVy6FfOmLEUvuOEIUOxMAYM9je9mmsHeS CeR/NHexr4RgYQE33jL1w8A1saT0rbu7DSKSa7OQebnY2Zte+oncLtqFZR2/Wylh jBU5r7P3PdxM6ykqEeC/3ytx7iFX6c7jc0SU4I5m8bFexmUQXqOko28gGIt0OL3n R8CmNF/MDs3gqYprhW6MvSJI1diY1+pX7pX0e7k7lDAoZ1QOjPNSGv+YOfF3H1YB AggIVxIKXW0T0bQ/hKcQiDKkxQ88vi1hld2LknbiBW9nMNLjNkxl2RZSGunFvWWN LzOYkBgR6rrTbhTvsWApsfYguYtGkgAGGJZSR1oev0BJnx4UHOfL1bykJRyUHdUd KDSBEni5TY65087IKD93nkyRhassszOa9XHmRDwQLxQeJCKRZi6bQRSzFZVheXIO O3XINOo2wNF1bIrfD/vR =s2+/ -----END PGP SIGNATURE----- Merge tag 'vfio-v3.18-rc1' of git://github.com/awilliam/linux-vfio Pull VFIO updates from Alex Williamson: - Nested IOMMU extension to type1 (Will Deacon) - Restore MSIx message before enabling (Gavin Shan) - Fix remove path locking (Alex Williamson) * tag 'vfio-v3.18-rc1' of git://github.com/awilliam/linux-vfio: vfio-pci: Fix remove path locking drivers/vfio: Export vfio_spapr_iommu_eeh_ioctl() with GPL vfio/pci: Restore MSIx message prior to enabling PCI: Export MSI message relevant functions vfio/iommu_type1: add new VFIO_TYPE1_NESTING_IOMMU IOMMU type iommu: introduce domain attribute for nesting IOMMUs	2014-10-11 06:49:24 -04:00
Alex Williamson	93899a679f	vfio-pci: Fix remove path locking Locking both the remove() and release() path results in a deadlock that should have been obvious. To fix this we can get and hold the vfio_device reference as we evaluate whether to do a bus/slot reset. This will automatically block any remove() calls, allowing us to remove the explict lock. Fixes `61d792562b`. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Cc: stable@vger.kernel.org [3.17]	2014-09-29 17:18:39 -06:00
Gavin Shan	0f905ce2b5	drivers/vfio: Export vfio_spapr_iommu_eeh_ioctl() with GPL The function should have been exported with EXPORT_SYMBOL_GPL() as part of commit `92d18a6851` ("drivers/vfio: Fix EEH build error"). Suggested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-09-29 10:31:51 -06:00
Gavin Shan	b8f02af096	vfio/pci: Restore MSIx message prior to enabling The MSIx vector table lives in device memory, which may be cleared as part of a backdoor device reset. This is the case on the IBM IPR HBA when the BIST is run on the device. When assigned to a QEMU guest, the guest driver does a pci_save_state(), issues a BIST, then does a pci_restore_state(). The BIST clears the MSIx vector table, but due to the way interrupts are configured the pci_restore_state() does not restore the vector table as expected. Eventually this results in an EEH error on Power platforms when the device attempts to signal an interrupt with the zero'd table entry. Fix the problem by restoring the host cached MSI message prior to enabling each vector. Reported-by: Wen Xiong <wenxiong@linux.vnet.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-09-29 10:16:24 -06:00
Will Deacon	f5c9ecebaf	vfio/iommu_type1: add new VFIO_TYPE1_NESTING_IOMMU IOMMU type VFIO allows devices to be safely handed off to userspace by putting them behind an IOMMU configured to ensure DMA and interrupt isolation. This enables userspace KVM clients, such as kvmtool and qemu, to further map the device into a virtual machine. With IOMMUs such as the ARM SMMU, it is then possible to provide SMMU translation services to the guest operating system, which are nested with the existing translation installed by VFIO. However, enabling this feature means that the IOMMU driver must be informed that the VFIO domain is being created for the purposes of nested translation. This patch adds a new IOMMU type (VFIO_TYPE1_NESTING_IOMMU) to the VFIO type-1 driver. The new IOMMU type acts identically to the VFIO_TYPE1v2_IOMMU type, but additionally sets the DOMAIN_ATTR_NESTING attribute on its IOMMU domains. Cc: Joerg Roedel <joro@8bytes.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-09-29 10:06:19 -06:00
Chen, Gong	846fc70986	PCI/AER: Rename PCI_ERR_UNC_TRAIN to PCI_ERR_UNC_UND In PCIe r1.0, sec 5.10.2, bit 0 of the Uncorrectable Error Status, Mask, and Severity Registers was for "Training Error." In PCIe r1.1, sec 7.10.2, bit 0 was redefined to be "Undefined." Rename PCI_ERR_UNC_TRAIN to PCI_ERR_UNC_UND to reflect this change. No functional change. [bhelgaas: changelog] Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2014-09-25 09:42:40 -06:00
Joerg Roedel	eb165f0584	vfio: Convert to use new iommu_capable() API function Cc: Alex Williamson <alex.williamson@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2014-09-25 15:47:45 +02:00
Alexey Kardashevskiy	9b936c960f	drivers/vfio: Enable VFIO if EEH is not supported The existing vfio_pci_open() fails upon error returned from vfio_spapr_pci_eeh_open(), which breaks POWER7's P5IOC2 PHB support which this patch brings back. The patch fixes the issue by dropping the return value of vfio_spapr_pci_eeh_open(). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-08-08 10:39:16 -06:00
Alexey Kardashevskiy	89a2edd62f	drivers/vfio: Allow EEH to be built as module This adds necessary declarations to the SPAPR VFIO EEH module, otherwise multiple dynamic linker errors reported: vfio_spapr_eeh: Unknown symbol eeh_pe_set_option (err 0) vfio_spapr_eeh: Unknown symbol eeh_pe_configure (err 0) vfio_spapr_eeh: Unknown symbol eeh_pe_reset (err 0) vfio_spapr_eeh: Unknown symbol eeh_pe_get_state (err 0) vfio_spapr_eeh: Unknown symbol eeh_iommu_group_to_pe (err 0) vfio_spapr_eeh: Unknown symbol eeh_dev_open (err 0) vfio_spapr_eeh: Unknown symbol eeh_pe_set_option (err 0) vfio_spapr_eeh: Unknown symbol eeh_pe_configure (err 0) vfio_spapr_eeh: Unknown symbol eeh_pe_reset (err 0) vfio_spapr_eeh: Unknown symbol eeh_pe_get_state (err 0) vfio_spapr_eeh: Unknown symbol eeh_iommu_group_to_pe (err 0) vfio_spapr_eeh: Unknown symbol eeh_dev_open (err 0) Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-08-08 10:37:51 -06:00
Gavin Shan	92d18a6851	drivers/vfio: Fix EEH build error The VFIO related components could be built as dynamic modules. Unfortunately, CONFIG_EEH can't be configured to "m". The patch fixes the build errors when configuring VFIO related components as dynamic modules as follows: CC [M] drivers/vfio/vfio_iommu_spapr_tce.o In file included from drivers/vfio/vfio.c:33:0: include/linux/vfio.h:101:43: warning: ‘struct pci_dev’ declared \ inside parameter list [enabled by default] : WRAP arch/powerpc/boot/zImage.pseries WRAP arch/powerpc/boot/zImage.maple WRAP arch/powerpc/boot/zImage.pmac WRAP arch/powerpc/boot/zImage.epapr MODPOST 1818 modules ERROR: ".vfio_spapr_iommu_eeh_ioctl" [drivers/vfio/vfio_iommu_spapr_tce.ko]\ undefined! ERROR: ".vfio_spapr_pci_eeh_open" [drivers/vfio/pci/vfio-pci.ko] undefined! ERROR: ".vfio_spapr_pci_eeh_release" [drivers/vfio/pci/vfio-pci.ko] undefined! Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-08-08 10:36:20 -06:00
Alex Williamson	bc4fba7712	vfio-pci: Attempt bus/slot reset on release Each time a device is released, mark whether a local reset was successful or whether a bus/slot reset is needed. If a reset is needed and all of the affected devices are bound to vfio-pci and unused, allow the reset. This is most useful when the userspace driver is killed and releases all the devices in an unclean state, such as when a QEMU VM quits. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-08-07 11:12:07 -06:00
Alex Williamson	61d792562b	vfio-pci: Use mutex around open, release, and remove Serializing open/release allows us to fix a refcnt error if we fail to enable the device and lets us prevent devices from being unbound or opened, giving us an opportunity to do bus resets on release. No restriction added to serialize binding devices to vfio-pci while the mutex is held though. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-08-07 11:12:04 -06:00
Alex Williamson	9c22e660ce	vfio-pci: Release devices with BusMaster disabled Our current open/release path looks like this: vfio_pci_open vfio_pci_enable pci_enable_device pci_save_state pci_store_saved_state vfio_pci_release vfio_pci_disable pci_disable_device pci_restore_state pci_enable_device() doesn't modify PCI_COMMAND_MASTER, so if a device comes to us with it enabled, it persists through the open and gets stored as part of the device saved state. We then restore that saved state when released, which can allow the device to attempt to continue to do DMA. When the group is disconnected from the domain, this will get caught by the IOMMU, but if there are other devices in the group, the device may continue running and interfere with the user. Even in the former case, IOMMUs don't necessarily behave well and a stream of blocked DMA can result in unpleasant behavior on the host. Explicitly disable Bus Master as we're enabling the device and slightly re-work release to make sure that pci_disable_device() is the last thing that touches the device. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-08-07 11:12:02 -06:00
Gavin Shan	1b69be5e8a	drivers/vfio: EEH support for VFIO PCI device The patch adds new IOCTL commands for sPAPR VFIO container device to support EEH functionality for PCI devices, which have been passed through from host to somebody else via VFIO. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Acked-by: Alexander Graf <agraf@suse.de> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-08-05 15:28:48 +10:00
Linus Torvalds	a68a7509d3	A handful of VFIO bug fixes for v3.16 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJTkiFBAAoJECObm247sIsiymsQALDs03j1lm1ZUx3peI8JuEGp +4WrflS4eKBX2r+XOSqHQNlg5ceKpj1zsEMKXkLFBCwTEZXRz9tK7bu6hsRuE8XR j67JkBm4alsJQrJIQl0+dYJMFPVGWBNyoBc4kV2YbernBnFPc/5oOmP9wAJD80OI HHNQxOdfce+YZnkgltc/GkOWdG/YvuQbkyt2YmGtTePapzboj6Tf0yvkqRDaL0Bf 07+vYi6ZOSzx/Niw/ak09hll4OhYfgiOQ/t6LETmLi7PEGR88TYO/kqdzGmlnx6V XsDZ+WORWjtORGnyJI4p7iVFiP7mz4mmwUptfJcA0AX98csEdYHt6H4mlVkN7Gmd rSvarKQ7On8MRbymQIVw5w3V78X/HiWUsUi3Fef5xytQyuD4VxylS0HDtPM/AyeN 3zEJKOI+CIkFLZ6fReL8LKZuaRdQTbJvjxMVvmTr2XrBxZTlk91CsqT3JFXjJ6CV 2QEv2eVmsCkPtScWNZzaeQsgH75Hc8vznZxA+/BFDoeR2iq31BPE9iuYDpaVZS58 mnAEll4tPMIqcqQvxwoB8B8kyzX636trwW+585jr3gKRS8V6qHaWkXz06N5kzz7s dhRUy3rKe1SeutVo0Vs067Ym60yUlvuTeWTppuOnfRpgIpbCLeMwuiA22Wq3hBIF loXH9eAZOuwzBg1hqVWJ =aR4I -----END PGP SIGNATURE----- Merge tag 'vfio-v3.16-rc1' of git://github.com/awilliam/linux-vfio into next Pull VFIO updates from Alex Williamson: "A handful of VFIO bug fixes for v3.16" * tag 'vfio-v3.16-rc1' of git://github.com/awilliam/linux-vfio: drivers/vfio/pci: Fix wrong MSI interrupt count drivers/vfio: Rework offsetofend() vfio/iommu_type1: Avoid overflow vfio/pci: Fix unchecked return value vfio/pci: Fix sizing of DPA and THP express capabilities	2014-06-07 20:12:15 -07:00
Gavin Shan	fd49c81f08	drivers/vfio/pci: Fix wrong MSI interrupt count According PCI local bus specification, the register of Message Control for MSI (offset: 2, length: 2) has bit#0 to enable or disable MSI logic and it shouldn't be part contributing to the calculation of MSI interrupt count. The patch fixes the issue. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-05-30 11:35:54 -06:00
Alex Williamson	c8dbca165b	vfio/iommu_type1: Avoid overflow Coverity reports use of a tained scalar used as a loop boundary. For the most part, any values passed from userspace for a DMA mapping size, IOVA, or virtual address are valid, with some alignment constraints. The size is ultimately bound by how many pages the user is able to lock, IOVA is tested by the IOMMU driver when doing a map, and the virtual address needs to pass get_user_pages. The only problem I can find is that we do expect the __u64 user values to fit within our variables, which might not happen on 32bit platforms. Add a test for this and return error on overflow. Also propagate use of the type-correct local variables throughout the function. The above also points to the 'end' variable, which can be zero if we're operating at the very top of the address space. We try to account for this, but our loop botches it. Rework the loop to use the remaining size as our loop condition rather than the IOVA vs end. Detected by Coverity: CID 714659 Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-05-30 11:35:54 -06:00
Alex Williamson	eb5685f0de	vfio/pci: Fix unchecked return value There's nothing we can do different if pci_load_and_free_saved_state() fails, other than maybe print some log message, but the actual re-load of the state is an unnecessary step here since we've only just saved it. We can cleanup a coverity warning and eliminate the unnecessary step by freeing the state ourselves. Detected by Coverity: CID 753101 Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-05-30 11:35:53 -06:00
Alex Williamson	afa63252b2	vfio/pci: Fix sizing of DPA and THP express capabilities When sizing the TPH capability we store the register containing the table size into the 'dword' variable, but then use the uninitialized 'byte' variable to analyze the size. The table size is also actually reported as an N-1 value, so correct sizing to account for this. The round_up() for both TPH and DPA is unnecessary, remove it. Detected by Coverity: CID 714665 & 715156 Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-05-30 10:50:31 -06:00
Jean Delvare	8283b4919e	driver core: dev_set_drvdata can no longer fail So there is no point in checking its return value, which will soon disappear. Signed-off-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2014-05-27 13:40:51 -07:00
Linus Torvalds	d0cb5f71c5	VFIO updates for v3.15 include: - Allow the vfio-type1 IOMMU to support multiple domains within a container - Plumb path to query whether all domains are cache-coherent - Wire query into kvm-vfio device to avoid KVM x86 WBINVD emulation - Always select CONFIG_ANON_INODES, vfio depends on it (Arnd) The first patch also makes the vfio-type1 IOMMU driver completely independent of the bus_type of the devices it's handling, which enables it to be used for both vfio-pci and a future vfio-platform (and hopefully combinations involving both simultaneously). -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJTPbikAAoJECObm247sIsiG9cP/jnurW84YuAHmybzy3R4nMaa 8BWcQST+TyQ78GWvubFDcRt+vHmJUI4iFWPBIm1twBSVrMy6F1GcT/spmSne1Dfb OvcfGEy59tGO+BeklHg5Qq2Hj1UzfeV9uQoy+PrpTF/sYzsyL6g5+O3Zv39SNvr0 zCwnO2JAKXIpQlkK3wVha6V13X07Z/+d2b4JSsvmON2cnlPzOUYNrgBvoSeV2IQe 3kMAU9qfAfQlpNDhRRZL/HshgWazCg4XZUp9UdNjUkOxrUp4vXHrVTUJnUGBDytk 8V9UGRKF3mik9PqpZJk4jLV5urgUVpnUR5747uqs0KF+9GWxClXvh3gp1XX8Zn7f NCfDSMn/wrDr6fQBglKUadITaDhF49KV+J0cX083q46BbYxhAfgxv1I9UOvLreG6 SGAezlTB0mefa1BmSwJdfKwDBsuDOhbF0TsHC6zT7yFc8pqe3hl/vQzUyu3HtwuM yPycQiQQmvOaymaiiBRwyLocwJbDNKPigDomv8NZ8Mcd97Fu9YsF4ydaxMJmmeKf TffYS4z4tUElYiU2vv1ipB8T+ReCmdtj2cIvyfwTL9jycY9ouO4bJ56dOGWd3WNl m7AVokbrrJQ03itUpFMuYjHAzTNUlXVvXlGr9n6L4VTR0RTL1zwJVrMVDsXdhxm4 XgDbVB5PrxinDAC1vJvI =mnzO -----END PGP SIGNATURE----- Merge tag 'vfio-v3.15-rc1' of git://github.com/awilliam/linux-vfio Pull VFIO updates from Alex Williamson: "VFIO updates for v3.15 include: - Allow the vfio-type1 IOMMU to support multiple domains within a container - Plumb path to query whether all domains are cache-coherent - Wire query into kvm-vfio device to avoid KVM x86 WBINVD emulation - Always select CONFIG_ANON_INODES, vfio depends on it (Arnd) The first patch also makes the vfio-type1 IOMMU driver completely independent of the bus_type of the devices it's handling, which enables it to be used for both vfio-pci and a future vfio-platform (and hopefully combinations involving both simultaneously)" * tag 'vfio-v3.15-rc1' of git://github.com/awilliam/linux-vfio: vfio: always select ANON_INODES kvm/vfio: Support for DMA coherent IOMMUs vfio: Add external user check extension interface vfio/type1: Add extension to test DMA cache coherence of IOMMU vfio/iommu_type1: Multi-IOMMU domain support	2014-04-03 14:05:02 -07:00
Linus Torvalds	4b1779c2cf	PCI changes for the v3.15 merge window: Enumeration - Increment max correctly in pci_scan_bridge() (Andreas Noever) - Clarify the "scan anyway" comment in pci_scan_bridge() (Andreas Noever) - Assign CardBus bus number only during the second pass (Andreas Noever) - Use request_resource_conflict() instead of insert_ for bus numbers (Andreas Noever) - Make sure bus number resources stay within their parents bounds (Andreas Noever) - Remove pci_fixup_parent_subordinate_busnr() (Andreas Noever) - Check for child busses which use more bus numbers than allocated (Andreas Noever) - Don't scan random busses in pci_scan_bridge() (Andreas Noever) - x86: Drop pcibios_scan_root() check for bus already scanned (Bjorn Helgaas) - x86: Use pcibios_scan_root() instead of pci_scan_bus_with_sysdata() (Bjorn Helgaas) - x86: Use pcibios_scan_root() instead of pci_scan_bus_on_node() (Bjorn Helgaas) - x86: Merge pci_scan_bus_on_node() into pcibios_scan_root() (Bjorn Helgaas) - x86: Drop return value of pcibios_scan_root() (Bjorn Helgaas) NUMA - x86: Add x86_pci_root_bus_node() to look up NUMA node from PCI bus (Bjorn Helgaas) - x86: Use x86_pci_root_bus_node() instead of get_mp_bus_to_node() (Bjorn Helgaas) - x86: Remove mp_bus_to_node[], set_mp_bus_to_node(), get_mp_bus_to_node() (Bjorn Helgaas) - x86: Use NUMA_NO_NODE, not -1, for unknown node (Bjorn Helgaas) - x86: Remove acpi_get_pxm() usage (Bjorn Helgaas) - ia64: Use NUMA_NO_NODE, not MAX_NUMNODES, for unknown node (Bjorn Helgaas) - ia64: Remove acpi_get_pxm() usage (Bjorn Helgaas) - ACPI: Fix acpi_get_node() prototype (Bjorn Helgaas) Resource management - i2o: Fix and refactor PCI space allocation (Bjorn Helgaas) - Add resource_contains() (Bjorn Helgaas) - Add %pR support for IORESOURCE_UNSET (Bjorn Helgaas) - Mark resources as IORESOURCE_UNSET if we can't assign them (Bjorn Helgaas) - Don't clear IORESOURCE_UNSET when updating BAR (Bjorn Helgaas) - Check IORESOURCE_UNSET before updating BAR (Bjorn Helgaas) - Don't try to claim IORESOURCE_UNSET resources (Bjorn Helgaas) - Mark 64-bit resource as IORESOURCE_UNSET if we only support 32-bit (Bjorn Helgaas) - Don't enable decoding if BAR hasn't been assigned an address (Bjorn Helgaas) - Add "weak" generic pcibios_enable_device() implementation (Bjorn Helgaas) - alpha, microblaze, sh, sparc, tile: Use default pcibios_enable_device() (Bjorn Helgaas) - s390: Use generic pci_enable_resources() (Bjorn Helgaas) - Don't check resource_size() in pci_bus_alloc_resource() (Bjorn Helgaas) - Set type in __request_region() (Bjorn Helgaas) - Check all IORESOURCE_TYPE_BITS in pci_bus_alloc_from_region() (Bjorn Helgaas) - Change pci_bus_alloc_resource() type_mask to unsigned long (Bjorn Helgaas) - Log IDE resource quirk in dmesg (Bjorn Helgaas) - Revert "[PATCH] Insert GART region into resource map" (Bjorn Helgaas) PCI device hotplug - Make check_link_active() non-static (Rajat Jain) - Use link change notifications for hot-plug and removal (Rajat Jain) - Enable link state change notifications (Rajat Jain) - Don't disable the link permanently during removal (Rajat Jain) - Don't check adapter or latch status while disabling (Rajat Jain) - Disable link notification across slot reset (Rajat Jain) - Ensure very fast hotplug events are also processed (Rajat Jain) - Add hotplug_lock to serialize hotplug events (Rajat Jain) - Remove a non-existent card, regardless of "surprise" capability (Rajat Jain) - Don't turn slot off when hot-added device already exists (Yijing Wang) MSI - Keep pci_enable_msi() documentation (Alexander Gordeev) - ahci: Fix broken single MSI fallback (Alexander Gordeev) - ahci, vfio: Use pci_enable_msi_range() (Alexander Gordeev) - Check kmalloc() return value, fix leak of name (Greg Kroah-Hartman) - Fix leak of msi_attrs (Greg Kroah-Hartman) - Fix pci_msix_vec_count() htmldocs failure (Masanari Iida) Virtualization - Device-specific ACS support (Alex Williamson) Freescale i.MX6 - Wait for retraining (Marek Vasut) Marvell MVEBU - Use Device ID and revision from underlying endpoint (Andrew Lunn) - Fix incorrect size for PCI aperture resources (Jason Gunthorpe) - Call request_resource() on the apertures (Jason Gunthorpe) - Fix potential issue in range parsing (Jean-Jacques Hiblot) Renesas R-Car - Check platform_get_irq() return code (Ben Dooks) - Add error interrupt handling (Ben Dooks) - Fix bridge logic configuration accesses (Ben Dooks) - Register each instance independently (Magnus Damm) - Break out window size handling (Magnus Damm) - Make the Kconfig dependencies more generic (Magnus Damm) Synopsys DesignWare - Fix RC BAR to be single 64-bit non-prefetchable memory (Mohit Kumar) Miscellaneous - Remove unused SR-IOV VF Migration support (Bjorn Helgaas) - Enable INTx if BIOS left them disabled (Bjorn Helgaas) - Fix hex vs decimal typo in cpqhpc_probe() (Dan Carpenter) - Clean up par-arch object file list (Liviu Dudau) - Set IORESOURCE_ROM_SHADOW only for the default VGA device (Sander Eikelenboom) - ACPI, ARM, drm, powerpc, pcmcia, PCI: Use list_for_each_entry() for bus traversal (Yijing Wang) - Fix pci_bus_b() build failure (Paul Gortmaker) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJTOdAZAAoJEFmIoMA60/r8VYUQALRrReyMBk3pjRt/fKIX4Kwi ydSo/YJeeKTN8K93fLw8bb8bdPItJScJFTfEa4Q2SpZezR/ecGXLowisy0BBaPHK qtOyB8EqjkLS17GfyecIe9Nd2SIAI2De/0bchK3kDtIX1YlZB/k/tD3eCPMHDnnl m8c5kAHKPQYd8g01I+S8nrtGHk/A33grfYpJXPZbcqyhE0lWU3SI8KDAGbcKzNHE 23Do0yNyd4nHIdixWlhETcNvzHn35Q/O38JJwW9Mf1aI9gusYuml6GFefCgu/iov lxqp3CEW7iPZgQEgNbrQ0HzWn/durL2Trd6S/Yh6f2xbm1LGYKWh3LZUFLd3AQDd INEpUgKsyb//nF3dtiyGnZlp0QykoqFyLo2AEDrb+ILTd4up5DeRY/m1UpjAXR5p QicBmrDksHrSivPmMZwLx1DFQYKjQbdx5lOqy9hQM/Jmsr+N3/l7QBrbQWXks3JZ NNAyn4RZHQB7UDQS/MmVPArs+JK5qaEDQD57QuOTlqgP19VY9C9E/l/aEqefjdFo XOAm7CwGpB/iBAkIbE6ROEDiJArigRVHEfxLYeE/jtGOdRDCD1deWk+g3S8DWD7m ZxWSgIVB00PMAmomczdg59YVFBhocgwPUa8/cw6yqzx2QKP4mWXIFZ/Sjau5I3tn WWoxXlUirZfTJc29XnVy =3mNS -----END PGP SIGNATURE----- Merge tag 'pci-v3.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI changes from Bjorn Helgaas: "Enumeration - Increment max correctly in pci_scan_bridge() (Andreas Noever) - Clarify the "scan anyway" comment in pci_scan_bridge() (Andreas Noever) - Assign CardBus bus number only during the second pass (Andreas Noever) - Use request_resource_conflict() instead of insert_ for bus numbers (Andreas Noever) - Make sure bus number resources stay within their parents bounds (Andreas Noever) - Remove pci_fixup_parent_subordinate_busnr() (Andreas Noever) - Check for child busses which use more bus numbers than allocated (Andreas Noever) - Don't scan random busses in pci_scan_bridge() (Andreas Noever) - x86: Drop pcibios_scan_root() check for bus already scanned (Bjorn Helgaas) - x86: Use pcibios_scan_root() instead of pci_scan_bus_with_sysdata() (Bjorn Helgaas) - x86: Use pcibios_scan_root() instead of pci_scan_bus_on_node() (Bjorn Helgaas) - x86: Merge pci_scan_bus_on_node() into pcibios_scan_root() (Bjorn Helgaas) - x86: Drop return value of pcibios_scan_root() (Bjorn Helgaas) NUMA - x86: Add x86_pci_root_bus_node() to look up NUMA node from PCI bus (Bjorn Helgaas) - x86: Use x86_pci_root_bus_node() instead of get_mp_bus_to_node() (Bjorn Helgaas) - x86: Remove mp_bus_to_node[], set_mp_bus_to_node(), get_mp_bus_to_node() (Bjorn Helgaas) - x86: Use NUMA_NO_NODE, not -1, for unknown node (Bjorn Helgaas) - x86: Remove acpi_get_pxm() usage (Bjorn Helgaas) - ia64: Use NUMA_NO_NODE, not MAX_NUMNODES, for unknown node (Bjorn Helgaas) - ia64: Remove acpi_get_pxm() usage (Bjorn Helgaas) - ACPI: Fix acpi_get_node() prototype (Bjorn Helgaas) Resource management - i2o: Fix and refactor PCI space allocation (Bjorn Helgaas) - Add resource_contains() (Bjorn Helgaas) - Add %pR support for IORESOURCE_UNSET (Bjorn Helgaas) - Mark resources as IORESOURCE_UNSET if we can't assign them (Bjorn Helgaas) - Don't clear IORESOURCE_UNSET when updating BAR (Bjorn Helgaas) - Check IORESOURCE_UNSET before updating BAR (Bjorn Helgaas) - Don't try to claim IORESOURCE_UNSET resources (Bjorn Helgaas) - Mark 64-bit resource as IORESOURCE_UNSET if we only support 32-bit (Bjorn Helgaas) - Don't enable decoding if BAR hasn't been assigned an address (Bjorn Helgaas) - Add "weak" generic pcibios_enable_device() implementation (Bjorn Helgaas) - alpha, microblaze, sh, sparc, tile: Use default pcibios_enable_device() (Bjorn Helgaas) - s390: Use generic pci_enable_resources() (Bjorn Helgaas) - Don't check resource_size() in pci_bus_alloc_resource() (Bjorn Helgaas) - Set type in __request_region() (Bjorn Helgaas) - Check all IORESOURCE_TYPE_BITS in pci_bus_alloc_from_region() (Bjorn Helgaas) - Change pci_bus_alloc_resource() type_mask to unsigned long (Bjorn Helgaas) - Log IDE resource quirk in dmesg (Bjorn Helgaas) - Revert "[PATCH] Insert GART region into resource map" (Bjorn Helgaas) PCI device hotplug - Make check_link_active() non-static (Rajat Jain) - Use link change notifications for hot-plug and removal (Rajat Jain) - Enable link state change notifications (Rajat Jain) - Don't disable the link permanently during removal (Rajat Jain) - Don't check adapter or latch status while disabling (Rajat Jain) - Disable link notification across slot reset (Rajat Jain) - Ensure very fast hotplug events are also processed (Rajat Jain) - Add hotplug_lock to serialize hotplug events (Rajat Jain) - Remove a non-existent card, regardless of "surprise" capability (Rajat Jain) - Don't turn slot off when hot-added device already exists (Yijing Wang) MSI - Keep pci_enable_msi() documentation (Alexander Gordeev) - ahci: Fix broken single MSI fallback (Alexander Gordeev) - ahci, vfio: Use pci_enable_msi_range() (Alexander Gordeev) - Check kmalloc() return value, fix leak of name (Greg Kroah-Hartman) - Fix leak of msi_attrs (Greg Kroah-Hartman) - Fix pci_msix_vec_count() htmldocs failure (Masanari Iida) Virtualization - Device-specific ACS support (Alex Williamson) Freescale i.MX6 - Wait for retraining (Marek Vasut) Marvell MVEBU - Use Device ID and revision from underlying endpoint (Andrew Lunn) - Fix incorrect size for PCI aperture resources (Jason Gunthorpe) - Call request_resource() on the apertures (Jason Gunthorpe) - Fix potential issue in range parsing (Jean-Jacques Hiblot) Renesas R-Car - Check platform_get_irq() return code (Ben Dooks) - Add error interrupt handling (Ben Dooks) - Fix bridge logic configuration accesses (Ben Dooks) - Register each instance independently (Magnus Damm) - Break out window size handling (Magnus Damm) - Make the Kconfig dependencies more generic (Magnus Damm) Synopsys DesignWare - Fix RC BAR to be single 64-bit non-prefetchable memory (Mohit Kumar) Miscellaneous - Remove unused SR-IOV VF Migration support (Bjorn Helgaas) - Enable INTx if BIOS left them disabled (Bjorn Helgaas) - Fix hex vs decimal typo in cpqhpc_probe() (Dan Carpenter) - Clean up par-arch object file list (Liviu Dudau) - Set IORESOURCE_ROM_SHADOW only for the default VGA device (Sander Eikelenboom) - ACPI, ARM, drm, powerpc, pcmcia, PCI: Use list_for_each_entry() for bus traversal (Yijing Wang) - Fix pci_bus_b() build failure (Paul Gortmaker)" * tag 'pci-v3.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (108 commits) Revert "[PATCH] Insert GART region into resource map" PCI: Log IDE resource quirk in dmesg PCI: Change pci_bus_alloc_resource() type_mask to unsigned long PCI: Check all IORESOURCE_TYPE_BITS in pci_bus_alloc_from_region() resources: Set type in __request_region() PCI: Don't check resource_size() in pci_bus_alloc_resource() s390/PCI: Use generic pci_enable_resources() tile PCI RC: Use default pcibios_enable_device() sparc/PCI: Use default pcibios_enable_device() (Leon only) sh/PCI: Use default pcibios_enable_device() microblaze/PCI: Use default pcibios_enable_device() alpha/PCI: Use default pcibios_enable_device() PCI: Add "weak" generic pcibios_enable_device() implementation PCI: Don't enable decoding if BAR hasn't been assigned an address PCI: Enable INTx in pci_reenable_device() only when MSI/MSI-X not enabled PCI: Mark 64-bit resource as IORESOURCE_UNSET if we only support 32-bit PCI: Don't try to claim IORESOURCE_UNSET resources PCI: Check IORESOURCE_UNSET before updating BAR PCI: Don't clear IORESOURCE_UNSET when updating BAR PCI: Mark resources as IORESOURCE_UNSET if we can't assign them ... Conflicts: arch/x86/include/asm/topology.h drivers/ata/ahci.c	2014-04-01 15:14:04 -07:00
Arnd Bergmann	4379d2ae15	vfio: always select ANON_INODES The vfio code cannot be built when CONFIG_ANON_INODES is disabled, so this enforces the symbol to be enabled through Kconfig. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-03-27 11:58:58 -06:00
David Rientjes	668f9abbd4	mm: close PageTail race Commit `bf6bddf192` ("mm: introduce compaction and migration for ballooned pages") introduces page_count(page) into memory compaction which dereferences page->first_page if PageTail(page). This results in a very rare NULL pointer dereference on the aforementioned page_count(page). Indeed, anything that does compound_head(), including page_count() is susceptible to racing with prep_compound_page() and seeing a NULL or dangling page->first_page pointer. This patch uses Andrea's implementation of compound_trans_head() that deals with such a race and makes it the default compound_head() implementation. This includes a read memory barrier that ensures that if PageTail(head) is true that we return a head page that is neither NULL nor dangling. The patch then adds a store memory barrier to prep_compound_page() to ensure page->first_page is set. This is the safest way to ensure we see the head page that we are expecting, PageTail(page) is already in the unlikely() path and the memory barriers are unfortunately required. Hugetlbfs is the exception, we don't enforce a store memory barrier during init since no race is possible. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Holger Kiehl <Holger.Kiehl@dwd.de> Cc: Christoph Lameter <cl@linux.com> Cc: Rafael Aquini <aquini@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-03-04 07:55:47 -08:00
Alex Williamson	88d7ab8949	vfio: Add external user check extension interface This lets us check extensions, particularly VFIO_DMA_CC_IOMMU using the external user interface, allowing KVM to probe IOMMU coherency. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-02-26 11:38:39 -07:00
Alex Williamson	aa42931827	vfio/type1: Add extension to test DMA cache coherence of IOMMU Now that the type1 IOMMU backend can support IOMMU_CACHE, we need to be able to test whether coherency is currently enforced. Add an extension for this. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-02-26 11:38:37 -07:00
Alex Williamson	1ef3e2bc04	vfio/iommu_type1: Multi-IOMMU domain support We currently have a problem that we cannot support advanced features of an IOMMU domain (ex. IOMMU_CACHE), because we have no guarantee that those features will be supported by all of the hardware units involved with the domain over its lifetime. For instance, the Intel VT-d architecture does not require that all DRHDs support snoop control. If we create a domain based on a device behind a DRHD that does support snoop control and enable SNP support via the IOMMU_CACHE mapping option, we cannot then add a device behind a DRHD which does not support snoop control or we'll get reserved bit faults from the SNP bit in the pagetables. To add to the complexity, we can't know the properties of a domain until a device is attached. We could pass this problem off to userspace and require that a separate vfio container be used, but we don't know how to handle page accounting in that case. How do we know that a page pinned in one container is the same page as a different container and avoid double billing the user for the page. The solution is therefore to support multiple IOMMU domains per container. In the majority of cases, only one domain will be required since hardware is typically consistent within a system. However, this provides us the ability to validate compatibility of domains and support mixed environments where page table flags can be different between domains. To do this, our DMA tracking needs to change. We currently try to coalesce user mappings into as few tracking entries as possible. The problem then becomes that we lose granularity of user mappings. We've never guaranteed that a user is able to unmap at a finer granularity than the original mapping, but we must honor the granularity of the original mapping. This coalescing code is therefore removed, allowing only unmaps covering complete maps. The change in accounting is fairly small here, a typical QEMU VM will start out with roughly a dozen entries, so it's arguable if this coalescing was ever needed. We also move IOMMU domain creation to the point where a group is attached to the container. An interesting side-effect of this is that we now have access to the device at the time of domain creation and can probe the devices within the group to determine the bus_type. This finally makes vfio_iommu_type1 completely device/bus agnostic. In fact, each IOMMU domain can host devices on different buses managed by different physical IOMMUs, and present a single DMA mapping interface to the user. When a new domain is created, mappings are replayed to bring the IOMMU pagetables up to the state of the current container. And of course, DMA mapping and unmapping automatically traverse all of the configured IOMMU domains. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Cc: Varun Sethi <Varun.Sethi@freescale.com>	2014-02-26 11:38:36 -07:00
Alexander Gordeev	94cccde648	vfio: Use pci_enable_msi_range() and pci_enable_msix_range() pci_enable_msix() and pci_enable_msi_block() have been deprecated; use pci_enable_msix_range() and pci_enable_msi_range() instead. [bhelgaas: changelog] Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Alex Williamson <alex.williamson@redhat.com>	2014-02-14 14:23:40 -07:00
Linus Torvalds	1b17366d69	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc updates from Ben Herrenschmidt: "So here's my next branch for powerpc. A bit late as I was on vacation last week. It's mostly the same stuff that was in next already, I just added two patches today which are the wiring up of lockref for powerpc, which for some reason fell through the cracks last time and is trivial. The highlights are, in addition to a bunch of bug fixes: - Reworked Machine Check handling on kernels running without a hypervisor (or acting as a hypervisor). Provides hooks to handle some errors in real mode such as TLB errors, handle SLB errors, etc... - Support for retrieving memory error information from the service processor on IBM servers running without a hypervisor and routing them to the memory poison infrastructure. - _PAGE_NUMA support on server processors - 32-bit BookE relocatable kernel support - FSL e6500 hardware tablewalk support - A bunch of new/revived board support - FSL e6500 deeper idle states and altivec powerdown support You'll notice a generic mm change here, it has been acked by the relevant authorities and is a pre-req for our _PAGE_NUMA support" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (121 commits) powerpc: Implement arch_spin_is_locked() using arch_spin_value_unlocked() powerpc: Add support for the optimised lockref implementation powerpc/powernv: Call OPAL sync before kexec'ing powerpc/eeh: Escalate error on non-existing PE powerpc/eeh: Handle multiple EEH errors powerpc: Fix transactional FP/VMX/VSX unavailable handlers powerpc: Don't corrupt transactional state when using FP/VMX in kernel powerpc: Reclaim two unused thread_info flag bits powerpc: Fix races with irq_work Move precessing of MCE queued event out from syscall exit path. pseries/cpuidle: Remove redundant call to ppc64_runlatch_off() in cpu idle routines powerpc: Make add_system_ram_resources() __init powerpc: add SATA_MV to ppc64_defconfig powerpc/powernv: Increase candidate fw image size powerpc: Add debug checks to catch invalid cpu-to-node mappings powerpc: Fix the setup of CPU-to-Node mappings during CPU online powerpc/iommu: Don't detach device without IOMMU group powerpc/eeh: Hotplug improvement powerpc/eeh: Call opal_pci_reinit() on powernv for restoring config space powerpc/eeh: Add restore_config operation ...	2014-01-27 21:11:26 -08:00
Linus Torvalds	2d08cd0ef8	- Convert to misc driver to support module auto loading - Remove unnecessary and dangerous use of device_lock -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJS4T1aAAoJECObm247sIsiLMkQAKn11qSLk880dJO+cfW2VpqL vm3WreShOxNsKTEQfcbrnJtS+kYGkiyo8+Vve377QDIg+xeXpNZa0PbAEfh4HyuJ NH+vn0FYDf3rzF4Z9snm2HOHWCAsqLCE0guyMwwhUYH+fks4+YTJWCm4YWZOXbIS aI++G+5LBxGzMRbotoovqUhQy474fpQN/GFIlIZHP2bGtdeXiRuH0qH3BxVKafOF 50RtAgJsLm1Nb+S20Umsz+llL3sop1wcPLKOlgCXqd+545CJAoq/qqEPjUtOrr+S qXvXhb0XM7zo73H2WzkVmS09HZWX4sMoHBYp8D6ToOIFF01LOVsXBKduPCPpTD7E zohhjgag9/RQ99l2B153IIIv0Bsg7b4YXBQ6qkWFoJolPL94LTq1PXk0f9mNhyPo mSqatcAeI5qtjqu9ncSd4YYTdBgp7SJcgwji8fI44tvzzpz8iheVUrpwcU1UumAK TpXLBoTLXllK1op3u9xgFrF4ISBMl7lZeZp3c1/1YRga7Ch6SdlB0tcLPfuSBRF0 1Qb5jQrWz4qt/oZSkapcFALXQDgwoK8am9I0a5YXdhy9gSYm+o/TLwG5pTEnF4In xxuibmmJAmNcTJ9q6rUKjLLU/TqRKin+vyqW6is41IredgvNX8XDS3YokA5Fgv01 mu1s7odFi08xjPRuYvmq =t9+R -----END PGP SIGNATURE----- Merge tag 'vfio-v3.14-rc1' of git://github.com/awilliam/linux-vfio Pull vfio update from Alex Williamson: - convert to misc driver to support module auto loading - remove unnecessary and dangerous use of device_lock * tag 'vfio-v3.14-rc1' of git://github.com/awilliam/linux-vfio: vfio-pci: Don't use device_lock around AER interrupt setup vfio: Convert control interface to misc driver misc: Reserve minor for VFIO	2014-01-24 17:42:31 -08:00
Alex Williamson	890ed578df	vfio-pci: Use pci "try" reset interface PCI resets will attempt to take the device_lock for any device to be reset. This is a problem if that lock is already held, for instance in the device remove path. It's not sufficient to simply kill the user process or skip the reset if called after .remove as a race could result in the same deadlock. Instead, we handle all resets as "best effort" using the PCI "try" reset interfaces. This prevents the user from being able to induce a deadlock by triggering a reset. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2014-01-15 10:43:17 -07:00
Alex Williamson	3be3a074cf	vfio-pci: Don't use device_lock around AER interrupt setup device_lock is much too prone to lockups. For instance if we have a pending .remove then device_lock is already held. If userspace attempts to modify AER signaling after that point, a deadlock occurs. eventfd setup/teardown is already protected in vfio with the igate mutex. AER is not a high performance interrupt, so we can also use the same mutex to protect signaling versus setup races. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-01-14 16:12:55 -07:00
Alistair Popple	e589a4404f	powerpc/iommu: Update constant names to reflect their hardcoded page size The powerpc iommu uses a hardcoded page size of 4K. This patch changes the name of the IOMMU_PAGE_* macros to reflect the hardcoded values. A future patch will use the existing names to support dynamic page sizes. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-12-30 14:17:06 +11:00
Alex Williamson	d10999016f	vfio: Convert control interface to misc driver This change allows us to support module auto loading using devname support in userspace tools. With this, /dev/vfio/vfio will always be present and opening it will cause the vfio module to load. This should avoid needing to configure the system to statically load vfio in order to get libvirt to correctly detect support for it. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-12-19 10:17:13 -07:00
Alex Williamson	274127a1fd	PCI: Rename PCI_VC_PORT_REG1/2 to PCI_VC_PORT_CAP1/2 These are set of two capability registers, it's pretty much given that they're registers, so reflect their purpose in the name. Suggested-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2013-12-17 17:49:39 -07:00
Antonios Motakis	d93b3ac0ed	VFIO: vfio_iommu_type1: fix bug caused by break in nested loop In vfio_iommu_type1.c there is a bug in vfio_dma_do_map, when checking that pages are not already mapped. Since the check is being done in a for loop nested within the main loop, breaking out of it does not create the intended behavior. If the underlying IOMMU driver returns a non-NULL value, this will be ignored and mapping the DMA range will be attempted anyway, leading to unpredictable behavior. This interracts badly with the ARM SMMU driver issue fixed in the patch that was submitted with the title: "[PATCH 2/2] ARM: SMMU: return NULL on error in arm_smmu_iova_to_phys" Both fixes are required in order to use the vfio_iommu_type1 driver with an ARM SMMU. This patch refactors the function slightly, in order to also make this kind of bug less likely. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-10-11 10:40:46 -06:00
Alex Williamson	8b27ee60bf	vfio-pci: PCI hot reset interface The current VFIO_DEVICE_RESET interface only maps to PCI use cases where we can isolate the reset to the individual PCI function. This means the device must support FLR (PCIe or AF), PM reset on D3hot->D0 transition, device specific reset, or be a singleton device on a bus for a secondary bus reset. FLR does not have widespread support, PM reset is not very reliable, and bus topology is dictated by the system and device design. We need to provide a means for a user to induce a bus reset in cases where the existing mechanisms are not available or not reliable. This device specific extension to VFIO provides the user with this ability. Two new ioctls are introduced: - VFIO_DEVICE_PCI_GET_HOT_RESET_INFO - VFIO_DEVICE_PCI_HOT_RESET The first provides the user with information about the extent of devices affected by a hot reset. This is essentially a list of devices and the IOMMU groups they belong to. The user may then initiate a hot reset by calling the second ioctl. We must be careful that the user has ownership of all the affected devices found via the first ioctl, so the second ioctl takes a list of file descriptors for the VFIO groups affected by the reset. Each group must have IOMMU protection established for the ioctl to succeed. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-09-04 11:28:04 -06:00
Alex Williamson	17638db1b8	vfio-pci: Test for extended config space Having PCIe/PCI-X capability isn't enough to assume that there are extended capabilities. Both specs define that the first capability header is all zero if there are no extended capabilities. Testing for this avoids an erroneous message about hiding capability 0x0 at offset 0x100. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-09-04 10:58:52 -06:00

1 2 3 4 5

214 Commits