OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Linus Torvalds	a68a7509d3	A handful of VFIO bug fixes for v3.16 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJTkiFBAAoJECObm247sIsiymsQALDs03j1lm1ZUx3peI8JuEGp +4WrflS4eKBX2r+XOSqHQNlg5ceKpj1zsEMKXkLFBCwTEZXRz9tK7bu6hsRuE8XR j67JkBm4alsJQrJIQl0+dYJMFPVGWBNyoBc4kV2YbernBnFPc/5oOmP9wAJD80OI HHNQxOdfce+YZnkgltc/GkOWdG/YvuQbkyt2YmGtTePapzboj6Tf0yvkqRDaL0Bf 07+vYi6ZOSzx/Niw/ak09hll4OhYfgiOQ/t6LETmLi7PEGR88TYO/kqdzGmlnx6V XsDZ+WORWjtORGnyJI4p7iVFiP7mz4mmwUptfJcA0AX98csEdYHt6H4mlVkN7Gmd rSvarKQ7On8MRbymQIVw5w3V78X/HiWUsUi3Fef5xytQyuD4VxylS0HDtPM/AyeN 3zEJKOI+CIkFLZ6fReL8LKZuaRdQTbJvjxMVvmTr2XrBxZTlk91CsqT3JFXjJ6CV 2QEv2eVmsCkPtScWNZzaeQsgH75Hc8vznZxA+/BFDoeR2iq31BPE9iuYDpaVZS58 mnAEll4tPMIqcqQvxwoB8B8kyzX636trwW+585jr3gKRS8V6qHaWkXz06N5kzz7s dhRUy3rKe1SeutVo0Vs067Ym60yUlvuTeWTppuOnfRpgIpbCLeMwuiA22Wq3hBIF loXH9eAZOuwzBg1hqVWJ =aR4I -----END PGP SIGNATURE----- Merge tag 'vfio-v3.16-rc1' of git://github.com/awilliam/linux-vfio into next Pull VFIO updates from Alex Williamson: "A handful of VFIO bug fixes for v3.16" * tag 'vfio-v3.16-rc1' of git://github.com/awilliam/linux-vfio: drivers/vfio/pci: Fix wrong MSI interrupt count drivers/vfio: Rework offsetofend() vfio/iommu_type1: Avoid overflow vfio/pci: Fix unchecked return value vfio/pci: Fix sizing of DPA and THP express capabilities	2014-06-07 20:12:15 -07:00
Gavin Shan	fd49c81f08	drivers/vfio/pci: Fix wrong MSI interrupt count According PCI local bus specification, the register of Message Control for MSI (offset: 2, length: 2) has bit#0 to enable or disable MSI logic and it shouldn't be part contributing to the calculation of MSI interrupt count. The patch fixes the issue. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-05-30 11:35:54 -06:00
Alex Williamson	c8dbca165b	vfio/iommu_type1: Avoid overflow Coverity reports use of a tained scalar used as a loop boundary. For the most part, any values passed from userspace for a DMA mapping size, IOVA, or virtual address are valid, with some alignment constraints. The size is ultimately bound by how many pages the user is able to lock, IOVA is tested by the IOMMU driver when doing a map, and the virtual address needs to pass get_user_pages. The only problem I can find is that we do expect the __u64 user values to fit within our variables, which might not happen on 32bit platforms. Add a test for this and return error on overflow. Also propagate use of the type-correct local variables throughout the function. The above also points to the 'end' variable, which can be zero if we're operating at the very top of the address space. We try to account for this, but our loop botches it. Rework the loop to use the remaining size as our loop condition rather than the IOVA vs end. Detected by Coverity: CID 714659 Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-05-30 11:35:54 -06:00
Alex Williamson	eb5685f0de	vfio/pci: Fix unchecked return value There's nothing we can do different if pci_load_and_free_saved_state() fails, other than maybe print some log message, but the actual re-load of the state is an unnecessary step here since we've only just saved it. We can cleanup a coverity warning and eliminate the unnecessary step by freeing the state ourselves. Detected by Coverity: CID 753101 Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-05-30 11:35:53 -06:00
Alex Williamson	afa63252b2	vfio/pci: Fix sizing of DPA and THP express capabilities When sizing the TPH capability we store the register containing the table size into the 'dword' variable, but then use the uninitialized 'byte' variable to analyze the size. The table size is also actually reported as an N-1 value, so correct sizing to account for this. The round_up() for both TPH and DPA is unnecessary, remove it. Detected by Coverity: CID 714665 & 715156 Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-05-30 10:50:31 -06:00
Jean Delvare	8283b4919e	driver core: dev_set_drvdata can no longer fail So there is no point in checking its return value, which will soon disappear. Signed-off-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2014-05-27 13:40:51 -07:00
Linus Torvalds	d0cb5f71c5	VFIO updates for v3.15 include: - Allow the vfio-type1 IOMMU to support multiple domains within a container - Plumb path to query whether all domains are cache-coherent - Wire query into kvm-vfio device to avoid KVM x86 WBINVD emulation - Always select CONFIG_ANON_INODES, vfio depends on it (Arnd) The first patch also makes the vfio-type1 IOMMU driver completely independent of the bus_type of the devices it's handling, which enables it to be used for both vfio-pci and a future vfio-platform (and hopefully combinations involving both simultaneously). -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJTPbikAAoJECObm247sIsiG9cP/jnurW84YuAHmybzy3R4nMaa 8BWcQST+TyQ78GWvubFDcRt+vHmJUI4iFWPBIm1twBSVrMy6F1GcT/spmSne1Dfb OvcfGEy59tGO+BeklHg5Qq2Hj1UzfeV9uQoy+PrpTF/sYzsyL6g5+O3Zv39SNvr0 zCwnO2JAKXIpQlkK3wVha6V13X07Z/+d2b4JSsvmON2cnlPzOUYNrgBvoSeV2IQe 3kMAU9qfAfQlpNDhRRZL/HshgWazCg4XZUp9UdNjUkOxrUp4vXHrVTUJnUGBDytk 8V9UGRKF3mik9PqpZJk4jLV5urgUVpnUR5747uqs0KF+9GWxClXvh3gp1XX8Zn7f NCfDSMn/wrDr6fQBglKUadITaDhF49KV+J0cX083q46BbYxhAfgxv1I9UOvLreG6 SGAezlTB0mefa1BmSwJdfKwDBsuDOhbF0TsHC6zT7yFc8pqe3hl/vQzUyu3HtwuM yPycQiQQmvOaymaiiBRwyLocwJbDNKPigDomv8NZ8Mcd97Fu9YsF4ydaxMJmmeKf TffYS4z4tUElYiU2vv1ipB8T+ReCmdtj2cIvyfwTL9jycY9ouO4bJ56dOGWd3WNl m7AVokbrrJQ03itUpFMuYjHAzTNUlXVvXlGr9n6L4VTR0RTL1zwJVrMVDsXdhxm4 XgDbVB5PrxinDAC1vJvI =mnzO -----END PGP SIGNATURE----- Merge tag 'vfio-v3.15-rc1' of git://github.com/awilliam/linux-vfio Pull VFIO updates from Alex Williamson: "VFIO updates for v3.15 include: - Allow the vfio-type1 IOMMU to support multiple domains within a container - Plumb path to query whether all domains are cache-coherent - Wire query into kvm-vfio device to avoid KVM x86 WBINVD emulation - Always select CONFIG_ANON_INODES, vfio depends on it (Arnd) The first patch also makes the vfio-type1 IOMMU driver completely independent of the bus_type of the devices it's handling, which enables it to be used for both vfio-pci and a future vfio-platform (and hopefully combinations involving both simultaneously)" * tag 'vfio-v3.15-rc1' of git://github.com/awilliam/linux-vfio: vfio: always select ANON_INODES kvm/vfio: Support for DMA coherent IOMMUs vfio: Add external user check extension interface vfio/type1: Add extension to test DMA cache coherence of IOMMU vfio/iommu_type1: Multi-IOMMU domain support	2014-04-03 14:05:02 -07:00
Linus Torvalds	4b1779c2cf	PCI changes for the v3.15 merge window: Enumeration - Increment max correctly in pci_scan_bridge() (Andreas Noever) - Clarify the "scan anyway" comment in pci_scan_bridge() (Andreas Noever) - Assign CardBus bus number only during the second pass (Andreas Noever) - Use request_resource_conflict() instead of insert_ for bus numbers (Andreas Noever) - Make sure bus number resources stay within their parents bounds (Andreas Noever) - Remove pci_fixup_parent_subordinate_busnr() (Andreas Noever) - Check for child busses which use more bus numbers than allocated (Andreas Noever) - Don't scan random busses in pci_scan_bridge() (Andreas Noever) - x86: Drop pcibios_scan_root() check for bus already scanned (Bjorn Helgaas) - x86: Use pcibios_scan_root() instead of pci_scan_bus_with_sysdata() (Bjorn Helgaas) - x86: Use pcibios_scan_root() instead of pci_scan_bus_on_node() (Bjorn Helgaas) - x86: Merge pci_scan_bus_on_node() into pcibios_scan_root() (Bjorn Helgaas) - x86: Drop return value of pcibios_scan_root() (Bjorn Helgaas) NUMA - x86: Add x86_pci_root_bus_node() to look up NUMA node from PCI bus (Bjorn Helgaas) - x86: Use x86_pci_root_bus_node() instead of get_mp_bus_to_node() (Bjorn Helgaas) - x86: Remove mp_bus_to_node[], set_mp_bus_to_node(), get_mp_bus_to_node() (Bjorn Helgaas) - x86: Use NUMA_NO_NODE, not -1, for unknown node (Bjorn Helgaas) - x86: Remove acpi_get_pxm() usage (Bjorn Helgaas) - ia64: Use NUMA_NO_NODE, not MAX_NUMNODES, for unknown node (Bjorn Helgaas) - ia64: Remove acpi_get_pxm() usage (Bjorn Helgaas) - ACPI: Fix acpi_get_node() prototype (Bjorn Helgaas) Resource management - i2o: Fix and refactor PCI space allocation (Bjorn Helgaas) - Add resource_contains() (Bjorn Helgaas) - Add %pR support for IORESOURCE_UNSET (Bjorn Helgaas) - Mark resources as IORESOURCE_UNSET if we can't assign them (Bjorn Helgaas) - Don't clear IORESOURCE_UNSET when updating BAR (Bjorn Helgaas) - Check IORESOURCE_UNSET before updating BAR (Bjorn Helgaas) - Don't try to claim IORESOURCE_UNSET resources (Bjorn Helgaas) - Mark 64-bit resource as IORESOURCE_UNSET if we only support 32-bit (Bjorn Helgaas) - Don't enable decoding if BAR hasn't been assigned an address (Bjorn Helgaas) - Add "weak" generic pcibios_enable_device() implementation (Bjorn Helgaas) - alpha, microblaze, sh, sparc, tile: Use default pcibios_enable_device() (Bjorn Helgaas) - s390: Use generic pci_enable_resources() (Bjorn Helgaas) - Don't check resource_size() in pci_bus_alloc_resource() (Bjorn Helgaas) - Set type in __request_region() (Bjorn Helgaas) - Check all IORESOURCE_TYPE_BITS in pci_bus_alloc_from_region() (Bjorn Helgaas) - Change pci_bus_alloc_resource() type_mask to unsigned long (Bjorn Helgaas) - Log IDE resource quirk in dmesg (Bjorn Helgaas) - Revert "[PATCH] Insert GART region into resource map" (Bjorn Helgaas) PCI device hotplug - Make check_link_active() non-static (Rajat Jain) - Use link change notifications for hot-plug and removal (Rajat Jain) - Enable link state change notifications (Rajat Jain) - Don't disable the link permanently during removal (Rajat Jain) - Don't check adapter or latch status while disabling (Rajat Jain) - Disable link notification across slot reset (Rajat Jain) - Ensure very fast hotplug events are also processed (Rajat Jain) - Add hotplug_lock to serialize hotplug events (Rajat Jain) - Remove a non-existent card, regardless of "surprise" capability (Rajat Jain) - Don't turn slot off when hot-added device already exists (Yijing Wang) MSI - Keep pci_enable_msi() documentation (Alexander Gordeev) - ahci: Fix broken single MSI fallback (Alexander Gordeev) - ahci, vfio: Use pci_enable_msi_range() (Alexander Gordeev) - Check kmalloc() return value, fix leak of name (Greg Kroah-Hartman) - Fix leak of msi_attrs (Greg Kroah-Hartman) - Fix pci_msix_vec_count() htmldocs failure (Masanari Iida) Virtualization - Device-specific ACS support (Alex Williamson) Freescale i.MX6 - Wait for retraining (Marek Vasut) Marvell MVEBU - Use Device ID and revision from underlying endpoint (Andrew Lunn) - Fix incorrect size for PCI aperture resources (Jason Gunthorpe) - Call request_resource() on the apertures (Jason Gunthorpe) - Fix potential issue in range parsing (Jean-Jacques Hiblot) Renesas R-Car - Check platform_get_irq() return code (Ben Dooks) - Add error interrupt handling (Ben Dooks) - Fix bridge logic configuration accesses (Ben Dooks) - Register each instance independently (Magnus Damm) - Break out window size handling (Magnus Damm) - Make the Kconfig dependencies more generic (Magnus Damm) Synopsys DesignWare - Fix RC BAR to be single 64-bit non-prefetchable memory (Mohit Kumar) Miscellaneous - Remove unused SR-IOV VF Migration support (Bjorn Helgaas) - Enable INTx if BIOS left them disabled (Bjorn Helgaas) - Fix hex vs decimal typo in cpqhpc_probe() (Dan Carpenter) - Clean up par-arch object file list (Liviu Dudau) - Set IORESOURCE_ROM_SHADOW only for the default VGA device (Sander Eikelenboom) - ACPI, ARM, drm, powerpc, pcmcia, PCI: Use list_for_each_entry() for bus traversal (Yijing Wang) - Fix pci_bus_b() build failure (Paul Gortmaker) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJTOdAZAAoJEFmIoMA60/r8VYUQALRrReyMBk3pjRt/fKIX4Kwi ydSo/YJeeKTN8K93fLw8bb8bdPItJScJFTfEa4Q2SpZezR/ecGXLowisy0BBaPHK qtOyB8EqjkLS17GfyecIe9Nd2SIAI2De/0bchK3kDtIX1YlZB/k/tD3eCPMHDnnl m8c5kAHKPQYd8g01I+S8nrtGHk/A33grfYpJXPZbcqyhE0lWU3SI8KDAGbcKzNHE 23Do0yNyd4nHIdixWlhETcNvzHn35Q/O38JJwW9Mf1aI9gusYuml6GFefCgu/iov lxqp3CEW7iPZgQEgNbrQ0HzWn/durL2Trd6S/Yh6f2xbm1LGYKWh3LZUFLd3AQDd INEpUgKsyb//nF3dtiyGnZlp0QykoqFyLo2AEDrb+ILTd4up5DeRY/m1UpjAXR5p QicBmrDksHrSivPmMZwLx1DFQYKjQbdx5lOqy9hQM/Jmsr+N3/l7QBrbQWXks3JZ NNAyn4RZHQB7UDQS/MmVPArs+JK5qaEDQD57QuOTlqgP19VY9C9E/l/aEqefjdFo XOAm7CwGpB/iBAkIbE6ROEDiJArigRVHEfxLYeE/jtGOdRDCD1deWk+g3S8DWD7m ZxWSgIVB00PMAmomczdg59YVFBhocgwPUa8/cw6yqzx2QKP4mWXIFZ/Sjau5I3tn WWoxXlUirZfTJc29XnVy =3mNS -----END PGP SIGNATURE----- Merge tag 'pci-v3.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI changes from Bjorn Helgaas: "Enumeration - Increment max correctly in pci_scan_bridge() (Andreas Noever) - Clarify the "scan anyway" comment in pci_scan_bridge() (Andreas Noever) - Assign CardBus bus number only during the second pass (Andreas Noever) - Use request_resource_conflict() instead of insert_ for bus numbers (Andreas Noever) - Make sure bus number resources stay within their parents bounds (Andreas Noever) - Remove pci_fixup_parent_subordinate_busnr() (Andreas Noever) - Check for child busses which use more bus numbers than allocated (Andreas Noever) - Don't scan random busses in pci_scan_bridge() (Andreas Noever) - x86: Drop pcibios_scan_root() check for bus already scanned (Bjorn Helgaas) - x86: Use pcibios_scan_root() instead of pci_scan_bus_with_sysdata() (Bjorn Helgaas) - x86: Use pcibios_scan_root() instead of pci_scan_bus_on_node() (Bjorn Helgaas) - x86: Merge pci_scan_bus_on_node() into pcibios_scan_root() (Bjorn Helgaas) - x86: Drop return value of pcibios_scan_root() (Bjorn Helgaas) NUMA - x86: Add x86_pci_root_bus_node() to look up NUMA node from PCI bus (Bjorn Helgaas) - x86: Use x86_pci_root_bus_node() instead of get_mp_bus_to_node() (Bjorn Helgaas) - x86: Remove mp_bus_to_node[], set_mp_bus_to_node(), get_mp_bus_to_node() (Bjorn Helgaas) - x86: Use NUMA_NO_NODE, not -1, for unknown node (Bjorn Helgaas) - x86: Remove acpi_get_pxm() usage (Bjorn Helgaas) - ia64: Use NUMA_NO_NODE, not MAX_NUMNODES, for unknown node (Bjorn Helgaas) - ia64: Remove acpi_get_pxm() usage (Bjorn Helgaas) - ACPI: Fix acpi_get_node() prototype (Bjorn Helgaas) Resource management - i2o: Fix and refactor PCI space allocation (Bjorn Helgaas) - Add resource_contains() (Bjorn Helgaas) - Add %pR support for IORESOURCE_UNSET (Bjorn Helgaas) - Mark resources as IORESOURCE_UNSET if we can't assign them (Bjorn Helgaas) - Don't clear IORESOURCE_UNSET when updating BAR (Bjorn Helgaas) - Check IORESOURCE_UNSET before updating BAR (Bjorn Helgaas) - Don't try to claim IORESOURCE_UNSET resources (Bjorn Helgaas) - Mark 64-bit resource as IORESOURCE_UNSET if we only support 32-bit (Bjorn Helgaas) - Don't enable decoding if BAR hasn't been assigned an address (Bjorn Helgaas) - Add "weak" generic pcibios_enable_device() implementation (Bjorn Helgaas) - alpha, microblaze, sh, sparc, tile: Use default pcibios_enable_device() (Bjorn Helgaas) - s390: Use generic pci_enable_resources() (Bjorn Helgaas) - Don't check resource_size() in pci_bus_alloc_resource() (Bjorn Helgaas) - Set type in __request_region() (Bjorn Helgaas) - Check all IORESOURCE_TYPE_BITS in pci_bus_alloc_from_region() (Bjorn Helgaas) - Change pci_bus_alloc_resource() type_mask to unsigned long (Bjorn Helgaas) - Log IDE resource quirk in dmesg (Bjorn Helgaas) - Revert "[PATCH] Insert GART region into resource map" (Bjorn Helgaas) PCI device hotplug - Make check_link_active() non-static (Rajat Jain) - Use link change notifications for hot-plug and removal (Rajat Jain) - Enable link state change notifications (Rajat Jain) - Don't disable the link permanently during removal (Rajat Jain) - Don't check adapter or latch status while disabling (Rajat Jain) - Disable link notification across slot reset (Rajat Jain) - Ensure very fast hotplug events are also processed (Rajat Jain) - Add hotplug_lock to serialize hotplug events (Rajat Jain) - Remove a non-existent card, regardless of "surprise" capability (Rajat Jain) - Don't turn slot off when hot-added device already exists (Yijing Wang) MSI - Keep pci_enable_msi() documentation (Alexander Gordeev) - ahci: Fix broken single MSI fallback (Alexander Gordeev) - ahci, vfio: Use pci_enable_msi_range() (Alexander Gordeev) - Check kmalloc() return value, fix leak of name (Greg Kroah-Hartman) - Fix leak of msi_attrs (Greg Kroah-Hartman) - Fix pci_msix_vec_count() htmldocs failure (Masanari Iida) Virtualization - Device-specific ACS support (Alex Williamson) Freescale i.MX6 - Wait for retraining (Marek Vasut) Marvell MVEBU - Use Device ID and revision from underlying endpoint (Andrew Lunn) - Fix incorrect size for PCI aperture resources (Jason Gunthorpe) - Call request_resource() on the apertures (Jason Gunthorpe) - Fix potential issue in range parsing (Jean-Jacques Hiblot) Renesas R-Car - Check platform_get_irq() return code (Ben Dooks) - Add error interrupt handling (Ben Dooks) - Fix bridge logic configuration accesses (Ben Dooks) - Register each instance independently (Magnus Damm) - Break out window size handling (Magnus Damm) - Make the Kconfig dependencies more generic (Magnus Damm) Synopsys DesignWare - Fix RC BAR to be single 64-bit non-prefetchable memory (Mohit Kumar) Miscellaneous - Remove unused SR-IOV VF Migration support (Bjorn Helgaas) - Enable INTx if BIOS left them disabled (Bjorn Helgaas) - Fix hex vs decimal typo in cpqhpc_probe() (Dan Carpenter) - Clean up par-arch object file list (Liviu Dudau) - Set IORESOURCE_ROM_SHADOW only for the default VGA device (Sander Eikelenboom) - ACPI, ARM, drm, powerpc, pcmcia, PCI: Use list_for_each_entry() for bus traversal (Yijing Wang) - Fix pci_bus_b() build failure (Paul Gortmaker)" * tag 'pci-v3.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (108 commits) Revert "[PATCH] Insert GART region into resource map" PCI: Log IDE resource quirk in dmesg PCI: Change pci_bus_alloc_resource() type_mask to unsigned long PCI: Check all IORESOURCE_TYPE_BITS in pci_bus_alloc_from_region() resources: Set type in __request_region() PCI: Don't check resource_size() in pci_bus_alloc_resource() s390/PCI: Use generic pci_enable_resources() tile PCI RC: Use default pcibios_enable_device() sparc/PCI: Use default pcibios_enable_device() (Leon only) sh/PCI: Use default pcibios_enable_device() microblaze/PCI: Use default pcibios_enable_device() alpha/PCI: Use default pcibios_enable_device() PCI: Add "weak" generic pcibios_enable_device() implementation PCI: Don't enable decoding if BAR hasn't been assigned an address PCI: Enable INTx in pci_reenable_device() only when MSI/MSI-X not enabled PCI: Mark 64-bit resource as IORESOURCE_UNSET if we only support 32-bit PCI: Don't try to claim IORESOURCE_UNSET resources PCI: Check IORESOURCE_UNSET before updating BAR PCI: Don't clear IORESOURCE_UNSET when updating BAR PCI: Mark resources as IORESOURCE_UNSET if we can't assign them ... Conflicts: arch/x86/include/asm/topology.h drivers/ata/ahci.c	2014-04-01 15:14:04 -07:00
Arnd Bergmann	4379d2ae15	vfio: always select ANON_INODES The vfio code cannot be built when CONFIG_ANON_INODES is disabled, so this enforces the symbol to be enabled through Kconfig. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-03-27 11:58:58 -06:00
David Rientjes	668f9abbd4	mm: close PageTail race Commit `bf6bddf192` ("mm: introduce compaction and migration for ballooned pages") introduces page_count(page) into memory compaction which dereferences page->first_page if PageTail(page). This results in a very rare NULL pointer dereference on the aforementioned page_count(page). Indeed, anything that does compound_head(), including page_count() is susceptible to racing with prep_compound_page() and seeing a NULL or dangling page->first_page pointer. This patch uses Andrea's implementation of compound_trans_head() that deals with such a race and makes it the default compound_head() implementation. This includes a read memory barrier that ensures that if PageTail(head) is true that we return a head page that is neither NULL nor dangling. The patch then adds a store memory barrier to prep_compound_page() to ensure page->first_page is set. This is the safest way to ensure we see the head page that we are expecting, PageTail(page) is already in the unlikely() path and the memory barriers are unfortunately required. Hugetlbfs is the exception, we don't enforce a store memory barrier during init since no race is possible. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Holger Kiehl <Holger.Kiehl@dwd.de> Cc: Christoph Lameter <cl@linux.com> Cc: Rafael Aquini <aquini@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-03-04 07:55:47 -08:00
Alex Williamson	88d7ab8949	vfio: Add external user check extension interface This lets us check extensions, particularly VFIO_DMA_CC_IOMMU using the external user interface, allowing KVM to probe IOMMU coherency. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-02-26 11:38:39 -07:00
Alex Williamson	aa42931827	vfio/type1: Add extension to test DMA cache coherence of IOMMU Now that the type1 IOMMU backend can support IOMMU_CACHE, we need to be able to test whether coherency is currently enforced. Add an extension for this. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-02-26 11:38:37 -07:00
Alex Williamson	1ef3e2bc04	vfio/iommu_type1: Multi-IOMMU domain support We currently have a problem that we cannot support advanced features of an IOMMU domain (ex. IOMMU_CACHE), because we have no guarantee that those features will be supported by all of the hardware units involved with the domain over its lifetime. For instance, the Intel VT-d architecture does not require that all DRHDs support snoop control. If we create a domain based on a device behind a DRHD that does support snoop control and enable SNP support via the IOMMU_CACHE mapping option, we cannot then add a device behind a DRHD which does not support snoop control or we'll get reserved bit faults from the SNP bit in the pagetables. To add to the complexity, we can't know the properties of a domain until a device is attached. We could pass this problem off to userspace and require that a separate vfio container be used, but we don't know how to handle page accounting in that case. How do we know that a page pinned in one container is the same page as a different container and avoid double billing the user for the page. The solution is therefore to support multiple IOMMU domains per container. In the majority of cases, only one domain will be required since hardware is typically consistent within a system. However, this provides us the ability to validate compatibility of domains and support mixed environments where page table flags can be different between domains. To do this, our DMA tracking needs to change. We currently try to coalesce user mappings into as few tracking entries as possible. The problem then becomes that we lose granularity of user mappings. We've never guaranteed that a user is able to unmap at a finer granularity than the original mapping, but we must honor the granularity of the original mapping. This coalescing code is therefore removed, allowing only unmaps covering complete maps. The change in accounting is fairly small here, a typical QEMU VM will start out with roughly a dozen entries, so it's arguable if this coalescing was ever needed. We also move IOMMU domain creation to the point where a group is attached to the container. An interesting side-effect of this is that we now have access to the device at the time of domain creation and can probe the devices within the group to determine the bus_type. This finally makes vfio_iommu_type1 completely device/bus agnostic. In fact, each IOMMU domain can host devices on different buses managed by different physical IOMMUs, and present a single DMA mapping interface to the user. When a new domain is created, mappings are replayed to bring the IOMMU pagetables up to the state of the current container. And of course, DMA mapping and unmapping automatically traverse all of the configured IOMMU domains. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Cc: Varun Sethi <Varun.Sethi@freescale.com>	2014-02-26 11:38:36 -07:00
Alexander Gordeev	94cccde648	vfio: Use pci_enable_msi_range() and pci_enable_msix_range() pci_enable_msix() and pci_enable_msi_block() have been deprecated; use pci_enable_msix_range() and pci_enable_msi_range() instead. [bhelgaas: changelog] Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Alex Williamson <alex.williamson@redhat.com>	2014-02-14 14:23:40 -07:00
Linus Torvalds	1b17366d69	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc updates from Ben Herrenschmidt: "So here's my next branch for powerpc. A bit late as I was on vacation last week. It's mostly the same stuff that was in next already, I just added two patches today which are the wiring up of lockref for powerpc, which for some reason fell through the cracks last time and is trivial. The highlights are, in addition to a bunch of bug fixes: - Reworked Machine Check handling on kernels running without a hypervisor (or acting as a hypervisor). Provides hooks to handle some errors in real mode such as TLB errors, handle SLB errors, etc... - Support for retrieving memory error information from the service processor on IBM servers running without a hypervisor and routing them to the memory poison infrastructure. - _PAGE_NUMA support on server processors - 32-bit BookE relocatable kernel support - FSL e6500 hardware tablewalk support - A bunch of new/revived board support - FSL e6500 deeper idle states and altivec powerdown support You'll notice a generic mm change here, it has been acked by the relevant authorities and is a pre-req for our _PAGE_NUMA support" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (121 commits) powerpc: Implement arch_spin_is_locked() using arch_spin_value_unlocked() powerpc: Add support for the optimised lockref implementation powerpc/powernv: Call OPAL sync before kexec'ing powerpc/eeh: Escalate error on non-existing PE powerpc/eeh: Handle multiple EEH errors powerpc: Fix transactional FP/VMX/VSX unavailable handlers powerpc: Don't corrupt transactional state when using FP/VMX in kernel powerpc: Reclaim two unused thread_info flag bits powerpc: Fix races with irq_work Move precessing of MCE queued event out from syscall exit path. pseries/cpuidle: Remove redundant call to ppc64_runlatch_off() in cpu idle routines powerpc: Make add_system_ram_resources() __init powerpc: add SATA_MV to ppc64_defconfig powerpc/powernv: Increase candidate fw image size powerpc: Add debug checks to catch invalid cpu-to-node mappings powerpc: Fix the setup of CPU-to-Node mappings during CPU online powerpc/iommu: Don't detach device without IOMMU group powerpc/eeh: Hotplug improvement powerpc/eeh: Call opal_pci_reinit() on powernv for restoring config space powerpc/eeh: Add restore_config operation ...	2014-01-27 21:11:26 -08:00
Linus Torvalds	2d08cd0ef8	- Convert to misc driver to support module auto loading - Remove unnecessary and dangerous use of device_lock -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJS4T1aAAoJECObm247sIsiLMkQAKn11qSLk880dJO+cfW2VpqL vm3WreShOxNsKTEQfcbrnJtS+kYGkiyo8+Vve377QDIg+xeXpNZa0PbAEfh4HyuJ NH+vn0FYDf3rzF4Z9snm2HOHWCAsqLCE0guyMwwhUYH+fks4+YTJWCm4YWZOXbIS aI++G+5LBxGzMRbotoovqUhQy474fpQN/GFIlIZHP2bGtdeXiRuH0qH3BxVKafOF 50RtAgJsLm1Nb+S20Umsz+llL3sop1wcPLKOlgCXqd+545CJAoq/qqEPjUtOrr+S qXvXhb0XM7zo73H2WzkVmS09HZWX4sMoHBYp8D6ToOIFF01LOVsXBKduPCPpTD7E zohhjgag9/RQ99l2B153IIIv0Bsg7b4YXBQ6qkWFoJolPL94LTq1PXk0f9mNhyPo mSqatcAeI5qtjqu9ncSd4YYTdBgp7SJcgwji8fI44tvzzpz8iheVUrpwcU1UumAK TpXLBoTLXllK1op3u9xgFrF4ISBMl7lZeZp3c1/1YRga7Ch6SdlB0tcLPfuSBRF0 1Qb5jQrWz4qt/oZSkapcFALXQDgwoK8am9I0a5YXdhy9gSYm+o/TLwG5pTEnF4In xxuibmmJAmNcTJ9q6rUKjLLU/TqRKin+vyqW6is41IredgvNX8XDS3YokA5Fgv01 mu1s7odFi08xjPRuYvmq =t9+R -----END PGP SIGNATURE----- Merge tag 'vfio-v3.14-rc1' of git://github.com/awilliam/linux-vfio Pull vfio update from Alex Williamson: - convert to misc driver to support module auto loading - remove unnecessary and dangerous use of device_lock * tag 'vfio-v3.14-rc1' of git://github.com/awilliam/linux-vfio: vfio-pci: Don't use device_lock around AER interrupt setup vfio: Convert control interface to misc driver misc: Reserve minor for VFIO	2014-01-24 17:42:31 -08:00
Alex Williamson	890ed578df	vfio-pci: Use pci "try" reset interface PCI resets will attempt to take the device_lock for any device to be reset. This is a problem if that lock is already held, for instance in the device remove path. It's not sufficient to simply kill the user process or skip the reset if called after .remove as a race could result in the same deadlock. Instead, we handle all resets as "best effort" using the PCI "try" reset interfaces. This prevents the user from being able to induce a deadlock by triggering a reset. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2014-01-15 10:43:17 -07:00
Alex Williamson	3be3a074cf	vfio-pci: Don't use device_lock around AER interrupt setup device_lock is much too prone to lockups. For instance if we have a pending .remove then device_lock is already held. If userspace attempts to modify AER signaling after that point, a deadlock occurs. eventfd setup/teardown is already protected in vfio with the igate mutex. AER is not a high performance interrupt, so we can also use the same mutex to protect signaling versus setup races. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-01-14 16:12:55 -07:00
Alistair Popple	e589a4404f	powerpc/iommu: Update constant names to reflect their hardcoded page size The powerpc iommu uses a hardcoded page size of 4K. This patch changes the name of the IOMMU_PAGE_* macros to reflect the hardcoded values. A future patch will use the existing names to support dynamic page sizes. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-12-30 14:17:06 +11:00
Alex Williamson	d10999016f	vfio: Convert control interface to misc driver This change allows us to support module auto loading using devname support in userspace tools. With this, /dev/vfio/vfio will always be present and opening it will cause the vfio module to load. This should avoid needing to configure the system to statically load vfio in order to get libvirt to correctly detect support for it. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-12-19 10:17:13 -07:00
Alex Williamson	274127a1fd	PCI: Rename PCI_VC_PORT_REG1/2 to PCI_VC_PORT_CAP1/2 These are set of two capability registers, it's pretty much given that they're registers, so reflect their purpose in the name. Suggested-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2013-12-17 17:49:39 -07:00
Antonios Motakis	d93b3ac0ed	VFIO: vfio_iommu_type1: fix bug caused by break in nested loop In vfio_iommu_type1.c there is a bug in vfio_dma_do_map, when checking that pages are not already mapped. Since the check is being done in a for loop nested within the main loop, breaking out of it does not create the intended behavior. If the underlying IOMMU driver returns a non-NULL value, this will be ignored and mapping the DMA range will be attempted anyway, leading to unpredictable behavior. This interracts badly with the ARM SMMU driver issue fixed in the patch that was submitted with the title: "[PATCH 2/2] ARM: SMMU: return NULL on error in arm_smmu_iova_to_phys" Both fixes are required in order to use the vfio_iommu_type1 driver with an ARM SMMU. This patch refactors the function slightly, in order to also make this kind of bug less likely. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-10-11 10:40:46 -06:00
Alex Williamson	8b27ee60bf	vfio-pci: PCI hot reset interface The current VFIO_DEVICE_RESET interface only maps to PCI use cases where we can isolate the reset to the individual PCI function. This means the device must support FLR (PCIe or AF), PM reset on D3hot->D0 transition, device specific reset, or be a singleton device on a bus for a secondary bus reset. FLR does not have widespread support, PM reset is not very reliable, and bus topology is dictated by the system and device design. We need to provide a means for a user to induce a bus reset in cases where the existing mechanisms are not available or not reliable. This device specific extension to VFIO provides the user with this ability. Two new ioctls are introduced: - VFIO_DEVICE_PCI_GET_HOT_RESET_INFO - VFIO_DEVICE_PCI_HOT_RESET The first provides the user with information about the extent of devices affected by a hot reset. This is essentially a list of devices and the IOMMU groups they belong to. The user may then initiate a hot reset by calling the second ioctl. We must be careful that the user has ownership of all the affected devices found via the first ioctl, so the second ioctl takes a list of file descriptors for the VFIO groups affected by the reset. Each group must have IOMMU protection established for the ioctl to succeed. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-09-04 11:28:04 -06:00
Alex Williamson	17638db1b8	vfio-pci: Test for extended config space Having PCIe/PCI-X capability isn't enough to assume that there are extended capabilities. Both specs define that the first capability header is all zero if there are no extended capabilities. Testing for this avoids an erroneous message about hiding capability 0x0 at offset 0x100. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-09-04 10:58:52 -06:00
Alex Williamson	20e7745784	vfio-pci: Use fdget() rather than eventfd_fget() eventfd_fget() tests to see whether the file is an eventfd file, which we then immediately pass to eventfd_ctx_fileget(), which again tests whether the file is an eventfd file. Simplify slightly by using fdget() so that we only test that we're looking at an eventfd once. fget() could also be used, but fdget() makes use of fget_light() for another slight optimization. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-08-28 09:49:55 -06:00
Alex Williamson	5d042fbdbb	vfio: Add O_CLOEXEC flag to vfio device fd Add the default O_CLOEXEC flag for device file descriptors. This is generally considered a safer option as it allows the user a race free option to decide whether file descriptors are inherited across exec, with the default avoiding file descriptor leaks. Reported-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-08-22 10:33:41 -06:00
Yann Droneaud	a5d550703d	vfio: use get_unused_fd_flags(0) instead of get_unused_fd() Macro get_unused_fd() is used to allocate a file descriptor with default flags. Those default flags (0) can be "unsafe": O_CLOEXEC must be used by default to not leak file descriptor across exec(). Instead of macro get_unused_fd(), functions anon_inode_getfd() or get_unused_fd_flags() should be used with flags given by userspace. If not possible, flags should be set to O_CLOEXEC to provide userspace with a default safe behavor. In a further patch, get_unused_fd() will be removed so that new code start using anon_inode_getfd() or get_unused_fd_flags() with correct flags. This patch replaces calls to get_unused_fd() with equivalent call to get_unused_fd_flags(0) to preserve current behavor for existing code. The hard coded flag value (0) should be reviewed on a per-subsystem basis, and, if possible, set to O_CLOEXEC. Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Link: http://lkml.kernel.org/r/cover.1376327678.git.ydroneaud@opteya.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-08-22 10:20:05 -06:00
Alexey Kardashevskiy	6cdd978213	vfio: add external user support VFIO is designed to be used via ioctls on file descriptors returned by VFIO. However in some situations support for an external user is required. The first user is KVM on PPC64 (SPAPR TCE protocol) which is going to use the existing VFIO groups for exclusive access in real/virtual mode on a host to avoid passing map/unmap requests to the user space which would made things pretty slow. The protocol includes: 1. do normal VFIO init operation: - opening a new container; - attaching group(s) to it; - setting an IOMMU driver for a container. When IOMMU is set for a container, all groups in it are considered ready to use by an external user. 2. User space passes a group fd to an external user. The external user calls vfio_group_get_external_user() to verify that: - the group is initialized; - IOMMU is set for it. If both checks passed, vfio_group_get_external_user() increments the container user counter to prevent the VFIO group from disposal before KVM exits. 3. The external user calls vfio_external_user_iommu_id() to know an IOMMU ID. PPC64 KVM uses it to link logical bus number (LIOBN) with IOMMU ID. 4. When the external KVM finishes, it calls vfio_group_put_external_user() to release the VFIO group. This call decrements the container user counter. Everything gets released. The "vfio: Limit group opens" patch is also required for the consistency. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-08-05 10:52:36 -06:00
Alex Williamson	d24cdbfd28	vfio-pci: Avoid deadlock on remove If an attempt is made to unbind a device from vfio-pci while that device is in use, the request is blocked until the device becomes unused. Unfortunately, that unbind path still grabs the device_lock, which certain things like __pci_reset_function() also want to take. This means we need to try to acquire the locks ourselves and use the pre-locked version, __pci_reset_function_locked(). Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-07-24 16:36:41 -06:00
Alex Williamson	c64019302b	vfio: Ignore sprurious notifies Remove debugging WARN_ON if we get a spurious notify for a group that no longer exists. No reports of anyone hitting this, but it would likely be a race and not a bug if they did. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-07-24 16:36:40 -06:00
Alex Williamson	de9c7602ca	vfio: Don't overreact to DEL_DEVICE BUS_NOTIFY_DEL_DEVICE triggers IOMMU drivers to remove devices from their iommu group, but there's really nothing we can do about it at this point. If the device is in use, then the vfio sub-driver will block the device_del from completing until it's released. If the device is not in use or not owned by a vfio sub-driver, then we really don't care that it's being removed. The current code can be triggered just by unloading an sr-iov driver (ex. igb) while the VFs are attached to vfio-pci because it makes an incorrect assumption about the ordering of driver remove callbacks vs the DEL_DEVICE notification. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-07-24 16:36:00 -06:00
Linus Torvalds	15a49b9a90	vfio Updates for v3.11 Largely hugepage support for vfio/type1 iommu and surrounding cleanups and fixes. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.13 (GNU/Linux) iQIcBAABAgAGBQJR2uNvAAoJECObm247sIsiJRYQAJK15MfXgJq2PtBABNvFUAOG nqUvLgBgM5Ow1NI0Rzh9jkNohNqCvXDFGaWXXnsaX83hIpi59GFK31W2E3SiFCj3 xISA9SUnm7Kjt9LAF6HTNz805zBkshIOk4MCx6HlezVWSRlWwT3rZzI4dI2fMvl8 iPRk1Ion3QSQui99HWfXv/rtezAIzgZqsFqPC6DjWRfN7LcdEtKtcQwnrSb5GGY9 3TIRY9IRYTSfJ2yjSz5f5258JxoDG5sR8dTMkgG2Gm92iGvGcPGpzQWPzVc4t+TO PdTqtv9ftEyAJKsYTFjPIod8XbzJBa1FSPadVAIfwF0JCDcsSFjoWGp+RzMQQSF8 MK3VsnQ/pqJfs2nJHDQbWbKu0qWYPntvOCdojZ4679ceDTd0t515npfYeDQuX8yU fAA5rB46mDXjyxikTP574NdnkcGjbAj7EOCp7s+WTsVPGQQ3mId/3fQw0Wg7bE6v jaJqdRj70SNTRHs8DFLQhvSZgpef4RzepE4sRBZqzY4vWd4riNcAC3Got+F2rQy3 X4hcHHU/5LGLoGMxOJQmuBfKVM8RAgikq6w2RfttVMLeKCknKtJ29OnotKilvILh W8nAOGxRnkmONFfHakNJtLl5tQJ4FQXc2cG8OeIIhHgheJjUxL72/zv8bBxOo7rY jUBjtZ5riQXc/ck4FEGI =9+Jh -----END PGP SIGNATURE----- Merge tag 'vfio-v3.11' of git://github.com/awilliam/linux-vfio Pull vfio updates from Alex Williamson: "Largely hugepage support for vfio/type1 iommu and surrounding cleanups and fixes" * tag 'vfio-v3.11' of git://github.com/awilliam/linux-vfio: vfio/type1: Fix leak on error path vfio: Limit group opens vfio/type1: Fix missed frees and zero sized removes vfio: fix documentation vfio: Provide module option to disable vfio_iommu_type1 hugepage support vfio: hugepage support for vfio_iommu_type1 vfio: Convert type1 iommu to use rbtree	2013-07-10 14:50:08 -07:00
Linus Torvalds	65b97fb730	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc updates from Ben Herrenschmidt: "This is the powerpc changes for the 3.11 merge window. In addition to the usual bug fixes and small updates, the main highlights are: - Support for transparent huge pages by Aneesh Kumar for 64-bit server processors. This allows the use of 16M pages as transparent huge pages on kernels compiled with a 64K base page size. - Base VFIO support for KVM on power by Alexey Kardashevskiy - Wiring up of our nvram to the pstore infrastructure, including putting compressed oopses in there by Aruna Balakrishnaiah - Move, rework and improve our "EEH" (basically PCI error handling and recovery) infrastructure. It is no longer specific to pseries but is now usable by the new "powernv" platform as well (no hypervisor) by Gavin Shan. - I fixed some bugs in our math-emu instruction decoding and made it usable to emulate some optional FP instructions on processors with hard FP that lack them (such as fsqrt on Freescale embedded processors). - Support for Power8 "Event Based Branch" facility by Michael Ellerman. This facility allows what is basically "userspace interrupts" for performance monitor events. - A bunch of Transactional Memory vs. Signals bug fixes and HW breakpoint/watchpoint fixes by Michael Neuling. And more ... I appologize in advance if I've failed to highlight something that somebody deemed worth it." * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (156 commits) pstore: Add hsize argument in write_buf call of pstore_ftrace_call powerpc/fsl: add MPIC timer wakeup support powerpc/mpic: create mpic subsystem object powerpc/mpic: add global timer support powerpc/mpic: add irq_set_wake support powerpc/85xx: enable coreint for all the 64bit boards powerpc/8xx: Erroneous double irq_eoi() on CPM IRQ in MPC8xx powerpc/fsl: Enable CONFIG_E1000E in mpc85xx_smp_defconfig powerpc/mpic: Add get_version API both for internal and external use powerpc: Handle both new style and old style reserve maps powerpc/hw_brk: Fix off by one error when validating DAWR region end powerpc/pseries: Support compression of oops text via pstore powerpc/pseries: Re-organise the oops compression code pstore: Pass header size in the pstore write callback powerpc/powernv: Fix iommu initialization again powerpc/pseries: Inform the hypervisor we are using EBB regs powerpc/perf: Add power8 EBB support powerpc/perf: Core EBB support for 64-bit book3s powerpc/perf: Drop MMCRA from thread_struct powerpc/perf: Don't enable if we have zero events ...	2013-07-04 10:29:23 -07:00
Alex Williamson	8d38ef1948	vfio/type1: Fix leak on error path We also don't handle unpinning zero pages as an error on other exits so we can fix that inconsistency by rolling in the next conditional return. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-07-01 08:28:58 -06:00
Al Viro	a47df1518e	vfio: remap_pfn_range() sets all those flags... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-06-29 12:46:41 +04:00
Alex Williamson	6d6768c61b	vfio: Limit group opens vfio_group_fops_open attempts to limit concurrent sessions by disallowing opens once group->container is set. This really doesn't do what we want and allow for inconsistent behavior, for instance a group can be opened twice, then a container set giving the user two file descriptors to the group. But then it won't allow more to be opened. There's not much reason to have the group opened multiple times since most access is through devices or the container, so complete what the original code intended and only allow a single instance. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-06-25 16:06:54 -06:00
Alex Williamson	f5bfdbf252	vfio/type1: Fix missed frees and zero sized removes With hugepage support we can only properly aligned and sized ranges. We only guarantee that we can unmap the same ranges mapped and not arbitrary sub-ranges. This means we might not free anything or might free more than requested. The vfio unmap interface started storing the unmapped size to return to userspace to handle this. This patch fixes a few places where we don't properly handle those cases, moves a memory allocation to a place where failure is an option and checks our loops to make sure we don't get into an infinite loop trying to remove an overlap. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-06-25 16:01:44 -06:00
Alex Williamson	5c6c2b21ec	vfio: Provide module option to disable vfio_iommu_type1 hugepage support Add a module option to vfio_iommu_type1 to disable IOMMU hugepage support. This causes iommu_map to only be called with single page mappings, disabling the IOMMU driver's ability to use hugepages. This option can be enabled by loading vfio_iommu_type1 with disable_hugepages=1 or dynamically through sysfs. If enabled dynamically, only new mappings are restricted. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-06-21 09:38:11 -06:00
Alex Williamson	166fd7d94a	vfio: hugepage support for vfio_iommu_type1 We currently send all mappings to the iommu in PAGE_SIZE chunks, which prevents the iommu from enabling support for larger page sizes. We still need to pin pages, which means we step through them in PAGE_SIZE chunks, but we can batch up contiguous physical memory chunks to allow the iommu the opportunity to use larger pages. The approach here is a bit different that the one currently used for legacy KVM device assignment. Rather than looking at the vma page size and using that as the maximum size to pass to the iommu, we instead simply look at whether the next page is physically contiguous. This means we might ask the iommu to map a 4MB region, while legacy KVM might limit itself to a maximum of 2MB. Splitting our mapping path also allows us to be smarter about locked memory because we can more easily unwind if the user attempts to exceed the limit. Therefore, rather than assuming that a mapping will result in locked memory, we test each page as it is pinned to determine whether it locks RAM vs an mmap'd MMIO region. This should result in better locking granularity and less locked page fudge factors in userspace. The unmap path uses the same algorithm as legacy KVM. We don't want to track the pfn for each mapping ourselves, but we need the pfn in order to unpin pages. We therefore ask the iommu for the iova to physical address translation, ask it to unpin a page, and see how many pages were actually unpinned. iommus supporting large pages will often return something bigger than a page here, which we know will be physically contiguous and we can unpin a batch of pfns. iommus not supporting large mappings won't see an improvement in batching here as they only unmap a page at a time. With this change, we also make a clarification to the API for mapping and unmapping DMA. We can only guarantee unmaps at the same granularity as used for the original mapping. In other words, unmapping a subregion of a previous mapping is not guaranteed and may result in a larger or smaller unmapping than requested. The size field in the unmapping structure is updated to reflect this. Previously this was unmodified on mapping, always returning the the requested unmap size. This is now updated to return the actual unmap size on success, allowing userspace to appropriately track mappings. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-06-21 09:38:02 -06:00
Alex Williamson	cd9b22685e	vfio: Convert type1 iommu to use rbtree We need to keep track of all the DMA mappings of an iommu container so that it can be automatically unmapped when the user releases the file descriptor. We currently do this using a simple list, where we merge entries with contiguous iovas and virtual addresses. Using a tree for this is a bit more efficient and allows us to use common code instead of inventing our own. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-06-21 09:37:50 -06:00
Alexey Kardashevskiy	5b25199eff	powerpc/vfio: Enable on pSeries platform The enables VFIO on the pSeries platform, enabling user space programs to access PCI devices directly. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Cc: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-06-20 16:55:15 +10:00
Alexey Kardashevskiy	5ffd229c02	powerpc/vfio: Implement IOMMU driver for VFIO VFIO implements platform independent stuff such as a PCI driver, BAR access (via read/write on a file descriptor or direct mapping when possible) and IRQ signaling. The platform dependent part includes IOMMU initialization and handling. This implements an IOMMU driver for VFIO which does mapping/unmapping pages for the guest IO and provides information about DMA window (required by a POWER guest). Cc: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-06-20 16:55:14 +10:00
Alexey Kardashevskiy	9a6aa279d3	vfio: fix crash on rmmod devtmpfs_delete_node() calls devnode() callback with mode==NULL but vfio still tries to write there. The patch fixes this. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-06-05 08:54:16 -06:00
Linus Torvalds	0b2e3b6bb4	vfio updates for v3.10 Changes include extension to support PCI AER notification to userspace, byte granularity of PCI config space and access to unarchitected PCI config space, better protection around IOMMU driver accesses, default file mode fix, and a few misc cleanups. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.13 (GNU/Linux) iQIcBAABAgAGBQJRgox9AAoJECObm247sIsizJwP/23KprR6BuRt168Obo+xC/lp kj1E4Hd4XHx6T/5XRexa4GqhmyLqgoqbS19uj/K6ebY9rouVKo0V6OYNFDQiFR/F yEGjYIBKV9/eBZMQI4RbZpZKGDXTeFrXyjsac1m51JrR3st8zsvZlNPE/TjzhSfz jMnsCC99ZwIFjmptw/yWwgInswy1n9H9iICS14Xn1x7v71iyOE32reTG6M9HsPHe Xm5F0K5iLQX8qoISHAomrTnFmCLxp5Y2qhh7nZWmC2gCbPBEnGdqx4prwzJayAaC DAN8osaYldoKQuwI4wFYMICHxxCEFk8xU58FeKCmeQJZgomefVAoRKf86gAEsSLR XHptBQOktWVKNFhzZWyOuAX3iua/hmWcOa05I6inycV1x2ZAGlNhvJnj1iKIphLk +neBFAD9wDcAJZdXkfudq0jJ9jJTSAWBv8d+hozNfkjTVhF+wRnhZ7V6dZfb9+W6 kPGbgwqKApLoOabQbxbZ5Ftr6S7344prB/HAMywGa+xZfoxoYPxBHCi0rSTvUTWy oWLtzrKRbjBDc0qF4eEK9RZlmVt2ZmCUnUB3kbWED4mrJAHu4LlFghnAloePrIZ3 kXlMARWbyw/E+1WIMO4Ito5+1s3zVsJJgzVsAQDJkTQw2aYDhns73/Y25Dj7x7QS BKebrXnh2xVCN6OIu+eX =F0Cc -----END PGP SIGNATURE----- Merge tag 'vfio-for-v3.10' of git://github.com/awilliam/linux-vfio Pull vfio updates from Alex Williamson: "Changes include extension to support PCI AER notification to userspace, byte granularity of PCI config space and access to unarchitected PCI config space, better protection around IOMMU driver accesses, default file mode fix, and a few misc cleanups." * tag 'vfio-for-v3.10' of git://github.com/awilliam/linux-vfio: vfio: Set container device mode vfio: Use down_reads to protect iommu disconnects vfio: Convert container->group_lock to rwsem PCI/VFIO: use pcie_flags_reg instead of access PCI-E Capabilities Register vfio-pci: Enable raw access to unassigned config space vfio-pci: Use byte granularity in config map vfio: make local function vfio_pci_intx_unmask_handler() static VFIO-AER: Vfio-pci driver changes for supporting AER VFIO: Wrapper for getting reference to vfio_device	2013-05-02 14:02:32 -07:00
Alex Williamson	664e9386bd	vfio: Set container device mode Minor 0 is the VFIO container device (/dev/vfio/vfio). On it's own the container does not provide a user with any privileged access. It only supports API version check and extension check ioctls. Only by attaching a VFIO group to the container does it gain any access. Set the mode of the container to allow access. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-04-30 15:42:28 -06:00
Linus Torvalds	96a3e8af5a	PCI changes for the v3.10 merge window: PCI device hotplug - Remove ACPI PCI subdrivers (Jiang Liu, Myron Stowe) - Make acpiphp builtin only, not modular (Jiang Liu) - Add acpiphp mutual exclusion (Jiang Liu) Power management - Skip "PME enabled/disabled" messages when not supported (Rafael Wysocki) - Fix fallback to PCI_D0 (Rafael Wysocki) Miscellaneous - Factor quirk_io_region (Yinghai Lu) - Cache MSI capability offsets & cleanup (Gavin Shan, Bjorn Helgaas) - Clean up EISA resource initialization and logging (Bjorn Helgaas) - Fix prototype warnings (Andy Shevchenko, Bjorn Helgaas) - MIPS: Initialize of_node before scanning bus (Gabor Juhos) - Fix pcibios_get_phb_of_node() declaration "weak" annotation (Gabor Juhos) - Add MSI INTX_DISABLE quirks for AR8161/AR8162/etc (Xiong Huang) - Fix aer_inject return values (Prarit Bhargava) - Remove PME/ACPI dependency (Andrew Murray) - Use shared PCI_BUS_NUM() and PCI_DEVID() (Shuah Khan) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJRfhSWAAoJEFmIoMA60/r8GrYQAIHDsyZIuJSf6g+8Td1h+PIC YD3wQhbyrDqQDuKU4+9cz+JsbHmnozUGA4UmlwmOGBxEa/Uauspb6yX1P1+x9Ok1 WD7Ar3BlA5OuYI/1L1mgCiA428MTujwoR4fPnC0+KFy8xk1tBpmhzzeOFohbKyFF hMBO/Xt9tCzPATJ1LhjIH4xAykfDkbnPNHNcUKRoAkRo0CO0lS8gcTk0shXXSNng p9kQ6c4cYZvlRIJTwlawWV09nr7mDsBYa3JClqXYZufUWfEwvIuhisJxCJ57sWi9 t+Ev8dm7VM6Cr5dV+ORArlboBFrq4f/W5U9j9GPFrRplwf+WbNT6tNGSpSDq8XhU Q7JjNgPWVdWXe1vIsMwaO49zi45/bNehuCSFLZiyPZwedMk764tys+iYw+tMRtv1 tBR7lwESSXfagmvWyQAuQOTy6Rj26BPd2T8e2lMsvsuQO9mCyTK6Ey3YyKuqKQK/ l5Gns4vv4eaCjGXqqDGiydUjSes+r/v1bu43XiRnwPQJUKb5kr5SjN5/zSMBuUgm TLT/bnv8qvdFxCpVQJFv4k/uzULARMdbvLtTy8osB14vNHX9jPn+xORjLaZNiO6O 7fFispMU8Om56hNkD6C451r3icRjjGlD7OA8KOlbZ8f876sLzGV9i6P9gwCoRdEB wclDPsN7kAzw/V2sEE60 =bj8i -----END PGP SIGNATURE----- Merge tag 'pci-v3.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "PCI changes for the v3.10 merge window: PCI device hotplug - Remove ACPI PCI subdrivers (Jiang Liu, Myron Stowe) - Make acpiphp builtin only, not modular (Jiang Liu) - Add acpiphp mutual exclusion (Jiang Liu) Power management - Skip "PME enabled/disabled" messages when not supported (Rafael Wysocki) - Fix fallback to PCI_D0 (Rafael Wysocki) Miscellaneous - Factor quirk_io_region (Yinghai Lu) - Cache MSI capability offsets & cleanup (Gavin Shan, Bjorn Helgaas) - Clean up EISA resource initialization and logging (Bjorn Helgaas) - Fix prototype warnings (Andy Shevchenko, Bjorn Helgaas) - MIPS: Initialize of_node before scanning bus (Gabor Juhos) - Fix pcibios_get_phb_of_node() declaration "weak" annotation (Gabor Juhos) - Add MSI INTX_DISABLE quirks for AR8161/AR8162/etc (Xiong Huang) - Fix aer_inject return values (Prarit Bhargava) - Remove PME/ACPI dependency (Andrew Murray) - Use shared PCI_BUS_NUM() and PCI_DEVID() (Shuah Khan)" * tag 'pci-v3.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (63 commits) vfio-pci: Use cached MSI/MSI-X capabilities vfio-pci: Use PCI_MSIX_TABLE_BIR, not PCI_MSIX_FLAGS_BIRMASK PCI: Remove "extern" from function declarations PCI: Use PCI_MSIX_TABLE_BIR, not PCI_MSIX_FLAGS_BIRMASK PCI: Drop msi_mask_reg() and remove drivers/pci/msi.h PCI: Use msix_table_size() directly, drop multi_msix_capable() PCI: Drop msix_table_offset_reg() and msix_pba_offset_reg() macros PCI: Drop is_64bit_address() and is_mask_bit_support() macros PCI: Drop msi_data_reg() macro PCI: Drop msi_lower_address_reg() and msi_upper_address_reg() macros PCI: Drop msi_control_reg() macro and use PCI_MSI_FLAGS directly PCI: Use cached MSI/MSI-X offsets from dev, not from msi_desc PCI: Clean up MSI/MSI-X capability #defines PCI: Use cached MSI-X cap while enabling MSI-X PCI: Use cached MSI cap while enabling MSI interrupts PCI: Remove MSI/MSI-X cap check in pci_msi_check_device() PCI: Cache MSI/MSI-X capability offsets in struct pci_dev PCI: Use u8, not int, for PM capability offset [SCSI] megaraid_sas: Use correct #define for MSI-X capability PCI: Remove "extern" from function declarations ...	2013-04-29 09:30:25 -07:00
Alex Williamson	0b43c08233	vfio: Use down_reads to protect iommu disconnects If a group or device is released or a container is unset from a group it can race against file ops on the container. Protect these with down_reads to allow concurrent users. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reported-by: Michael S. Tsirkin <mst@redhat.com>	2013-04-29 08:41:36 -06:00
Alex Williamson	9587f44aa6	vfio: Convert container->group_lock to rwsem All current users are writers, maintaining current mutual exclusion. This lets us add read users next. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-04-25 16:12:38 -06:00
Bjorn Helgaas	a9047f24df	vfio-pci: Use cached MSI/MSI-X capabilities We now cache the MSI/MSI-X capability offsets in the struct pci_dev, so no need to find the capabilities again. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Alex Williamson <alex.williamson@redhat.com>	2013-04-24 11:35:56 -06:00
Bjorn Helgaas	508d1aa602	vfio-pci: Use PCI_MSIX_TABLE_BIR, not PCI_MSIX_FLAGS_BIRMASK PCI_MSIX_FLAGS_BIRMASK is mis-named because the BIR mask is in the Table Offset register, not the flags ("Message Control" per spec) register. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Alex Williamson <alex.williamson@redhat.com>	2013-04-24 11:35:56 -06:00
Yijing Wang	aa2cba51a0	PCI/VFIO: use pcie_flags_reg instead of access PCI-E Capabilities Register Currently, we use pcie_flags_reg to cache PCI-E Capabilities Register, because PCI-E Capabilities Register bits are almost read-only. This patch use pcie_caps_reg() instead of another access PCI-E Capabilities Register. Signed-off-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-04-15 08:45:10 -06:00
Alex Williamson	a7d1ea1c11	vfio-pci: Enable raw access to unassigned config space Devices like be2net hide registers between the gaps in capabilities and architected regions of PCI config space. Our choices to support such devices is to either build an ever growing and unmanageable white list or rely on hardware isolation to protect us. These registers are really no different than MMIO or I/O port space registers, which we don't attempt to regulate, so treat PCI config space in the same way. Reported-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Gavin Shan <shangw@linux.vnet.ibm.com>	2013-04-01 09:04:12 -06:00
Alex Williamson	180b138107	vfio-pci: Use byte granularity in config map The config map previously used a byte per dword to map regions of config space to capabilities. Modulo a bug where we round the length of capabilities down instead of up, this theoretically works well and saves space so long as devices don't try to hide registers in the gaps between capabilities. Unfortunately they do exactly that so we need byte granularity on our config space map. Increase the allocation of the config map and split accesses at capability region boundaries. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Gavin Shan <shangw@linux.vnet.ibm.com>	2013-04-01 09:03:44 -06:00
Alex Williamson	904c680c7b	vfio-pci: Fix possible integer overflow The VFIO_DEVICE_SET_IRQS ioctl takes a start and count parameter, both of which are unsigned. We attempt to bounds check these, but fail to account for the case where start is a very large number, allowing start + count to wrap back into the valid range. Bounds check both start and start + count. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-03-26 11:33:16 -06:00
Wei Yongjun	0bced2f728	vfio: make local function vfio_pci_intx_unmask_handler() static vfio_pci_intx_unmask_handler() was not declared. It should be static. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-03-25 11:15:44 -06:00
Arnd Bergmann	25e9789ddd	vfio: include <linux/slab.h> for kmalloc The vfio drivers call kmalloc or kzalloc, but do not include <linux/slab.h>, which causes build errors on ARM. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Cc: kvm@vger.kernel.org	2013-03-15 12:58:20 -06:00
Vijay Mohan Pandarathil	dad9f8972e	VFIO-AER: Vfio-pci driver changes for supporting AER - New VFIO_SET_IRQ ioctl option to pass the eventfd that is signaled when an error occurs in the vfio_pci_device - Register pci_error_handler for the vfio_pci driver - When the device encounters an error, the error handler registered by the vfio_pci driver gets invoked by the AER infrastructure - In the error handler, signal the eventfd registered for the device. - This results in the qemu eventfd handler getting invoked and appropriate action taken for the guest. Signed-off-by: Vijay Mohan Pandarathil <vijaymohan.pandarathil@hp.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-03-11 09:31:22 -06:00
Vijay Mohan Pandarathil	44f507163d	VFIO: Wrapper for getting reference to vfio_device - Added vfio_device_get_from_dev() as wrapper to get reference to vfio_device from struct device. - Added vfio_device_data() as a wrapper to get device_data from vfio_device. Signed-off-by: Vijay Mohan Pandarathil <vijaymohan.pandarathil@hp.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-03-11 09:28:44 -06:00
Tejun Heo	a1c36b166b	vfio: convert to idr_alloc() Convert to the much saner new idr interface. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:19 -08:00
Kees Cook	d65530fbc7	drivers/vfio: remove depends on CONFIG_EXPERIMENTAL The CONFIG_EXPERIMENTAL config item has not carried much meaning for a while now and is almost always enabled by default. As agreed during the Linux kernel summit, remove it from any "depends on" lines in Kconfigs. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-02-24 09:59:44 -07:00
Alex Williamson	84237a826b	vfio-pci: Add support for VGA region access PCI defines display class VGA regions at I/O port address 0x3b0, 0x3c0 and MMIO address 0xa0000. As these are non-overlapping, we can ignore the I/O port vs MMIO difference and expose them both in a single region. We make use of the VGA arbiter around each access to configure chipset access as necessary. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-02-18 10:11:13 -07:00
Alex Williamson	2dd1194833	vfio-pci: Manage user power state transitions We give the user access to change the power state of the device but certain transitions result in an uninitialized state which the user cannot resolve. To fix this we need to mark the PowerState field of the PMCSR register read-only and effect the requested change on behalf of the user. This has the added benefit that pdev->current_state remains accurate while controlled by the user. The primary example of this bug is a QEMU guest doing a reboot where the device it put into D3 on shutdown and becomes unusable on the next boot because the device did a soft reset on D3->D0 (NoSoftRst-). Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-02-18 10:10:33 -07:00
Alex Williamson	2b489a45f6	vfio: whitelist pcieport pcieport does nice things like manage AER and we know it doesn't do DMA or expose any user accessible devices on the host. It also keeps the Memory, I/O, and Busmaster bits enabled, which is pretty handy when trying to use anyting below it. Devices owned by pcieport cannot be given to users via vfio, but we can tolerate them not being owned by vfio-pci. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-02-14 14:02:13 -07:00
Alex Williamson	e014e9444a	vfio: Protect vfio_dev_present against device_del vfio_dev_present is meant to give us a wait_event callback so that we can block removing a device from vfio until it becomes unused. The root of this check depends on being able to get the iommu group from the device. Unfortunately if the BUS_NOTIFY_DEL_DEVICE notifier has fired then the device-group reference is no longer searchable and we fail the lookup. We don't need to go to such extents for this though. We have a reference to the device, from which we can acquire a reference to the group. We can then use the group reference to search for the device and properly block removal. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-02-14 14:02:13 -07:00
Alex Williamson	906ee99dd2	vfio-pci: Cleanup BAR access We can actually handle MMIO and I/O port from the same access function since PCI already does abstraction of this. The ROM BAR only requires a minor difference, so it gets included too. vfio_pci_config_readwrite gets renamed for consistency. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-02-14 14:02:12 -07:00
Alex Williamson	5b279a11d3	vfio-pci: Cleanup read/write functions The read and write functions are nearly identical, combine them and convert to a switch statement. This also makes it easy to narrow the scope of when we use the io/mem accessors in case new regions are added. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-02-14 14:02:12 -07:00
Alex Williamson	5641ade41f	vfio-pci: Enable PCIe extended capabilities on v1 Even PCIe 1.x had extended config space. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-02-14 10:45:31 -07:00
Alex Williamson	ec1287e511	vfio-pci: Fix buffer overfill A read from a range hidden from the user (ex. MSI-X vector table) attempts to fill the user buffer up to the end of the excluded range instead of up to the requested count. Fix it. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Cc: stable@vger.kernel.org	2013-01-15 10:45:26 -07:00
Alex Williamson	9a92c5091a	vfio-pci: Enable device before attempting reset Devices making use of PM reset are getting incorrectly identified as not supporting reset because pci_pm_reset() fails unless the device is in D0 power state. When first attached to vfio_pci devices are typically in an unknown power state. We can fix this by explicitly setting the power state or simply calling pci_enable_device() before attempting a pci_reset_function(). We need to enable the device anyway, so move this up in our vfio_pci_enable() function, which also simplifies the error path a bit. Note that pci_disable_device() does not explicitly set the power state, so there's no need to re-order vfio_pci_disable(). Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2012-12-07 13:43:51 -07:00
Jiang Liu	05bf3aac93	VFIO: fix out of order labels for error recovery in vfio_pci_init() The two labels for error recovery in function vfio_pci_init() is out of order, so fix it. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2012-12-07 13:43:51 -07:00
Jiang Liu	de2b3eeafb	VFIO: use ACCESS_ONCE() to guard access to dev->driver Comments from dev_driver_string(), /* dev->driver can change to NULL underneath us because of unbinding, * so be careful about accessing it. */ So use ACCESS_ONCE() to guard access to dev->driver field. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2012-12-07 13:43:50 -07:00
Jiang Liu	9df7b25ab7	VFIO: unregister IOMMU notifier on error recovery path On error recovery path in function vfio_create_group(), it should unregister the IOMMU notifier for the new VFIO group. Otherwise it may cause invalid memory access later when handling bus notifications. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2012-12-07 13:43:50 -07:00
Alex Williamson	2007722a60	vfio-pci: Re-order device reset Move the device reset to the end of our disable path, the device should already be stopped from pci_disable_device(). This also allows us to manipulate the save/restore to avoid the save/reset/restore + save/restore that we had before. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2012-12-07 13:43:50 -07:00
Fengguang Wu	3a1f7041dd	vfio: simplify kmalloc+copy_from_user to memdup_user Generated by: coccinelle/api/memdup_user.cocci Acked-by: Julia Lawall <julia.lawall@lip6.fr> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2012-12-07 13:43:49 -07:00
Alex Williamson	899649b7d4	vfio: Fix PCI INTx disable consistency The virq_disabled flag tracks the userspace view of INTx masking across interrupt mode changes, but we're not consistently applying this to the interrupt and masking handler notion of the device. Currently if the user sets DisINTx while in MSI or MSIX mode, then returns to INTx mode (ex. rebooting a qemu guest), the hardware has DisINTx+, but the management of INTx thinks it's enabled, making it impossible to actually clear DisINTx. Fix this by updating the handler state when INTx is re-enabled. Cc: stable@vger.kernel.org Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2012-10-10 09:10:32 -06:00
Alex Williamson	9dbdfd23b7	vfio: Move PCI INTx eventfd setting earlier We need to be ready to recieve an interrupt as soon as we call request_irq, so our eventfd context setting needs to be moved earlier. Without this, an interrupt from our device or one sharing the interrupt line can pass a NULL into eventfd_signal and oops. Cc: stable@vger.kernel.org Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2012-10-10 09:10:32 -06:00
Alex Williamson	34002f54d2	vfio: Fix PCI mmap after `b3b9c293` Our mmap path mistakely relied on vma->vm_pgoff to get set in remap_pfn_range. After `b3b9c293`, that path only applies to copy-on-write mappings. Set it in our own code. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2012-10-10 09:10:31 -06:00
Linus Torvalds	547b1e81af	Fix staging driver use of VM_RESERVED The VM_RESERVED flag was killed off in commit `314e51b985` ("mm: kill vma flag VM_RESERVED and mm->reserved_vm counter"), and replaced by the proper semantic flags (eg "don't core-dump" etc). But there was a new use of VM_RESERVED that got missed by the merge. Fix the remaining use of VM_RESERVED in the vfio_pci driver, replacing the VM_RESERVED flag with VM_DONTEXPAND \| VM_DONTDUMP. Signed-off-by: Linus Torvalds <torvalds@linux-foundation,org>	2012-10-09 21:06:41 +09:00
Al Viro	2903ff019b	switch simple cases of fget_light to fdget Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-09-26 22:20:08 -04:00
Al Viro	1d3653a79c	switch vfio_group_set_container() to fget_light() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-09-26 21:10:09 -04:00
Alex Williamson	b68e7fa879	vfio: Fix virqfd release race vfoi-pci supports a mechanism like KVM's irqfd for unmasking an interrupt through an eventfd. There are two ways to shutdown this interface: 1) close the eventfd, 2) ioctl (such as disabling the interrupt). Both of these do the release through a workqueue, which can result in a segfault if two jobs get queued for the same virqfd. Fix this by protecting the pointer to these virqfds by a spinlock. The vfio pci device will therefore no longer have a reference to it once the release job is queued under lock. On the ioctl side, we still flush the workqueue to ensure that any outstanding releases are completed. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2012-09-21 10:48:28 -06:00
Al Viro	31605debdf	vfio: grab vfio_device reference before exposing the sucker via fd_install() It's not critical (anymore) since another thread closing the file will block on ->device_lock before it gets to dropping the final reference, but it's definitely cleaner that way... Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-08-22 10:26:42 -04:00
Al Viro	90b1253e41	vfio: get rid of vfio_device_put()/vfio_group_get_device* races we really need to make sure that dropping the last reference happens under the group->device_lock; otherwise a loop (under device_lock) might find vfio_device instance that is being freed right now, has already dropped the last reference and waits on device_lock to exclude the sucker from the list. Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-08-22 10:26:13 -04:00
Al Viro	6d2cd3ce81	vfio: get rid of open-coding kref_put_mutex Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-08-22 10:25:19 -04:00
Al Viro	934ad4c235	vfio: don't dereference after kfree... Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-08-22 10:23:04 -04:00
Alex Williamson	89e1f7d4c6	vfio: Add PCI device driver Add PCI device support for VFIO. PCI devices expose regions for accessing config space, I/O port space, and MMIO areas of the device. PCI config access is virtualized in the kernel, allowing us to ensure the integrity of the system, by preventing various accesses while reducing duplicate support across various userspace drivers. I/O port supports read/write access while MMIO also supports mmap of sufficiently sized regions. Support for INTx, MSI, and MSI-X interrupts are provided using eventfds to userspace. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2012-07-31 08:16:24 -06:00
Alex Williamson	73fa0d10d0	vfio: Type1 IOMMU implementation This VFIO IOMMU backend is designed primarily for AMD-Vi and Intel VT-d hardware, but is potentially usable by anything supporting similar mapping functionality. We arbitrarily call this a Type1 backend for lack of a better name. This backend has no IOVA or host memory mapping restrictions for the user and is optimized for relatively static mappings. Mapped areas are pinned into system memory. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2012-07-31 08:16:23 -06:00
Alex Williamson	cba3345cc4	vfio: VFIO core VFIO is a secure user level driver for use with both virtual machines and user level drivers. VFIO makes use of IOMMU groups to ensure the isolation of devices in use, allowing unprivileged user access. It's intended that VFIO will replace KVM device assignment and UIO drivers (in cases where the target platform includes a sufficiently capable IOMMU). New in this version of VFIO is support for IOMMU groups managed through the IOMMU core as well as a rework of the API, removing the group merge interface. We now go back to a model more similar to original VFIO with UIOMMU support where the file descriptor obtained from /dev/vfio/vfio allows access to the IOMMU, but only after a group is added, avoiding the previous privilege issues with this type of model. IOMMU support is also now fully modular as IOMMUs have vastly different interface requirements on different platforms. VFIO users are able to query and initialize the IOMMU model of their choice. Please see the follow-on Documentation commit for further description and usage example. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2012-07-31 08:16:22 -06:00

1 2 3 4 5

238 Commits