Commit Graph

304 Commits

Author SHA1 Message Date
Michael S. Tsirkin 2989be09a8 virtio_pci: fix use after free on release
KASan detected a use-after-free error in virtio-pci remove code. In
virtio_pci_remove(), vp_dev is still used after being freed in
unregister_virtio_device() (in virtio_pci_release_dev() more
precisely).

To fix, keep a reference until cleanup is done.

Fixes: 63bd62a08c ("virtio_pci: defer kfree until release callback")
Reported-by: Jerome Marchand <jmarchan@redhat.com>
Cc: stable@vger.kernel.org
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Jerome Marchand <jmarchan@redhat.com>
2016-01-26 10:18:28 +02:00
Stefan Hajnoczi f7ad26ff95 virtio: make find_vqs() checkpatch.pl-friendly
checkpatch.pl wants arrays of strings declared as follows:

  static const char * const names[] = { "vq-1", "vq-2", "vq-3" };

Currently the find_vqs() function takes a const char *names[] argument
so passing checkpatch.pl's const char * const names[] results in a
compiler error due to losing the second const.

This patch adjusts the find_vqs() prototype and updates all virtio
transports.  This makes it possible for virtio_balloon.c, virtio_input.c,
virtgpu_kms.c, and virtio_rpmsg_bus.c to use the checkpatch.pl-friendly
type.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Bjorn Andersson <bjorn.andersson@sonymobile.com>
2016-01-12 20:47:06 +02:00
Minchan Kim f68b992bbb virtio_balloon: fix race by fill and leak
During my compaction-related stuff, I encountered a bug
with ballooning.

With repeated inflating and deflating cycle, guest memory(
ie, cat /proc/meminfo | grep MemTotal) is decreased and
couldn't be recovered.

The reason is balloon_lock doesn't cover release_pages_balloon
so struct virtio_balloon fields could be overwritten by race
of fill_balloon(e,g, vb->*pfns could be critical).

This patch fixes it in my test.

Cc: <stable@vger.kernel.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-01-12 20:47:05 +02:00
Michael S. Tsirkin 788e5b3a5d virtio_ring: use virt_store_mb
We need a full barrier after writing out event index, using
virt_store_mb there seems better than open-coding.  As usual, we need a
wrapper to account for strong barriers.

It's tempting to use this in vhost as well, for that, we'll
need a variant of smp_store_mb that works on __user pointers.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2016-01-12 20:47:02 +02:00
Venkatesh Srinivas f277ec42f3 virtio_ring: shadow available ring flags & index
Improves cacheline transfer flow of available ring header.

Virtqueues are implemented as a pair of rings, one producer->consumer
avail ring and one consumer->producer used ring; preceding the
avail ring in memory are two contiguous u16 fields -- avail->flags
and avail->idx. A producer posts work by writing to avail->idx and
a consumer reads avail->idx.

The flags and idx fields only need to be written by a producer CPU
and only read by a consumer CPU; when the producer and consumer are
running on different CPUs and the virtio_ring code is structured to
only have source writes/sink reads, we can continuously transfer the
avail header cacheline between 'M' states between cores. This flow
optimizes core -> core bandwidth on certain CPUs.

(see: "Software Optimization Guide for AMD Family 15h Processors",
Section 11.6; similar language appears in the 10h guide and should
apply to CPUs w/ exclusive caches, using LLC as a transfer cache)

Unfortunately the existing virtio_ring code issued reads to the
avail->idx and read-modify-writes to avail->flags on the producer.

This change shadows the flags and index fields in producer memory;
the vring code now reads from the shadows and only ever writes to
avail->flags and avail->idx, allowing the cacheline to transfer
core -> core optimally.

In a concurrent version of vring_bench, the time required for
10,000,000 buffer checkout/returns was reduced by ~2% (average
across many runs) on an AMD Piledriver (15h) CPU:

(w/o shadowing):
 Performance counter stats for './vring_bench':
     5,451,082,016      L1-dcache-loads
     ...
       2.221477739 seconds time elapsed

(w/ shadowing):
 Performance counter stats for './vring_bench':
     5,405,701,361      L1-dcache-loads
     ...
       2.168405376 seconds time elapsed

The further away (in a NUMA sense) virtio producers and consumers are
from each other, the more we expect to benefit. Physical implementations
of virtio devices and implementations of virtio where the consumer polls
vring avail indexes (vhost) should also benefit.

Signed-off-by: Venkatesh Srinivas <venkateshs@google.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-12-07 17:28:11 +02:00
Michal Hocko 82107539bb virtio: Do not drop __GFP_HIGH in alloc_indirect
b92b1b89a3 ("virtio: force vring descriptors to be allocated from
lowmem") tried to exclude highmem pages for descriptors so it cleared
__GFP_HIGHMEM from a given gfp mask. The patch also cleared __GFP_HIGH
which doesn't make much sense for this fix because __GFP_HIGH only
controls access to memory reserves and it doesn't have any influence
on the zone selection. Some of the call paths use GFP_ATOMIC and
dropping __GFP_HIGH will reduce their changes for success because the
lack of access to memory reserves.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Mel Gorman <mgorman@techsingularity.net>
2015-12-07 17:28:11 +02:00
Suman Anna c13f99b7e9 virtio: fix memory leak of virtio ida cache layers
The virtio core uses a static ida named virtio_index_ida for
assigning index numbers to virtio devices during registration.
The ida core may allocate some internal idr cache layers and
an ida bitmap upon any ida allocation, and all these layers are
truely freed only upon the ida destruction. The virtio_index_ida
is not destroyed at present, leading to a memory leak when using
the virtio core as a module and atleast one virtio device is
registered and unregistered.

Fix this by invoking ida_destroy() in the virtio core module
exit.

Cc: stable@vger.kernel.org
Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-12-07 17:28:01 +02:00
Denis V. Lunev 997e120843 virtio_balloon: do not change memory amount visible via /proc/meminfo
Balloon device is frequently used as a mean of cooperative memory control
in between guest and host to manage memory overcommitment. This is the
typical case for any hosting workload when KVM guest is provided for
end-user.

Though there is a problem in this setup. The end-user and hosting provider
have signed SLA agreement in which some amount of memory is guaranted for
the guest. The good thing is that this memory will be given to the guest
when the guest will really need it (f.e. with OOM in guest and with
VIRTIO_BALLOON_F_DEFLATE_ON_OOM configuration flag set). The bad thing
is that end-user does not know this.

Balloon by default reduce the amount of memory exposed to the end-user
each time when the page is stolen from guest or returned back by using
adjust_managed_page_count and thus /proc/meminfo shows reduced amount
of memory.

Fortunately the solution is simple, we should just avoid to call
adjust_managed_page_count with VIRTIO_BALLOON_F_DEFLATE_ON_OOM set.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-09-08 13:32:11 +03:00
Denis V. Lunev b4d3403732 virtio_ballon: change stub of release_pages_by_pfn
and rename it to release_pages_balloon. The function originally takes
arrays of pfns and now it takes pointer to struct virtio_ballon.
This change is necessary to conditionally call adjust_managed_page_count
in the next patch.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-09-08 13:31:51 +03:00
Graeme Gregory 38c4ab8e48 virtio_mmio: add ACPI probing
Added the match table and pointers for ACPI probing to the driver.

This uses the same identifier for virt devices as being used for qemu
ARM64 ACPI support.

d0bf1955a3

Signed-off-by: Graeme Gregory <graeme.gregory@linaro.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-09-08 13:30:28 +03:00
Jason Wang df4198b1e0 virtio-input: reset device and detach unused during remove
Spec requires a device reset during cleanup, so do it and avoid warn
in virtio core. And detach unused buffers to avoid memory leak.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-08-06 10:40:35 +03:00
Linus Torvalds 5fc835284d virtio/vhost: cross endian support
I have just queued some more bugfix patches today but none fix regressions and
 none are related to these ones, so it looks like a good time for a merge for
 -rc1.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJVk7JOAAoJECgfDbjSjVRpHgEIAKrgLd7gIQ8lO+LCYqne6WLQ
 Ky8rOUnaxX4gD5N0akhfJFr/m/yIyAfk9+ALZZUo3kfuFiEsT2rn32iK/2Gj8pcu
 HFoAWhS+7b/ZsfpHRPtv/zVD3q4c3nWsWpfWK09J+4t0UJuC8fmGMoBzkS0kjZtd
 dQnHlJi5+1u4ch2x9sYYeVx7GOJ8a1W0q7cWJnWdOffWLEP9/zB8fgRVLFp/7AAd
 uBlza93RU81wS7q5tSUph6ESPqt2yu357e//4jnWjVx5EUXDRBL3A/T1JpC1qYSn
 WV2Gv14x+LVz2G8WgGmwfMq1H9Dvd/OzNToX5R8SIRx6Rh5L6gxFQjqt4dclGj8=
 =nKap
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio/vhost cross endian support from Michael Tsirkin:
 "I have just queued some more bugfix patches today but none fix
  regressions and none are related to these ones, so it looks like a
  good time for a merge for -rc1.

  The motivation for this is support for legacy BE guests on the new LE
  hosts.  There are two redeeming properties that made me merge this:

   - It's a trivial amount of code: since we wrap host/guest accesses
     anyway, almost all of it is well hidden from drivers.

   - Sane platforms would never set flags like VHOST_CROSS_ENDIAN_LEGACY,
     and when it's clear, there's zero overhead (as some point it was
     tested by compiling with and without the patches, got the same
     stripped binary).

  Maybe we could create a Kconfig symbol to enforce the second point:
  prevent people from enabling it eg on x86.  I will look into this"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio-pci: alloc only resources actually used.
  macvtap/tun: cross-endian support for little-endian hosts
  vhost: cross-endian support for legacy devices
  virtio: add explicit big-endian support to memory accessors
  vhost: introduce vhost_is_little_endian() helper
  vringh: introduce vringh_is_little_endian() helper
  macvtap: introduce macvtap_is_little_endian() helper
  tun: add tun_is_little_endian() helper
  virtio: introduce virtio_is_little_endian() helper
2015-07-03 16:02:25 -07:00
Linus Torvalds 02201e3f1b Minor merge needed, due to function move.
Main excitement here is Peter Zijlstra's lockless rbtree optimization to
 speed module address lookup.  He found some abusers of the module lock
 doing that too.
 
 A little bit of parameter work here too; including Dan Streetman's breaking
 up the big param mutex so writing a parameter can load another module (yeah,
 really).  Unfortunately that broke the usual suspects, !CONFIG_MODULES and
 !CONFIG_SYSFS, so those fixes were appended too.
 
 Cheers,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVkgKHAAoJENkgDmzRrbjxQpwQAJVmBN6jF3SnwbQXv9vRixjH
 58V33sb1G1RW+kXxQ3/e8jLX/4VaN479CufruXQp+IJWXsN/CH0lbC3k8m7u50d7
 b1Zeqd/Yrh79rkc11b0X1698uGCSMlzz+V54Z0QOTEEX+nSu2ZZvccFS4UaHkn3z
 rqDo00lb7rxQz8U25qro2OZrG6D3ub2q20TkWUB8EO4AOHkPn8KWP2r429Axrr0K
 wlDWDTTt8/IsvPbuPf3T15RAhq1avkMXWn9nDXDjyWbpLfTn8NFnWmtesgY7Jl4t
 GjbXC5WYekX3w2ZDB9KaT/DAMQ1a7RbMXNSz4RX4VbzDl+yYeSLmIh2G9fZb1PbB
 PsIxrOgy4BquOWsJPm+zeFPSC3q9Cfu219L4AmxSjiZxC3dlosg5rIB892Mjoyv4
 qxmg6oiqtc4Jxv+Gl9lRFVOqyHZrTC5IJ+xgfv1EyP6kKMUKLlDZtxZAuQxpUyxR
 HZLq220RYnYSvkWauikq4M8fqFM8bdt6hLJnv7bVqllseROk9stCvjSiE3A9szH5
 OgtOfYV5GhOeb8pCZqJKlGDw+RoJ21jtNCgOr6DgkNKV9CX/kL/Puwv8gnA0B0eh
 dxCeB7f/gcLl7Cg3Z3gVVcGlgak6JWrLf5ITAJhBZ8Lv+AtL2DKmwEWS/iIMRmek
 tLdh/a9GiCitqS0bT7GE
 =tWPQ
 -----END PGP SIGNATURE-----

Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull module updates from Rusty Russell:
 "Main excitement here is Peter Zijlstra's lockless rbtree optimization
  to speed module address lookup.  He found some abusers of the module
  lock doing that too.

  A little bit of parameter work here too; including Dan Streetman's
  breaking up the big param mutex so writing a parameter can load
  another module (yeah, really).  Unfortunately that broke the usual
  suspects, !CONFIG_MODULES and !CONFIG_SYSFS, so those fixes were
  appended too"

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (26 commits)
  modules: only use mod->param_lock if CONFIG_MODULES
  param: fix module param locks when !CONFIG_SYSFS.
  rcu: merge fix for Convert ACCESS_ONCE() to READ_ONCE() and WRITE_ONCE()
  module: add per-module param_lock
  module: make perm const
  params: suppress unused variable error, warn once just in case code changes.
  modules: clarify CONFIG_MODULE_COMPRESS help, suggest 'N'.
  kernel/module.c: avoid ifdefs for sig_enforce declaration
  kernel/workqueue.c: remove ifdefs over wq_power_efficient
  kernel/params.c: export param_ops_bool_enable_only
  kernel/params.c: generalize bool_enable_only
  kernel/module.c: use generic module param operaters for sig_enforce
  kernel/params: constify struct kernel_param_ops uses
  sysfs: tightened sysfs permission checks
  module: Rework module_addr_{min,max}
  module: Use __module_address() for module_address_lookup()
  module: Make the mod_tree stuff conditional on PERF_EVENTS || TRACING
  module: Optimize __module_address() using a latched RB-tree
  rbtree: Implement generic latch_tree
  seqlock: Introduce raw_read_seqcount_latch()
  ...
2015-07-01 10:49:25 -07:00
Gerd Hoffmann 59a5b0f7bf virtio-pci: alloc only resources actually used.
Move resource allocation from common code to legacy and modern code.
Only request resources actually used, i.e. bar0 in legacy mode and
the bar(s) specified by capabilities in modern mode.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-06-24 08:15:09 +02:00
Linus Torvalds d8133356e9 PCI changes for the v4.2 merge window:
Enumeration
     - Move pci_ari_enabled() to global header (Alex Williamson)
     - Account for ARI in _PRT lookups (Alex Williamson)
     - Remove unused pci_scan_bus_parented() (Yijing Wang)
 
   Resource management
     - Use host bridge _CRS info on systems with >32 bit addressing (Bjorn Helgaas)
     - Use host bridge _CRS info on Foxconn K8M890-8237A (Bjorn Helgaas)
     - Fix pci_address_to_pio() conversion of CPU address to I/O port (Zhichang Yuan)
     - Add pci_bus_addr_t (Yinghai Lu)
 
   PCI device hotplug
     - Wait for pciehp command completion where necessary (Alex Williamson)
     - Drop pointless ACPI-based "slot detection" check (Rafael J. Wysocki)
     - Check ignore_hotplug for all downstream devices (Rafael J. Wysocki)
     - Propagate the "ignore hotplug" setting to parent (Rafael J. Wysocki)
     - Inline pciehp "handle event" functions into the ISR (Bjorn Helgaas)
     - Clean up pciehp debug logging (Bjorn Helgaas)
 
   Power management
     - Remove redundant PCIe port type checking (Yijing Wang)
     - Add dev->has_secondary_link to track downstream PCIe links (Yijing Wang)
     - Use dev->has_secondary_link to find downstream links for ASPM (Yijing Wang)
     - Drop __pci_disable_link_state() useless "force" parameter (Bjorn Helgaas)
     - Simplify Clock Power Management setting (Bjorn Helgaas)
 
   Virtualization
     - Add ACS quirks for Intel 9-series PCH root ports (Alex Williamson)
     - Add function 1 DMA alias quirk for Marvell 9120 (Sakari Ailus)
 
   MSI
     - Disable MSI at enumeration even if kernel doesn't support MSI (Michael S. Tsirkin)
     - Remove unused pci_msi_off() (Bjorn Helgaas)
     - Rename msi_set_enable(), msix_clear_and_set_ctrl() (Michael S.  Tsirkin)
     - Export pci_msi_set_enable(), pci_msix_clear_and_set_ctrl() (Michael S. Tsirkin)
     - Drop pci_msi_off() calls during probe (Michael S. Tsirkin)
 
   APM X-Gene host bridge driver
     - Add APM X-Gene v1 PCIe MSI/MSIX termination driver (Duc Dang)
     - Add APM X-Gene PCIe MSI DTS nodes (Duc Dang)
     - Disable Configuration Request Retry Status for v1 silicon (Duc Dang)
     - Allow config access to Root Port even when link is down (Duc Dang)
 
   Broadcom iProc host bridge driver
     - Allow override of device tree IRQ mapping function (Hauke Mehrtens)
     - Add BCMA PCIe driver (Hauke Mehrtens)
     - Directly add PCI resources (Hauke Mehrtens)
     - Free resource list after registration (Hauke Mehrtens)
 
   Freescale i.MX6 host bridge driver
     - Add speed change timeout message (Troy Kisky)
     - Rename imx6_pcie_start_link() to imx6_pcie_establish_link() (Bjorn Helgaas)
 
   Freescale Layerscape host bridge driver
     - Use dw_pcie_link_up() consistently (Bjorn Helgaas)
     - Factor out ls_pcie_establish_link() (Bjorn Helgaas)
 
   Marvell MVEBU host bridge driver
     - Remove mvebu_pcie_scan_bus() (Yijing Wang)
 
   NVIDIA Tegra host bridge driver
     - Remove tegra_pcie_scan_bus() (Yijing Wang)
 
   Synopsys DesignWare host bridge driver
     - Consolidate outbound iATU programming functions (Jisheng Zhang)
     - Use iATU0 for cfg and IO, iATU1 for MEM (Jisheng Zhang)
     - Add support for x8 links (Zhou Wang)
     - Wait for link to come up with consistent style (Bjorn Helgaas)
     - Use pci_scan_root_bus() for simplicity (Yijing Wang)
 
   TI DRA7xx host bridge driver
     - Use dw_pcie_link_up() consistently (Bjorn Helgaas)
 
   Miscellaneous
     - Include <linux/pci.h>, not <asm/pci.h> (Bjorn Helgaas)
     - Remove unnecessary #includes of <asm/pci.h> (Bjorn Helgaas)
     - Remove unused pcibios_select_root() (again) (Bjorn Helgaas)
     - Remove unused pci_dma_burst_advice() (Bjorn Helgaas)
     - xen/pcifront: Don't use deprecated function pci_scan_bus_parented() (Arnd Bergmann)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJViCSWAAoJEFmIoMA60/r8zX8P/1DPNnk+8xSQe3dYjnG8VW3P
 GPxeCqLMkjiF3ffxcLDzsgrHMjZEb8Co67WePs0k5V0lbZevoIwUo48+oO9B5jhc
 H5DuPZHyTHeOvaZv4GUY5vq/1DBh4JXmJc2V/BkaJ6qhXckF+SCam9C+s0p4950o
 QX/ifOjg/VHzmhaiL7wymJOzuniZmIttl+y+nzkl3AUJ+T6ZtQbUhz+8GZ3lj7Ma
 F+7JHhvm9K8Ljajxb6BLWTw4xgHA6ZN5PtYEx+Sl9QBYSsGfL7LnqyYD3KhJ7KV5
 4AHNJGEVhzNwSuyh+VQx1tNm7OHOqkAaTsYdCVUZRow+6CPd8P75QOMtpl+SmPJB
 RV1BAO75OTGqKg0B9IDg855y4Nh+4/dKoZlBPzpp7+cKw3ylaRAsNnaZ9ik5D62v
 RR06CFgWGHwDXSObgbRm4v0HwfAIHWWJzrPqAZmElh2dzb1Lv1I3AbB1SClCN6sl
 fnAu6CAwA47A5GT8xW3L0oQXdcSmdNUdNzZrsfDnOBIQWMsF+zBFKr6sTABVgyxp
 /WEJaNlvx8Zlq0bZlhGDdsGSbFNFzhX4avWZtXhvdcqFzH0KaVghYSayYvJE9Haq
 oakWqS+GZ3x40j+rdrgLg98AWRVraE1MvV1A7N9TIGjuuKqqbZfSP8kvX3QRQQhO
 Z2+X5hMM0s/tdYtADYu/
 =Qw+j
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.2-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:
 "PCI changes for the v4.2 merge window:

  Enumeration
    - Move pci_ari_enabled() to global header (Alex Williamson)
    - Account for ARI in _PRT lookups (Alex Williamson)
    - Remove unused pci_scan_bus_parented() (Yijing Wang)

  Resource management
    - Use host bridge _CRS info on systems with >32 bit addressing (Bjorn Helgaas)
    - Use host bridge _CRS info on Foxconn K8M890-8237A (Bjorn Helgaas)
    - Fix pci_address_to_pio() conversion of CPU address to I/O port (Zhichang Yuan)
    - Add pci_bus_addr_t (Yinghai Lu)

  PCI device hotplug
    - Wait for pciehp command completion where necessary (Alex Williamson)
    - Drop pointless ACPI-based "slot detection" check (Rafael J. Wysocki)
    - Check ignore_hotplug for all downstream devices (Rafael J. Wysocki)
    - Propagate the "ignore hotplug" setting to parent (Rafael J. Wysocki)
    - Inline pciehp "handle event" functions into the ISR (Bjorn Helgaas)
    - Clean up pciehp debug logging (Bjorn Helgaas)

  Power management
    - Remove redundant PCIe port type checking (Yijing Wang)
    - Add dev->has_secondary_link to track downstream PCIe links (Yijing Wang)
    - Use dev->has_secondary_link to find downstream links for ASPM (Yijing Wang)
    - Drop __pci_disable_link_state() useless "force" parameter (Bjorn Helgaas)
    - Simplify Clock Power Management setting (Bjorn Helgaas)

  Virtualization
    - Add ACS quirks for Intel 9-series PCH root ports (Alex Williamson)
    - Add function 1 DMA alias quirk for Marvell 9120 (Sakari Ailus)

  MSI
    - Disable MSI at enumeration even if kernel doesn't support MSI (Michael S. Tsirkin)
    - Remove unused pci_msi_off() (Bjorn Helgaas)
    - Rename msi_set_enable(), msix_clear_and_set_ctrl() (Michael S.  Tsirkin)
    - Export pci_msi_set_enable(), pci_msix_clear_and_set_ctrl() (Michael S. Tsirkin)
    - Drop pci_msi_off() calls during probe (Michael S. Tsirkin)

  APM X-Gene host bridge driver
    - Add APM X-Gene v1 PCIe MSI/MSIX termination driver (Duc Dang)
    - Add APM X-Gene PCIe MSI DTS nodes (Duc Dang)
    - Disable Configuration Request Retry Status for v1 silicon (Duc Dang)
    - Allow config access to Root Port even when link is down (Duc Dang)

  Broadcom iProc host bridge driver
    - Allow override of device tree IRQ mapping function (Hauke Mehrtens)
    - Add BCMA PCIe driver (Hauke Mehrtens)
    - Directly add PCI resources (Hauke Mehrtens)
    - Free resource list after registration (Hauke Mehrtens)

  Freescale i.MX6 host bridge driver
    - Add speed change timeout message (Troy Kisky)
    - Rename imx6_pcie_start_link() to imx6_pcie_establish_link() (Bjorn Helgaas)

  Freescale Layerscape host bridge driver
    - Use dw_pcie_link_up() consistently (Bjorn Helgaas)
    - Factor out ls_pcie_establish_link() (Bjorn Helgaas)

  Marvell MVEBU host bridge driver
    - Remove mvebu_pcie_scan_bus() (Yijing Wang)

  NVIDIA Tegra host bridge driver
    - Remove tegra_pcie_scan_bus() (Yijing Wang)

  Synopsys DesignWare host bridge driver
    - Consolidate outbound iATU programming functions (Jisheng Zhang)
    - Use iATU0 for cfg and IO, iATU1 for MEM (Jisheng Zhang)
    - Add support for x8 links (Zhou Wang)
    - Wait for link to come up with consistent style (Bjorn Helgaas)
    - Use pci_scan_root_bus() for simplicity (Yijing Wang)

  TI DRA7xx host bridge driver
    - Use dw_pcie_link_up() consistently (Bjorn Helgaas)

  Miscellaneous
    - Include <linux/pci.h>, not <asm/pci.h> (Bjorn Helgaas)
    - Remove unnecessary #includes of <asm/pci.h> (Bjorn Helgaas)
    - Remove unused pcibios_select_root() (again) (Bjorn Helgaas)
    - Remove unused pci_dma_burst_advice() (Bjorn Helgaas)
    - xen/pcifront: Don't use deprecated function pci_scan_bus_parented() (Arnd Bergmann)"

* tag 'pci-v4.2-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (58 commits)
  PCI: pciehp: Inline the "handle event" functions into the ISR
  PCI: pciehp: Rename queue_interrupt_event() to pciehp_queue_interrupt_event()
  PCI: pciehp: Make queue_interrupt_event() void
  PCI: xgene: Allow config access to Root Port even when link is down
  PCI: xgene: Disable Configuration Request Retry Status for v1 silicon
  PCI: pciehp: Clean up debug logging
  x86/PCI: Use host bridge _CRS info on systems with >32 bit addressing
  PCI: imx6: Add #define PCIE_RC_LCSR
  PCI: imx6: Use "u32", not "uint32_t"
  PCI: Remove unused pci_scan_bus_parented()
  xen/pcifront: Don't use deprecated function pci_scan_bus_parented()
  PCI: imx6: Add speed change timeout message
  PCI/ASPM: Simplify Clock Power Management setting
  PCI: designware: Wait for link to come up with consistent style
  PCI: layerscape: Factor out ls_pcie_establish_link()
  PCI: layerscape: Use dw_pcie_link_up() consistently
  PCI: dra7xx: Use dw_pcie_link_up() consistently
  x86/PCI: Use host bridge _CRS info on Foxconn K8M890-8237A
  PCI: pciehp: Wait for hotplug command completion where necessary
  PCI: Remove unused pci_dma_burst_advice()
  ...
2015-06-23 13:41:24 -07:00
Jiang Liu 210d150e1f virtio_pci: Clear stale cpumask when setting irq affinity
The cpumask vp_dev->msix_affinity_masks[info->msix_vector] may contain
staled information when vp_set_vq_affinity() gets called, so clear it
before setting the new cpu bit mask.

Cc: stable@vger.kernel.org
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-06-04 14:47:49 +02:00
Luis R. Rodriguez 9c27847dda kernel/params: constify struct kernel_param_ops uses
Most code already uses consts for the struct kernel_param_ops,
sweep the kernel for the last offending stragglers. Other than
include/linux/moduleparam.h and kernel/params.c all other changes
were generated with the following Coccinelle SmPL patch. Merge
conflicts between trees can be handled with Coccinelle.

In the future git could get Coccinelle merge support to deal with
patch --> fail --> grammar --> Coccinelle --> new patch conflicts
automatically for us on patches where the grammar is available and
the patch is of high confidence. Consider this a feature request.

Test compiled on x86_64 against:

	* allnoconfig
	* allmodconfig
	* allyesconfig

@ const_found @
identifier ops;
@@

const struct kernel_param_ops ops = {
};

@ const_not_found depends on !const_found @
identifier ops;
@@

-struct kernel_param_ops ops = {
+const struct kernel_param_ops ops = {
};

Generated-by: Coccinelle SmPL
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Junio C Hamano <gitster@pobox.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: cocci@systeme.lip6.fr
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-05-28 11:32:10 +09:30
Michael S. Tsirkin c1f4d88abd virtio_pci: drop pci_msi_off() call during probe
The PCI core now disables MSI and MSI-X for all devices during enumeration
regardless of CONFIG_PCI_MSI.  Remove device-specific code to disable
MSI/MSI-X.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2015-05-07 09:52:21 -05:00
Michael S. Tsirkin 9abbfb486f virtio: drop virtio_device_is_legacy_only
virtio_device_is_legacy_only is now unused, drop
it from core.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-04-15 12:41:14 +09:30
Michael S. Tsirkin f71d8286c1 virtio_pci: support non-legacy balloon devices
virtio_device_is_legacy_only is always false now,
drop the test from virtio pci.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-04-15 12:41:13 +09:30
Michael S. Tsirkin a62d547c72 virtio_mmio: support non-legacy balloon devices
virtio_device_is_legacy_only is always false now,
drop the test from virtio mmio.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-04-15 12:41:12 +09:30
Michael S. Tsirkin 2343dabc60 virtio: balloon might not be a legacy device
We added transitional device support to balloon driver,
so we don't need to black-list it in core anymore.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-04-15 12:41:11 +09:30
Michael S. Tsirkin df81b29c7b virtio_balloon: transitional interface
Virtio 1.0 doesn't include a modern balloon device.
But it's not a big change to support a transitional
balloon device: this has the advantage of supporting
existing drivers, transparently.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-04-15 12:41:09 +09:30
Michael S. Tsirkin a8557d32fe virtio_pci_modern: switch to type-safe io accessors
As Rusty noted, we were accessing queue_enable with an incorrect width.
Switch to type-safe accessors so we don't make this mistake again in the
future.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-04-01 14:43:34 +10:30
Michael S. Tsirkin c5d4c2c9ce virtio_pci_modern: type-safe io accessors
The spec is very clear on this:

4.1.3.1 Driver Requirements: PCI Device Layout

The driver MUST access each field using the “natural” access method,
i.e. 32-bit accesses for 32-bit fields, 16-bit accesses for 16-bit
fields and 8-bit accesses for 8-bit fields.

Add type-safe wrappers to prevent access with incorrect width.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-04-01 14:37:15 +10:30
Gerd Hoffmann 271c865161 Add virtio-input driver.
virtio-input is basically evdev-events-over-virtio, so this driver isn't
much more than reading configuration from config space and forwarding
incoming events to the linux input layer.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-03-29 12:13:52 +10:30
Michael S. Tsirkin 704a0b5f23 virtio_mmio: fix access width for mmio
Going over the virtio mmio code, I noticed that it doesn't correctly
access modern device config values using "natural" accessors: it uses
readb to get/set them byte by byte, while the virtio 1.0 spec explicitly states:

	4.2.2.2 Driver Requirements: MMIO Device Register Layout

	...

	The driver MUST only use 32 bit wide and aligned reads and writes to
	access the control registers described in table 4.1.
	For the device-specific configuration space, the driver MUST use
	8 bit wide accesses for 8 bit wide fields, 16 bit wide and aligned
	accesses for 16 bit wide fields and 32 bit wide and aligned accesses for
	32 and 64 bit wide fields.

Borrow code from virtio_pci_modern to do this correctly.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-03-17 12:12:21 +10:30
Michael S. Tsirkin 87e7bf1450 virtio_mmio: generation support
virtio_mmio currently lacks generation support which
makes multi-byte field access racy.
Fix by getting the value at offset 0xfc for version 2
devices. Nothing we can do for version 1, so return
generation id 0.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-03-13 15:55:43 +10:30
Michael S. Tsirkin 3d2a3774c1 virtio-balloon: do not call blocking ops when !TASK_RUNNING
virtio balloon has this code:
        wait_event_interruptible(vb->config_change,
                                 (diff = towards_target(vb)) != 0
                                 || vb->need_stats_update
                                 || kthread_should_stop()
                                 || freezing(current));

Which is a problem because towards_target() call might block after
wait_event_interruptible sets task state to TAST_INTERRUPTIBLE, causing
the task_struct::state collision typical of nesting of sleeping
primitives

See also http://lwn.net/Articles/628628/ or Thomas's
bug report
http://article.gmane.org/gmane.linux.kernel.virtualization/24846
for a fuller explanation.

To fix, rewrite using wait_woken.

Cc: stable@vger.kernel.org
Reported-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-03-10 11:56:15 +10:30
Michael S. Tsirkin 88660f7fb9 virtio_balloon: set DRIVER_OK before using device
virtio spec requires that all drivers set DRIVER_OK
before using devices. While balloon isn't yet
included in the virtio 1 spec, previous spec versions
also required this.

virtio balloon might violate this rule: probe calls
kthread_run before setting DRIVER_OK, which might run
immediately and cause balloon to inflate/deflate.

To fix, call virtio_device_ready before running the kthread.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org
2015-03-10 11:48:28 +10:30
Rusty Russell 5b40a7daf5 virtio: don't set VIRTIO_CONFIG_S_DRIVER_OK twice.
I noticed this with the console device.  It's not *wrong*, just a bit
weird.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-02-17 16:19:29 +10:30
Rusty Russell 7abb568dbb virtio_pci: use 16-bit accessor for queue_enable.
Since PCI is little endian, 8-bit access might work, but the spec section
is very clear on this:

  4.1.3.1 Driver Requirements: PCI Device Layout

  The driver MUST access each field using the “natural” access method,
  i.e. 32-bit accesses for 32-bit fields, 16-bit accesses for 16-bit
  fields and 8-bit accesses for 8-bit fields.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
2015-02-11 15:03:16 +10:30
Tetsuo Handa 5e05bf5833 virtio: Avoid possible kernel panic if DEBUG is enabled.
The virtqueue_add() calls START_USE() upon entry. The virtqueue_kick() is
called if vq->num_added == (1 << 16) - 1 before calling END_USE().
The virtqueue_kick_prepare() called via virtqueue_kick() calls START_USE()
upon entry, and will call panic() if DEBUG is enabled.
Move this virtqueue_kick() call to after END_USE() call.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-02-11 15:03:14 +10:30
Pawel Moll 1862ee22ce virtio-mmio: Update the device to OASIS spec version
This patch add a support for second version of the virtio-mmio device,
which follows OASIS "Virtual I/O Device (VIRTIO) Version 1.0"
specification.

Main changes:

1. The control register symbolic names use the new device/driver
   nomenclature rather than the old guest/host one.

2. The driver detect the device version (version 1 is the pre-OASIS
   spec, version 2 is compatible with fist revision of the OASIS spec)
   and drives the device accordingly.

3. New version uses direct addressing (64 bit address split into two
   low/high register) instead of the guest page size based one,
   and addresses each part of the queue (descriptors, available, used)
   separately.

4. The device activity is now explicitly triggered by writing to the
   "queue ready" register.

5. Whole 64 bit features are properly handled now (both ways).

Signed-off-by: Pawel Moll <pawel.moll@arm.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-01-23 14:57:10 +10:30
Michael S. Tsirkin 76545f066d virtio_pci_modern: drop an unused function
release function in modern driver is unused:
it's a left-over from when each driver had
to have its own release.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-01-21 16:29:01 +10:30
Michael S. Tsirkin ac399d8f39 virtio_pci: add module param to force legacy mode
If set, try legacy interface first, modern one if that fails.  Useful to
work around device/driver bugs, and for compatibility testing.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-01-21 16:29:01 +10:30
Michael S. Tsirkin 46506da5f3 virtio_pci: add an option to disable legacy driver
Useful for testing device virtio 1 compatibility.
Based on patch by Rusty - couldn't resist putting
that flying car joke in there!

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-01-21 16:28:59 +10:30
Michael S. Tsirkin 0327642337 virtio_pci: drop Kconfig warnings
The ABI *is* stable, and has been for a while now.
Drop Kconfig warning saying that it's not guaranteed
to work.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-01-21 16:28:59 +10:30
Michael S. Tsirkin b2a6d51ddf virtio_pci: Kconfig grammar fix
This drivers -> this driver.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-01-21 16:28:58 +10:30
Michael S. Tsirkin 43b4f721ce virtio_ring: coding style fix
Most of our code has
struct foo {
}

Fix one instances where ring is inconsistent.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-01-21 16:28:57 +10:30
Michael S. Tsirkin 25e65e4efc virtio_balloon: coding style fixes
Most of our code has
struct foo {
}

Fix two instances where balloon is inconsistent.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-01-21 16:28:56 +10:30
Michael S. Tsirkin d3f5f06560 virtio_pci_modern: support devices with no config
Virtio 1.0 spec lists device config as optional.
Set get/set callbacks to NULL. Drivers can check that
and fail gracefully.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-01-21 16:28:55 +10:30
Michael S. Tsirkin 3909213cfd virtio_pci_modern: reduce number of mappings
We don't know the # of VQs that drivers are going to use so it's hard to
predict how much memory we'll need to map. However, the relevant
capability does give us an upper limit.
If that's below a page, we can reduce the number of required
mappings by mapping it all once ahead of the time.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-01-21 16:28:55 +10:30
Rusty Russell 89461c4a12 virtio_pci: macros for PCI layout offsets
QEMU wants it, so why not?  Trust, but verify.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-01-21 16:28:54 +10:30
Michael S. Tsirkin 1fcf0512c9 virtio_pci: modern driver
Lightly tested against qemu.

One thing *not* implemented here is separate mappings
for descriptor/avail/used rings. That's nice to have,
will be done later after we have core support.

This also exposes the PCI layout to userspace, and
adds macros for PCI layout offsets:

QEMU wants it, so why not?  Trust, but verify.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-01-21 16:28:53 +10:30
Michael S. Tsirkin ff31d2e285 virtio_pci: move probe/remove code to common
Most of initialization is device-independent.
Let's move it to common.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-01-21 16:28:51 +10:30
Sasha Levin 2bd56afd44 virtio_pci: drop useless del_vqs call
Device VQs were getting freed twice: once in every device's removal
functions, and then again in virtio_pci_legacy_remove().  The ones in
devices are called first, so drop the useless second call.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-01-21 16:28:50 +10:30
Michael S. Tsirkin 2d9becc1e0 virtio/balloon: verify device has config space
Some devices might not implement config space access
(e.g. remoteproc used not to - before 3.9).
virtio/balloon needs config space access so make it
fail gracefully if not there.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-01-21 16:28:48 +10:30
Michael S. Tsirkin a1eb03f546 virtio_pci: document why we defer kfree
The reason we defer kfree until release function is because it's a
general rule for kobjects: kfree of the reference counter itself is only
legal in the release function.

Previous patch didn't make this clear, document this in code.

Cc: stable@vger.kernel.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-01-06 16:35:36 +02:00
Sasha Levin 63bd62a08c virtio_pci: defer kfree until release callback
A struct device which has just been unregistered can live on past the
point at which a driver decides to drop it's initial reference to the
kobject gained on allocation.

This implies that when releasing a virtio device, we can't free a struct
virtio_device until the underlying struct device has been released,
which might not happen immediately on device_unregister().

Unfortunately, this is exactly what virtio pci does:
it has an empty release callback, and frees memory immediately
after unregistering the device.

This causes an easy to reproduce crash if CONFIG_DEBUG_KOBJECT_RELEASE
it enabled.

To fix, free the memory only once we know the device is gone in the release
callback.

Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-01-06 16:35:36 +02:00