Commit Graph

439 Commits

Author SHA1 Message Date
Michael S. Tsirkin f30eaf4a09 virtio_pci: use priv for vq notification
slightly reduce the amount of pointer chasing this needs to do.
More importantly, this will easily generalize to virtio 1.0.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-12-09 21:42:02 +02:00
Michael S. Tsirkin 3ec7a77bb3 virtio_pci: free up vq->priv
We don't need to go from vq to vq info on
data path, so using direct vq->priv pointer for that
seems like a waste.

Let's build an array of vq infos, then we can use vq->index
for that lookup.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-12-09 21:42:01 +02:00
Michael S. Tsirkin f913dd4536 virtio_pci: fix coding style for structs
should be

struct foo {
}

not

struct foo
{
}

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-12-09 21:42:01 +02:00
Michael S. Tsirkin af535722f8 virtio_pci: add isr field
Use isr field instead of direct access to ioaddr.
This way generalizes easily to virtio 1.0.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-12-09 21:42:00 +02:00
Michael S. Tsirkin d71a6fc6b9 virtio: drop legacy_only driver flag
legacy_only flag is now unused, drop it from core.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2014-12-09 21:42:00 +02:00
Michael S. Tsirkin 63d9f218a3 virtio_balloon: drop legacy_only driver flag
we have blacklisted balloon in core, no need
for a driver flag.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2014-12-09 21:41:59 +02:00
Michael S. Tsirkin 5c609a5ef0 virtio: allow finalize_features to fail
This will make it easy for transports to validate features and return
failure.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-12-09 16:32:32 +02:00
Michael S. Tsirkin b6098c3042 virtio: add API to detect legacy devices
transports need to be able to detect legacy-only
devices (ATM balloon only) to use legacy path
to drive them.

Add a core API to do just that.
The implementation just blacklists balloon:
not too pretty, but let's not over-engineer.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2014-12-09 12:06:33 +02:00
Michael S. Tsirkin 747ae34a6e virtio: make VIRTIO_F_VERSION_1 a transport bit
Activate VIRTIO_F_VERSION_1 automatically unless legacy_only
is set.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-12-09 12:06:32 +02:00
Michael S. Tsirkin df1b57fe59 virtio_balloon: add legacy_only flag
We have no plans to support virtio 1.0 in balloon driver.  Add an
explicit flag to mark it legacy only.

This will be used by follow-up patches.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-12-09 12:06:32 +02:00
Michael S. Tsirkin b3bb62d119 virtio: add legacy feature table support
virtio-blk has some legacy feature bits that modern drivers
must not negotiate, but are needed for old legacy hosts
(that e.g. don't support virtio-scsi).
Allow a separate legacy feature table for such cases.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2014-12-09 12:05:26 +02:00
Michael S. Tsirkin c102659d69 virtio: simplify feature bit handling
Now that we use u64 for bits, we can simply & them together.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2014-12-09 12:05:25 +02:00
Michael S. Tsirkin cb3f6d9da4 virtio: set FEATURES_OK
set FEATURES_OK as per virtio 1.0 spec

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2014-12-09 12:05:25 +02:00
Cornelia Huck 8906265215 virtio: allow transports to get avail/used addresses
For virtio-1, we can theoretically have a more complex virtqueue
layout with avail and used buffers not on a contiguous memory area
with the descriptor table. For now, it's fine for a transport driver
to stay with the old layout: It needs, however, a way to access
the locations of the avail/used rings so it can register them with
the host.

Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-12-09 12:05:25 +02:00
Michael S. Tsirkin 00e6f3d9d9 virtio_ring: switch to new memory access APIs
Use virtioXX_to_cpu and friends for access to
all multibyte structures in memory.

Note: this is intentionally mechanical.
A follow-up patch will split long lines etc.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2014-12-09 12:05:25 +02:00
Michael S. Tsirkin 93d389f820 virtio: assert 32 bit features in transports
At this point, no transports set any of the high 32 feature bits.
Since transports generally can't (yet) cope with such bits, add BUG_ON
checks to make sure they are not set by mistake.

Based on rproc patch by Rusty.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2014-12-09 12:05:24 +02:00
Michael S. Tsirkin d025477368 virtio: add support for 64 bit features.
Change u32 to u64, and use BIT_ULL and 1ULL everywhere.

Note: transports are unchanged, and only set low 32 bit.
This guarantees that no transport sets e.g. VERSION_1
by mistake without proper support.

Based on patch by Rusty.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2014-12-09 12:05:24 +02:00
Michael S. Tsirkin e16e12be34 virtio: use u32, not bitmap for features
It seemed like a good idea to use bitmap for features
in struct virtio_device, but it's actually a pain,
and seems to become even more painful when we get more
than 32 feature bits.  Just change it to a u32 for now.

Based on patch by Rusty.

Suggested-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2014-12-09 12:05:23 +02:00
Raushaniya Maksudova 5a10b7dbf9 virtio_balloon: free some memory from balloon on OOM
Excessive virtio_balloon inflation can cause invocation of OOM-killer,
when Linux is under severe memory pressure. Various mechanisms are
responsible for correct virtio_balloon memory management. Nevertheless
it is often the case that these control tools does not have enough time
to react on fast changing memory load. As a result OS runs out of memory
and invokes OOM-killer. The balancing of memory by use of the virtio
balloon should not cause the termination of processes while there are
pages in the balloon. Now there is no way for virtio balloon driver to
free some memory at the last moment before some process will be get
killed by OOM-killer.

This does not provide a security breach as balloon itself is running
inside guest OS and is working in the cooperation with the host. Thus
some improvements from guest side should be considered as normal.

To solve the problem, introduce a virtio_balloon callback which is
expected to be called from the oom notifier call chain in out_of_memory()
function. If virtio balloon could release some memory, it will make
the system to return and retry the allocation that forced the out of
memory killer to run.

Allocate virtio  feature bit for this: it is not set by default,
the the guest will not deflate virtio balloon on OOM without explicit
permission from host.

Signed-off-by: Raushaniya Maksudova <rmaksudova@parallels.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-11-11 17:09:58 +10:30
Raushaniya Maksudova 1fd9c67203 virtio_balloon: return the amount of freed memory from leak_balloon()
This value would be useful in the next patch to provide the amount of
the freed memory for OOM killer.

Signed-off-by: Raushaniya Maksudova <rmaksudova@parallels.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-11-11 17:09:57 +10:30
Wolfram Sang 257d6e5a3b virtio: drop owner assignment from platform_drivers
A platform_driver does not need to set an owner, it will be populated by the
driver core.

Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2014-10-20 16:21:55 +02:00
Linus Torvalds 0e6e58f941 One cc: stable commit, the rest are a series of minor cleanups which have
been sitting in MST's tree during my vacation.  I changed a function name
 and made one trivial change, then they spent two days in linux-next.
 
 Thanks,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUQFBQAAoJENkgDmzRrbjxJRIP/1yCQRElQewxURSmJelyqCdU
 0mHYB0R9Mf3tfre1xnofqs2lWeSMc/4ptKHsVR6pupoztSwnz7HsLHfEFvFJh4mj
 KsaqYElxkNxTcfyHwLjyJS0/J6tG1tYypXGiimTBS0bvFHL3XZdimVgJ6WvX+gO7
 YSaDEX8/EqCERafslS5+gKJlz3drDOnCZCe9y4BDSmsvl2k7bkpSxIn8vsR6jIC0
 c5JpUy6QVF+3XA/J932M7yRs+xpqxNoUWiyY3ar9o3CtQAaQB0ZAetSxY6hTfvVc
 GlNFzCifdsaQwsl2SVsE2h6tWaRhtMtcGWQuhHThIPyIf8XxhYyBRY2FLo70LMz1
 eqtwy6F/Bg/nzUsdee4PZBMeoKHlAEL12RpsEKgfUoLzj16Aqa8ll+Agbglbkw8G
 f3d2FwzKAlpY5NwHETC1wYy52PJ3efqksRWuhokmYpxNSbHJS/lsiJOE7272/4Qr
 MtXuvRmo22tf34XFd5y7zqWjgZ58eeFOqQWi/K+6ZgpqVOvikjrXXKEuiVdjO0ZD
 kTVR/sQKiR+79rzENk80XBhWaMveECNXF1TiZ/3MmURkmEOBRQMxRQ20BX3exvna
 AJ/WVA5DcfXZc1yyqknE1NLGrvSBMJENH13x2QPwrqNWAryOOKuF1VKKIwWlDw5j
 vtx5nXiJa8YYdxI2TJCN
 =JK6x
 -----END PGP SIGNATURE-----

Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull virtio updates from Rusty Russell:
 "One cc: stable commit, the rest are a series of minor cleanups which
  have been sitting in MST's tree during my vacation.  I changed a
  function name and made one trivial change, then they spent two days in
  linux-next"

* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (25 commits)
  virtio-rng: refactor probe error handling
  virtio_scsi: drop scan callback
  virtio_balloon: enable VQs early on restore
  virtio_scsi: fix race on device removal
  virito_scsi: use freezable WQ for events
  virtio_net: enable VQs early on restore
  virtio_console: enable VQs early on restore
  virtio_scsi: enable VQs early on restore
  virtio_blk: enable VQs early on restore
  virtio_scsi: move kick event out from virtscsi_init
  virtio_net: fix use after free on allocation failure
  9p/trans_virtio: enable VQs early
  virtio_console: enable VQs early
  virtio_blk: enable VQs early
  virtio_net: enable VQs early
  virtio: add API to enable VQs early
  virtio_net: minor cleanup
  virtio-net: drop config_mutex
  virtio_net: drop config_enable
  virtio-blk: drop config_mutex
  ...
2014-10-18 10:25:09 -07:00
Michael S. Tsirkin 486d2e632c virtio_balloon: enable VQs early on restore
virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after resume returns, virtio balloon
violated this rule by adding bufs, which causes the VQ to be used
directly within restore.

To fix, call virtio_device_ready before using VQ.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-10-15 10:25:13 +10:30
Michael S. Tsirkin 22b7050a02 virtio: defer config changed notifications
Defer config changed notifications that arrive during
probe/scan/freeze/restore.

This will allow drivers to set DRIVER_OK earlier, without worrying about
racing with config change interrupts.

This change will also benefit old hypervisors (before 2009)
that send interrupts without checking DRIVER_OK: previously,
the callback could race with driver-specific initialization.

This will also help simplify drivers.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (cosmetic changes)
2014-10-15 10:24:56 +10:30
Michael S. Tsirkin c6716bae52 virtio-pci: move freeze/restore to virtio core
This is in preparation to extending config changed event handling
in core.
Wrapping these in an API also seems to make for a cleaner code.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-10-15 10:24:55 +10:30
Michael S. Tsirkin 016c98c6fe virtio: unify config_changed handling
Replace duplicated code in all transports with a single wrapper in
virtio.c.

The only functional change is in virtio_mmio.c: if a buggy device sends
us an interrupt before driver is set, we previously returned IRQ_NONE,
now we return IRQ_HANDLED.

As this must not happen in practice, this does not look like a big deal.

See also commit 3fff0179e3
	virtio-pci: do not oops on config change if driver not loaded.
for the original motivation behind the driver check.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-10-15 10:24:54 +10:30
Michael S. Tsirkin 6fbc198cf6 virtio_pci: fix virtio spec compliance on restore
On restore, virtio pci does the following:
+ set features
+ init vqs etc - device can be used at this point!
+ set ACKNOWLEDGE,DRIVER and DRIVER_OK status bits

This is in violation of the virtio spec, which
requires the following order:
- ACKNOWLEDGE
- DRIVER
- init vqs
- DRIVER_OK

This behaviour will break with hypervisors that assume spec compliant
behaviour.  It seems like a good idea to have this patch applied to
stable branches to reduce the support butden for the hypervisors.

Cc: stable@vger.kernel.org
Cc: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-10-15 10:24:53 +10:30
Konstantin Khlebnikov 09316c09dd mm/balloon_compaction: add vmstat counters and kpageflags bit
Always mark pages with PageBalloon even if balloon compaction is disabled
and expose this mark in /proc/kpageflags as KPF_BALLOON.

Also this patch adds three counters into /proc/vmstat: "balloon_inflate",
"balloon_deflate" and "balloon_migrate".  They accumulate balloon
activity.  Current size of balloon is (balloon_inflate - balloon_deflate)
pages.

All generic balloon code now gathered under option CONFIG_MEMORY_BALLOON.
It should be selected by ballooning driver which wants use this feature.
Currently virtio-balloon is the only user.

Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09 22:26:01 -04:00
Konstantin Khlebnikov 9d1ba80564 mm/balloon_compaction: remove balloon mapping and flag AS_BALLOON_MAP
Now ballooned pages are detected using PageBalloon().  Fake mapping is no
longer required.  This patch links ballooned pages to balloon device using
field page->private instead of page->mapping.  Also this patch embeds
balloon_dev_info directly into struct virtio_balloon.

Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09 22:26:01 -04:00
Konstantin Khlebnikov d6d86c0a7f mm/balloon_compaction: redesign ballooned pages management
Sasha Levin reported KASAN splash inside isolate_migratepages_range().
Problem is in the function __is_movable_balloon_page() which tests
AS_BALLOON_MAP in page->mapping->flags.  This function has no protection
against anonymous pages.  As result it tried to check address space flags
inside struct anon_vma.

Further investigation shows more problems in current implementation:

* Special branch in __unmap_and_move() never works:
  balloon_page_movable() checks page flags and page_count.  In
  __unmap_and_move() page is locked, reference counter is elevated, thus
  balloon_page_movable() always fails.  As a result execution goes to the
  normal migration path.  virtballoon_migratepage() returns
  MIGRATEPAGE_BALLOON_SUCCESS instead of MIGRATEPAGE_SUCCESS,
  move_to_new_page() thinks this is an error code and assigns
  newpage->mapping to NULL.  Newly migrated page lose connectivity with
  balloon an all ability for further migration.

* lru_lock erroneously required in isolate_migratepages_range() for
  isolation ballooned page.  This function releases lru_lock periodically,
  this makes migration mostly impossible for some pages.

* balloon_page_dequeue have a tight race with balloon_page_isolate:
  balloon_page_isolate could be executed in parallel with dequeue between
  picking page from list and locking page_lock.  Race is rare because they
  use trylock_page() for locking.

This patch fixes all of them.

Instead of fake mapping with special flag this patch uses special state of
page->_mapcount: PAGE_BALLOON_MAPCOUNT_VALUE = -256.  Buddy allocator uses
PAGE_BUDDY_MAPCOUNT_VALUE = -128 for similar purpose.  Storing mark
directly in struct page makes everything safer and easier.

PagePrivate is used to mark pages present in page list (i.e.  not
isolated, like PageLRU for normal pages).  It replaces special rules for
reference counter and makes balloon migration similar to migration of
normal pages.  This flag is protected by page_lock together with link to
the balloon device.

Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Link: http://lkml.kernel.org/p/53E6CEAA.9020105@oracle.com
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: <stable@vger.kernel.org>	[3.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09 22:26:01 -04:00
Rusty Russell b25bd2515e virtio_ring: unify direct/indirect code paths.
virtqueue_add() populates the virtqueue descriptor table from the sgs
given.  If it uses an indirect descriptor table, then it puts a single
descriptor in the descriptor table pointing to the kmalloc'ed indirect
table where the sg is populated.

Previously vring_add_indirect() did the allocation and the simple
linear layout.  We replace that with alloc_indirect() which allocates
the indirect table then chains it like the normal descriptor table so
we can reuse the core logic.

This slows down pktgen by less than 1/2 a percent (which uses direct
descriptors), as well as vring_bench, but it's far neater.

vring_bench before:
	1061485790-1104800648(1.08254e+09+/-6.6e+06)ns
vring_bench after:
	1125610268-1183528965(1.14172e+09+/-8e+06)ns

pktgen before:
   787781-796334(793165+/-2.4e+03)pps 365-369(367.5+/-1.2)Mb/sec (365530384-369498976(3.68028e+08+/-1.1e+06)bps) errors: 0

pktgen after:
   779988-790404(786391+/-2.5e+03)pps 361-366(364.35+/-1.3)Mb/sec (361914432-366747456(3.64885e+08+/-1.2e+06)bps) errors: 0

Now, if we make force indirect descriptors by turning off any_header_sg
in virtio_net.c:

pktgen before:
  713773-721062(718374+/-2.1e+03)pps 331-334(332.95+/-0.92)Mb/sec (331190672-334572768(3.33325e+08+/-9.6e+05)bps) errors: 0
pktgen after:
  710542-719195(714898+/-2.4e+03)pps 329-333(331.15+/-1.1)Mb/sec (329691488-333706480(3.31713e+08+/-1.1e+06)bps) errors: 0

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 12:52:35 -04:00
Rusty Russell eeebf9b1fc virtio_ring: assume sgs are always well-formed.
We used to have several callers which just used arrays.  They're
gone, so we can use sg_next() everywhere, simplifying the code.

On my laptop, this slowed down vring_bench by 15%:

vring_bench before:
	936153354-967745359(9.44739e+08+/-6.1e+06)ns
vring_bench after:
	1061485790-1104800648(1.08254e+09+/-6.6e+06)ns

However, a more realistic test using pktgen on a AMD FX(tm)-8320 saw
a few percent improvement:

pktgen before:
  767390-792966(785159+/-6.5e+03)pps 356-367(363.75+/-2.9)Mb/sec (356068960-367936224(3.64314e+08+/-3e+06)bps) errors: 0

pktgen after:
   787781-796334(793165+/-2.4e+03)pps 365-369(367.5+/-1.2)Mb/sec (365530384-369498976(3.68028e+08+/-1.1e+06)bps) errors: 0

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 12:50:46 -04:00
Benoit Taine cef340e6aa virtio: Replace DEFINE_PCI_DEVICE_TABLE macro use
We should prefer `struct pci_device_id` over `DEFINE_PCI_DEVICE_TABLE` to meet
kernel coding style guidelines. This issue was reported by checkpatch.

Signed-off-by: Benoit Taine <benoit.taine@lip6.fr>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-07-27 21:07:15 +09:30
Rusty Russell e2dcdfe95c virtio: virtio_break_device() to mark all virtqueues broken.
Good for post-apocalyptic scenarios, like S/390 hotplug.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-04-28 11:34:13 +09:30
Rusty Russell 70670444c2 virtio: fail adding buffer on broken queues.
Heinz points out that adding buffers to a broken virtqueue (which
should "never happen") still works.  Failing allows drivers to detect
and complain about broken devices.

Now drivers are robust, we can add this extra check.

Reported-by: Heinz Graalfs <graalfs@linux.vnet.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-03-13 11:27:57 +10:30
Rusty Russell 4951cc9083 virtio_balloon: don't crash if virtqueue is broken.
A bad implementation of virtio might cause us to mark the virtqueue
broken: we'll dev_err() in that case, and the device is useless, but
let's not BUG().

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-03-13 11:27:56 +10:30
Rusty Russell 1f74ef0f2d virtio_balloon: don't softlockup on huge balloon changes.
When adding or removing 100G from a balloon:

    BUG: soft lockup - CPU#0 stuck for 22s! [vballoon:367]

We have a wait_event_interruptible(), but the condition is always true
(more ballooning to do) so we don't ever sleep.  We also have a
wait_event() for the host to ack, but that is also always true as QEMU
is synchronous for balloon operations.

Reported-by: Gopesh Kumar Chaudhary <gopchaud@in.ibm.com>
Cc: stable@kernel.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-03-13 11:27:55 +10:30
Alexander Gordeev 5e37f67063 virtio: Use pci_enable_msix_exact() instead of pci_enable_msix()
As result of deprecation of MSI-X/MSI enablement functions
pci_enable_msix() and pci_enable_msi_block() all drivers
using these two interfaces need to be updated to use the
new pci_enable_msi_range()  or pci_enable_msi_exact()
and pci_enable_msix_range() or pci_enable_msix_exact()
interfaces.

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-03-13 11:27:54 +10:30
Joel Stanley 6abb2dd928 tools/virtio: fix missing kmemleak_ignore symbol
In commit bb478d8b16 virtio_ring: plug kmemleak false positive,
kmemleak_ignore was introduced. This broke compilation of virtio_test:

  cc -g -O2 -Wall -I. -I ../../usr/include/ -Wno-pointer-sign
    -fno-strict-overflow -fno-strict-aliasing -fno-common -MMD
    -U_FORTIFY_SOURCE   -c -o virtio_ring.o ../../drivers/virtio/virtio_ring.c
  ../../drivers/virtio/virtio_ring.c: In function ‘vring_add_indirect’:
  ../../drivers/virtio/virtio_ring.c:177:2: warning: implicit declaration
  of function ‘kmemleak_ignore’ [-Wimplicit-function-declaration]
    kmemleak_ignore(desc);
    ^
  cc   virtio_test.o virtio_ring.o   -o virtio_test
  virtio_ring.o: In function `vring_add_indirect':
  tools/virtio/../../drivers/virtio/virtio_ring.c:177:
  undefined reference to `kmemleak_ignore'

Add a dummy header for tools/virtio, and add #incldue <linux/kmemleak.h>
to drivers/virtio/virtio_ring.c so it is picked up by the userspace
tools.

Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-03-13 11:23:25 +10:30
Linus Torvalds 93b05cba8e A few simple fixes. Quiet cycle.
Cheers,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJS4H+AAAoJENkgDmzRrbjxhXgP/0S4aPYPm6aw+nMKzrbZ6umd
 7kFPjZyh8zDuOOwLOIFt2gepI/H0d56ZWi4EmNBO5Md0OChNjHjDF4sC/PA+vTsQ
 DUoP6zv2yGlmbnaIri4Xo7DXOM/40G4+VOtO60KsTnwjIj8/bqA/VGBFoodqKVw4
 lBgkFs32IHynZuwxj14UeuzcPBR8tmHQDvH+ROqa5/7lzMHf24MQLh7NjNCJyoz9
 d56GW5tFNpWYgZO27v/QuAI2wHDPDpSOoZnPCh1yfr8+kk+W2HfVFYVD0dOiu2VR
 FMQ8OMGI+o9XMKvvwCxp2WkXExZRv0KbUOi8+LhphygZgLQw7FEYdulRLpPcach4
 T4vzpTHXBVhkrBNkKJIYS556xuubwnQM57ZcaoiDr16SHnyji/tALRIrhnyfvwuo
 r44HxVkq6RjMlDGZs/5JdmjcW+FWMg5/eVZBkyr+Y+/5xTo8E+WYz6JhMEc+hrrq
 TL8HDUtpEwFx1HORwkPUBNKoj14sqKwst6wVgN4iyTv534Cq8lqDPanZncBRb2ho
 p0NxTC96hqOz/LRAy5mS3qn530DOUuvEeV0KGIgbUFn4nuWeGO4kswUY7GlU8dda
 4ww3Uxw5SF9uHYBZWtXavBLIq6PhcTdJ6Ckz83kgRwev4gLo9Rj83yW/hwjP9pxu
 FwRo38ZtfY9jyL4w0cv1
 =mh3b
 -----END PGP SIGNATURE-----

Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull virtio update from Rusty Russell:
 "A few simple fixes.  Quiet cycle"

* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  drivers: virtio: Mark function virtballoon_migratepage() as static in virtio_balloon.c
  virtio-scsi: Fix hotcpu_notifier use-after-free with virtscsi_freeze
  virtio: pci: remove unnecessary pci_set_drvdata()
2014-01-22 22:24:35 -08:00
Rashika Kheria 05c54de8c8 drivers: virtio: Mark function virtballoon_migratepage() as static in virtio_balloon.c
Mark the function virtballoon_migratepage() as static in
virtio_balloon.c because it is not used outside this file.

This eliminates the following warning in virtio_balloon.c:
drivers/virtio/virtio_balloon.c:372:5: warning: no previous prototype for ‘virtballoon_migratepage’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-01-16 10:22:28 +10:30
Luiz Capitulino 3459f11a8b virtio_balloon: update_balloon_size(): update correct field
According to the virtio spec, the device configuration field
that should be updated after an inflation or deflation
operation is the 'actual' field, not the 'num_pages' one.

Commit 855e0c5288 swapped them
in update_balloon_size(). Fix it.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fixes: 855e0c5288
2013-12-05 13:12:39 +10:30
Jingoo Han 7d2dddda5c virtio: pci: remove unnecessary pci_set_drvdata()
The driver core clears the driver data to NULL after device_release
or on probe failure. Thus, it is not needed to manually clear the
device driver data to NULL.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-12-04 14:20:26 +10:30
Linus Torvalds b746f9c794 Nothing really exciting: some groundwork for changing virtio endian, and
some robustness fixes for broken virtio devices, plus minor tweaks.
 
 [vs last pull request: added the virtio-scsi broken vq escape patch, which
 I somehow lost.]
 
 Cheers,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJSgDsJAAoJENkgDmzRrbjxEE4P/jXqZHS/HdlxW9k0BjKKlEIF
 PdtCoP3UhWTdskXvy2pD8m6nYn214MEJYUIa4HFlIEZsdxhuexzQHY19Ynkjagyv
 57sRsUUm5fYQLIL7IUh2DUD1VU38hUFinno/y333szzvCj9qITDA/QABsiWxK8NO
 dq+Lmeixgrhc5yN9iryW+gZV+hekJIZ4LsU5ejSaJucKblzXUH8qIbmSthG7RTYJ
 tr4J7xTTXbhxY4CoC5Dpx2hvsFkvzaAIvI4Nr1mDjfq5cR8BaYvnC89U1IbhdAey
 p1AbZE58JLrY+Z8K8LBRGV2KjO8qSZ6R47hbZ9nAnodJYB7sZLyj6jUe1q+/htuC
 Dh9Xm9O4eW2xNaFk20dYeIF4UU5/HzdsbvG/IlH8x4sm8/K706ocYyAOHlzYUg2T
 k7gltrgDzDokMgb2R44gwnr4oaJ2q8Gne6JXswlPEv2eRs6vNnA5Xhc0rEHGkU6C
 gYn1vNFN6yx0vf2syG/Ce5pZtMxGpefKQkHzzWdq8FKr1B9s54dDuf2hls7J8A9t
 OQT1gE33yURSelf4Kh4k9zWXaWk/Ohv9l2R1cqpALnJ4/+q0fP5t7HdK500S7aax
 DxLeFeqvsBw7nlWgsGxQmt+fjITQFHhcDiwst0ehnt6RbDEW7XPIguz0K/gyhxYG
 +UNbl/5Gr64jnUX3YCzm
 =vY2L
 -----END PGP SIGNATURE-----

Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull virtio updates from Rusty Russell:
 "Nothing really exciting: some groundwork for changing virtio endian,
  and some robustness fixes for broken virtio devices, plus minor
  tweaks"

* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  virtio_scsi: verify if queue is broken after virtqueue_get_buf()
  x86, asmlinkage, lguest: Pass in globals into assembler statement
  virtio: mmio: fix signature checking for BE guests
  virtio_ring: adapt to notify() returning bool
  virtio_net: verify if queue is broken after virtqueue_get_buf()
  virtio_console: verify if queue is broken after virtqueue_get_buf()
  virtio_blk: verify if queue is broken after virtqueue_get_buf()
  virtio_ring: add new function virtqueue_is_broken()
  virtio_test: verify if virtqueue_kick() succeeded
  virtio_net: verify if virtqueue_kick() succeeded
  virtio_ring: let virtqueue_{kick()/notify()} return a bool
  virtio_ring: change host notification API
  virtio_config: remove virtio_config_val
  virtio: use size-based config accessors.
  virtio_config: introduce size-based accessors.
  virtio_ring: plug kmemleak false positive.
  virtio: pm: use CONFIG_PM_SLEEP instead of CONFIG_PM
2013-11-15 13:28:47 +09:00
Marc Zyngier 4ae8537072 virtio: mmio: fix signature checking for BE guests
As virtio-mmio config registers are specified to be little-endian,
using readl() to read the magic value and then memcmp() to check it
fails on BE (as readl() has an implicit swab).

Fix it by encoding the magic value as an integer instead of a string.

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-11-07 12:13:04 +10:30
Heinz Graalfs 2342d6a651 virtio_ring: adapt to notify() returning bool
Correct if statement to check for bool returned by notify()
(introduced in 5b1bf7cb67).

Signed-off-by: Heinz Graalfs <graalfs@linux.vnet.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-11-05 21:21:08 +10:30
Heinz Graalfs b3b32c9413 virtio_ring: add new function virtqueue_is_broken()
Add new function virtqueue_is_broken(). Callers of virtqueue_get_buf()
should check for a broken queue.

Signed-off-by: Heinz Graalfs <graalfs@linux.vnet.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-10-29 11:28:17 +10:30
Heinz Graalfs 5b1bf7cb67 virtio_ring: let virtqueue_{kick()/notify()} return a bool
virtqueue_{kick()/notify()} should exploit the new host notification API.
If the notify call returned with a negative value the host kick failed
(e.g. a kick triggered after a device was hot-unplugged). In this case
the virtqueue is set to 'broken' and false is returned, otherwise true.

Signed-off-by: Heinz Graalfs <graalfs@linux.vnet.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-10-29 11:28:12 +10:30
Heinz Graalfs 46f9c2b925 virtio_ring: change host notification API
Currently a host kick error is silently ignored and not reflected in
the virtqueue of a particular virtio device.

Changing the notify API for guest->host notification seems to be one
prerequisite in order to be able to handle such errors in the context
where the kick is triggered.

This patch changes the notify API. The notify function must return a
bool return value. It returns false if the host notification failed.

Signed-off-by: Heinz Graalfs <graalfs@linux.vnet.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-10-29 11:28:11 +10:30
Greg Kroah-Hartman 3736dab6e5 virtio: convert bus code to use dev_groups
The dev_attrs field of struct bus_type is going away soon, dev_groups
should be used instead.  This converts the virtio bus code to use the
correct field.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: <virtualization@lists.linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-16 18:40:57 -07:00
Rusty Russell 855e0c5288 virtio: use size-based config accessors.
This lets the transport do endian conversion if necessary, and insulates
the drivers from the difference.

Most drivers can use the simple helpers virtio_cread() and virtio_cwrite().

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-10-17 10:55:37 +10:30
Rusty Russell bb478d8b16 virtio_ring: plug kmemleak false positive.
unreferenced object 0xffff88003d467e20 (size 32):
  comm "softirq", pid 0, jiffies 4295197765 (age 6.364s)
  hex dump (first 32 bytes):
    28 19 bf 3d 00 00 00 00 0c 00 00 00 01 00 01 00  (..=............
    02 dc 51 3c 00 00 00 00 56 00 00 00 00 00 00 00  ..Q<....V.......
  backtrace:
    [<ffffffff8152db19>] kmemleak_alloc+0x59/0xc0
    [<ffffffff81102e93>] __kmalloc+0xf3/0x180
    [<ffffffff812db5d6>] vring_add_indirect+0x36/0x280
    [<ffffffff812dc59f>] virtqueue_add_outbuf+0xbf/0x4e0
    [<ffffffff813a8b30>] start_xmit+0x1a0/0x3b0
    [<ffffffff81445861>] dev_hard_start_xmit+0x2d1/0x4d0
    [<ffffffff81460052>] sch_direct_xmit+0xf2/0x1c0
    [<ffffffff81445c28>] dev_queue_xmit+0x1c8/0x460
    [<ffffffff814e3187>] ip6_finish_output2+0x1d7/0x470
    [<ffffffff814e34b0>] ip6_finish_output+0x90/0xb0
    [<ffffffff814e3507>] ip6_output+0x37/0xb0
    [<ffffffff815021eb>] igmp6_send+0x2db/0x470
    [<ffffffff81502645>] igmp6_timer_handler+0x95/0xa0
    [<ffffffff8104b57c>] call_timer_fn+0x2c/0x90
    [<ffffffff8104b7ba>] run_timer_softirq+0x1da/0x1f0
    [<ffffffff81045721>] __do_softirq+0xd1/0x1b0

Address gets embedded in a descriptor via virt_to_phys().  See detach_buf,
which frees it:

	if (vq->vring.desc[i].flags & VRING_DESC_F_INDIRECT)
		kfree(phys_to_virt(vq->vring.desc[i].addr));

Reported-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Fix-suggested-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Typing-done-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-10-17 10:55:35 +10:30
Aaron Lu 8910700039 virtio: pm: use CONFIG_PM_SLEEP instead of CONFIG_PM
The freeze and restore functions defined in virtio drivers are used
for suspend and hibernate, so CONFIG_PM_SLEEP is more appropriate than
CONFIG_PM. This patch replace all CONFIG_PM with CONFIG_PM_SLEEP for
virtio drivers that implement freeze and restore callbacks.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-09-23 15:45:58 +09:30
Aaron Lu 9e266ece21 virtio_pci: pm: Use CONFIG_PM_SLEEP instead of CONFIG_PM
The virtio_pci_freeze/restore are defined under CONFIG_PM but is used
by SET_SYSTEM_SLEEP_PM_OPS macro, which is defined under
CONFIG_PM_SLEEP. So if CONFIG_PM_SLEEP is not cofigured but
CONFIG_PM_RUNTIME is, the following warning message appeared:

drivers/virtio/virtio_pci.c:770:12: warning: ‘virtio_pci_freeze’ defined but not used [-Wunused-function]
 static int virtio_pci_freeze(struct device *dev)
            ^
drivers/virtio/virtio_pci.c:790:12: warning: ‘virtio_pci_restore’ defined but not used [-Wunused-function]
 static int virtio_pci_restore(struct device *dev)
            ^
Fix it by changing CONFIG_PM to CONFIG_PM_SLEEP.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-09-09 10:02:53 +09:30
Linus Torvalds 5f12972171 No real surprises.
Thanks,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJR3NcfAAoJENkgDmzRrbjxtgAQAMAChsD867ADZPVaT1jcS6Me
 5OoavOnpw84AopY4cLOFxL13lDgHeDZxTbsS56WP/100WEDTFQqr9uWpuFxt/NU7
 ukXwMrN/5RBmTXtDpdApM+jnBdgRLxu6gJaEjmVZSW10XwPDar+zGJV+NUVX42uy
 8rhjZcFTPIhLz96VBeXgtnmlc8f33AwdFb1JtA6c5slMXmJ0pGKpme3Gri0Hqu0X
 sowVCchjuJXuhg3siyyjcyrDtWnH6hPNCbvAFuL4wwcGyMjb3TS7l922GiscD9pq
 KZcpkYywCosJEnaR4XZQ3ewDGnqaslgb556Tcigm2TXC9LfAR/k4Q1ZYiPGrxcmU
 zgiN/Nvr1PbMDcKr6Lr+utQGh82jeKTcrKz/CPlCJtusg+4hPCwuugDMhNqKK4/2
 3O+0c5t32K38TUdDhVRu2wXlvlyLoQc55yIbhw70O+G1Td7KMHrjXuxwyWbP+tGU
 X/D2DKQu5bcNcBv5sA04PdyRM2buYFvwSZUjxwdwWssmdUqU1xdybDK3pgWf92fF
 llsri86xZ/hOOE+A+jn/oUpXol2PAVOCoh80P7O9VeuDgPL2Fjl4UWf8HWqVvn/u
 A3kpCrIygogc4I7xQkdxBkR6Aa2Uh323rpXKm7E4Tlg0Ii6srqBWq74fz2Ou3TlS
 tOPNG/Npv6yiWo2ejXp8
 =XNkj
 -----END PGP SIGNATURE-----

Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull virtio updates from Rusty Russell:
 "No real surprises"

* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  MAINTAINERS: add tools/virtio/ under virtio
  tools/virtio: move module license stub to module.h
  virtio: include asm/barrier explicitly
  virtio: VIRTIO_F_ANY_LAYOUT feature
  lguest: fix example launcher compilation for broken glibc headers.
  virtio-net: fix the race between channels setting and refill
  tools/lguest: real barriers.
  tools/lguest: fix missing rmb().
  virtio_balloon: leak_balloon(): only tell host if we got pages deflated
  virtio-pci: fix leaks of msix_affinity_masks
  Fix comment typo "CONFIG_PAE"
2013-07-10 14:50:58 -07:00
Linus Torvalds 496322bc91 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "This is a re-do of the net-next pull request for the current merge
  window.  The only difference from the one I made the other day is that
  this has Eliezer's interface renames and the timeout handling changes
  made based upon your feedback, as well as a few bug fixes that have
  trickeled in.

  Highlights:

   1) Low latency device polling, eliminating the cost of interrupt
      handling and context switches.  Allows direct polling of a network
      device from socket operations, such as recvmsg() and poll().

      Currently ixgbe, mlx4, and bnx2x support this feature.

      Full high level description, performance numbers, and design in
      commit 0a4db187a9 ("Merge branch 'll_poll'")

      From Eliezer Tamir.

   2) With the routing cache removed, ip_check_mc_rcu() gets exercised
      more than ever before in the case where we have lots of multicast
      addresses.  Use a hash table instead of a simple linked list, from
      Eric Dumazet.

   3) Add driver for Atheros CQA98xx 802.11ac wireless devices, from
      Bartosz Markowski, Janusz Dziedzic, Kalle Valo, Marek Kwaczynski,
      Marek Puzyniak, Michal Kazior, and Sujith Manoharan.

   4) Support reporting the TUN device persist flag to userspace, from
      Pavel Emelyanov.

   5) Allow controlling network device VF link state using netlink, from
      Rony Efraim.

   6) Support GRE tunneling in openvswitch, from Pravin B Shelar.

   7) Adjust SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF for modern times, from
      Daniel Borkmann and Eric Dumazet.

   8) Allow controlling of TCP quickack behavior on a per-route basis,
      from Cong Wang.

   9) Several bug fixes and improvements to vxlan from Stephen
      Hemminger, Pravin B Shelar, and Mike Rapoport.  In particular,
      support receiving on multiple UDP ports.

  10) Major cleanups, particular in the area of debugging and cookie
      lifetime handline, to the SCTP protocol code.  From Daniel
      Borkmann.

  11) Allow packets to cross network namespaces when traversing tunnel
      devices.  From Nicolas Dichtel.

  12) Allow monitoring netlink traffic via AF_PACKET sockets, in a
      manner akin to how we monitor real network traffic via ptype_all.
      From Daniel Borkmann.

  13) Several bug fixes and improvements for the new alx device driver,
      from Johannes Berg.

  14) Fix scalability issues in the netem packet scheduler's time queue,
      by using an rbtree.  From Eric Dumazet.

  15) Several bug fixes in TCP loss recovery handling, from Yuchung
      Cheng.

  16) Add support for GSO segmentation of MPLS packets, from Simon
      Horman.

  17) Make network notifiers have a real data type for the opaque
      pointer that's passed into them.  Use this to properly handle
      network device flag changes in arp_netdev_event().  From Jiri
      Pirko and Timo Teräs.

  18) Convert several drivers over to module_pci_driver(), from Peter
      Huewe.

  19) tcp_fixup_rcvbuf() can loop 500 times over loopback, just use a
      O(1) calculation instead.  From Eric Dumazet.

  20) Support setting of explicit tunnel peer addresses in ipv6, just
      like ipv4.  From Nicolas Dichtel.

  21) Protect x86 BPF JIT against spraying attacks, from Eric Dumazet.

  22) Prevent a single high rate flow from overruning an individual cpu
      during RX packet processing via selective flow shedding.  From
      Willem de Bruijn.

  23) Don't use spinlocks in TCP md5 signing fast paths, from Eric
      Dumazet.

  24) Don't just drop GSO packets which are above the TBF scheduler's
      burst limit, chop them up so they are in-bounds instead.  Also
      from Eric Dumazet.

  25) VLAN offloads are missed when configured on top of a bridge, fix
      from Vlad Yasevich.

  26) Support IPV6 in ping sockets.  From Lorenzo Colitti.

  27) Receive flow steering targets should be updated at poll() time
      too, from David Majnemer.

  28) Fix several corner case regressions in PMTU/redirect handling due
      to the routing cache removal, from Timo Teräs.

  29) We have to be mindful of ipv4 mapped ipv6 sockets in
      upd_v6_push_pending_frames().  From Hannes Frederic Sowa.

  30) Fix L2TP sequence number handling bugs, from James Chapman."

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1214 commits)
  drivers/net: caif: fix wrong rtnl_is_locked() usage
  drivers/net: enic: release rtnl_lock on error-path
  vhost-net: fix use-after-free in vhost_net_flush
  net: mv643xx_eth: do not use port number as platform device id
  net: sctp: confirm route during forward progress
  virtio_net: fix race in RX VQ processing
  virtio: support unlocked queue poll
  net/cadence/macb: fix bug/typo in extracting gem_irq_read_clear bit
  Documentation: Fix references to defunct linux-net@vger.kernel.org
  net/fs: change busy poll time accounting
  net: rename low latency sockets functions to busy poll
  bridge: fix some kernel warning in multicast timer
  sfc: Fix memory leak when discarding scattered packets
  sit: fix tunnel update via netlink
  dt:net:stmmac: Add dt specific phy reset callback support.
  dt:net:stmmac: Add support to dwmac version 3.610 and 3.710
  dt:net:stmmac: Allocate platform data only if its NULL.
  net:stmmac: fix memleak in the open method
  ipv6: rt6_check_neigh should successfully verify neigh if no NUD information are available
  net: ipv6: fix wrong ping_v6_sendmsg return value
  ...
2013-07-09 18:24:39 -07:00
Michael S. Tsirkin cc229884d3 virtio: support unlocked queue poll
This adds a way to check ring empty state after enable_cb outside any
locks. Will be used by virtio_net.

Note: there's room for more optimization: caller is likely to have a
memory barrier already, which means we might be able to get rid of a
barrier here.  Deferring this optimization until we do some
benchmarking.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-09 12:45:37 -07:00
Linus Torvalds 7f0ef0267e Merge branch 'akpm' (updates from Andrew Morton)
Merge first patch-bomb from Andrew Morton:
 - various misc bits
 - I'm been patchmonkeying ocfs2 for a while, as Joel and Mark have been
   distracted.  There has been quite a bit of activity.
 - About half the MM queue
 - Some backlight bits
 - Various lib/ updates
 - checkpatch updates
 - zillions more little rtc patches
 - ptrace
 - signals
 - exec
 - procfs
 - rapidio
 - nbd
 - aoe
 - pps
 - memstick
 - tools/testing/selftests updates

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (445 commits)
  tools/testing/selftests: don't assume the x bit is set on scripts
  selftests: add .gitignore for kcmp
  selftests: fix clean target in kcmp Makefile
  selftests: add .gitignore for vm
  selftests: add hugetlbfstest
  self-test: fix make clean
  selftests: exit 1 on failure
  kernel/resource.c: remove the unneeded assignment in function __find_resource
  aio: fix wrong comment in aio_complete()
  drivers/w1/slaves/w1_ds2408.c: add magic sequence to disable P0 test mode
  drivers/memstick/host/r592.c: convert to module_pci_driver
  drivers/memstick/host/jmb38x_ms: convert to module_pci_driver
  pps-gpio: add device-tree binding and support
  drivers/pps/clients/pps-gpio.c: convert to module_platform_driver
  drivers/pps/clients/pps-gpio.c: convert to devm_* helpers
  drivers/parport/share.c: use kzalloc
  Documentation/accounting/getdelays.c: avoid strncpy in accounting tool
  aoe: update internal version number to v83
  aoe: update copyright date
  aoe: perform I/O completions in parallel
  ...
2013-07-03 17:12:13 -07:00
Jiang Liu 3dcc0571cd mm: correctly update zone->managed_pages
Enhance adjust_managed_page_count() to adjust totalhigh_pages for
highmem pages.  And change code which directly adjusts totalram_pages to
use adjust_managed_page_count() because it adjusts totalram_pages,
totalhigh_pages and zone->managed_pages altogether in a safe way.

Remove inc_totalhigh_pages() and dec_totalhigh_pages() from xen/balloon
driver bacause adjust_managed_page_count() has already adjusted
totalhigh_pages.

This patch also fixes two bugs:

1) enhances virtio_balloon driver to adjust totalhigh_pages when
   reserve/unreserve pages.
2) enhance memory_hotplug.c to adjust totalhigh_pages when hot-removing
   memory.

We still need to deal with modifications of totalram_pages in file
arch/powerpc/platforms/pseries/cmm.c, but need help from PPC experts.

[akpm@linux-foundation.org: remove ifdef, per Wanpeng Li, virtio_balloon.c cleanup, per Sergei]
[akpm@linux-foundation.org: export adjust_managed_page_count() to modules, for drivers/virtio/virtio_balloon.c]
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Minchan Kim <minchan@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <sworddragon2@aol.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:07:33 -07:00
Luiz Capitulino 8c6bab4f38 virtio_balloon: leak_balloon(): only tell host if we got pages deflated
balloon_page_dequeue() can return NULL.  If it does for the first page
being freed then leak_balloon() will create a scatter list with len=0.
Which in turn seems to generate an invalid virtio request.

I didn't get this in practice, I found it by code review.  On the other
hand, such an invalid virtio request will cause errors in QEMU and
fill_balloon() also performs the same check implemented by this commit.

This bug was introduced in e2250429.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Acked-by: Rafael Aquini <aquini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org # 3.9
2013-07-02 15:42:03 +09:30
Andrew Vagin f11335db5e virtio-pci: fix leaks of msix_affinity_masks
vp_dev->msix_vectors should be initialized before allocating
msix_affinity_masks, otherwise vp_free_vectors will not free these
objects.

unreferenced object 0xffff88010f969d88 (size 512):
  comm "systemd-udevd", pid 158, jiffies 4294673645 (age 80.545s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff816e455e>] kmemleak_alloc+0x5e/0xc0
    [<ffffffff811aa7f1>] kmem_cache_alloc_node_trace+0x141/0x2c0
    [<ffffffff8133ba23>] alloc_cpumask_var_node+0x23/0x80
    [<ffffffff8133ba8e>] alloc_cpumask_var+0xe/0x10
    [<ffffffff813fdb3d>] vp_try_to_find_vqs+0x25d/0x810
    [<ffffffff813fe171>] vp_find_vqs+0x81/0xb0
    [<ffffffffa00d2a05>] init_vqs+0x85/0x120 [virtio_balloon]
    [<ffffffffa00d2c29>] virtballoon_probe+0xf9/0x1a0 [virtio_balloon]
    [<ffffffff813fb61e>] virtio_dev_probe+0xde/0x140
    [<ffffffff814452b8>] driver_probe_device+0x98/0x3a0
    [<ffffffff8144566b>] __driver_attach+0xab/0xb0
    [<ffffffff814432f4>] bus_for_each_dev+0x94/0xb0
    [<ffffffff81444f4e>] driver_attach+0x1e/0x20
    [<ffffffff81444910>] bus_add_driver+0x200/0x280
    [<ffffffff81445c14>] driver_register+0x74/0x160
    [<ffffffff813fb7d0>] register_virtio_driver+0x20/0x40

v2: change msix_vectors uncoditionaly in vp_free_vectors

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-07-02 15:42:02 +09:30
Rusty Russell b3087e48ce virtio: remove virtqueue_add_buf().
All users changed to virtqueue_add_sg() or virtqueue_add_outbuf/inbuf.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-05-20 12:16:01 +09:30
Rusty Russell 92549abc6a virtio_balloon: use simplified virtqueue accessors.
We never add buffers with input and output parts, so use the new accessors.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-03-20 15:45:06 +10:30
Rusty Russell 282edb3649 virtio_ring: virtqueue_add_outbuf / virtqueue_add_inbuf.
These are specialized versions of virtqueue_add_buf(), which cover
over 80% of cases and are far clearer.

In particular, the scatterlists passed to these functions don't have
to be clean (ie. we ignore end markers).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-03-20 15:44:52 +10:30
Rusty Russell 13816c768d virtio_ring: virtqueue_add_sgs, to add multiple sgs.
virtio_scsi can really use this, to avoid the current hack of copying
the whole sg array.  Some other things get slightly neater, too.

This causes a slowdown in virtqueue_add_buf(), which is implemented as
a wrapper.  This is addressed in the next patches.

for i in `seq 50`; do /usr/bin/time -f 'Wall time:%e' ./vringh_test --indirect --eventidx --parallel --fast-vringh; done 2>&1 | stats --trim-outliers:

Before:
	Using CPUS 0 and 3
	Guest: notified 0, pinged 39009-39063(39062)
	Host: notified 39009-39063(39062), pinged 0
	Wall time:1.700000-1.950000(1.723542)

After:
	Using CPUS 0 and 3
	Guest: notified 0, pinged 39062-39063(39063)
	Host: notified 39062-39063(39063), pinged 0
	Wall time:1.760000-2.220000(1.789167)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Reviewed-by: Asias He <asias@redhat.com>
2013-03-20 15:43:29 +10:30
Rusty Russell a9a0fef779 virtio_ring: expose virtio barriers for use in vringh.
The host side of ring needs this logic too.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-03-20 14:00:41 +10:30
Linus Torvalds 3c834b6f41 All trivial, thanks to the stuff which didn't quite make it time.
Cheers,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJRLEkHAAoJENkgDmzRrbjx6K0P/3o9/iW5hkOPYpu+KV2nr0wG
 6RG0uu0DCOb/tckigYwnn5PkS7UEcJu6kDypnEgXfhcNhYiBjoGIAUEQ2jBYyzQm
 IYc4oZhxDdqigJL/FHi2zL50mkWacTdBK83udxim3eRkgW9ysBRoBFwqMruVyhZv
 474KGS4PNT8pHDOCAlrRS1I9oW2pYTuUidN+SnfVJ8gFSkH0tuHEGoJeGrtDa010
 XkiWqiTJq6pDHTM5f4Wwp/WNQ21UNDBlvRahg0nqTZXicQsvujV0Tdb+EnAfXwa+
 bssqvO4X2WqlraVK1TJteufhcdhUgt/I1+45p9eLZvOXizk7EBKxFWynE7C878GV
 dhpz8i4mc+u6aJlGoHzwvKah0zhwDtqWdbDS+LNQjRmjMK9v45ttoisU+iR1GjLt
 JpDVg73K315aWJ9RLsYu7mvZ8JY+qRFkwXqX3lZd+GdohY0DSmGMxMqJG93sLBYi
 42vyHMaBd1JY0rDVqpfzlmjnSxX+k05uB0GYB3fO0CXbPxmfXtRKz7/5wpSz0Up+
 nTCs6Xa7t5jtG9qpC4cKxyEDEwB9M6SSxi+lhrIBkeDqlFwXAv1Fean4jqqhtFwr
 HO6i1mYgy0xCVY7EmequVyWlH6/B5GVB2xTMQuAz3f8pKm+XUr6C/twROYkgoXbV
 AOVyRWtnbISDVjtVFn9c
 =KcDx
 -----END PGP SIGNATURE-----

Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull virtio updates from Rusty Russell:
 "All trivial, thanks to the stuff which didn't quite make it time"

* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  virtio_console: Initialize guest_connected=true for rproc_serial
  virtio: use module_virtio_driver.
  virtio: Add module driver macro for virtio drivers.
  virtio_console: Use virtio device index to generate port name
  virtio: make pci_device_id const
  virtio: make config_ops const
  virtio-mmio: fix wrong comment about register offset
  virtio_console: Let unconnected rproc device receive data.
2013-02-26 14:49:12 -08:00
Rusty Russell b2a17029c2 virtio: use module_virtio_driver.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-02-13 17:00:37 +10:30
Stephen Hemminger 35cdc9eb65 virtio: make pci_device_id const
Use DEVICE_TABLE macro which also makes table const.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-02-11 15:32:18 +10:30
Stephen Hemminger 9350393239 virtio: make config_ops const
It is just a table of function pointers, make it const for cleanliness and security
reasons.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-02-11 15:32:17 +10:30
Ryota Ozaki 0d34cc2d6d virtio-mmio: fix wrong comment about register offset
The register offset of InterruptACK is 0x064, not 0x060.
Fix it.

Signed-off-by: Ryota Ozaki <ozaki.ryota@gmail.com>
Reviewed-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Acked-by: Pawel Moll <pawel.moll@arm.com>
CC: Jiri Kosina <trivial@kernel.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-02-04 10:37:01 +10:30
Kees Cook d72c5a8c8c drivers/virtio: remove depends on CONFIG_EXPERIMENTAL
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

CC: Rusty Russell <rusty@rustcorp.com.au>
CC: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
2013-01-11 11:39:04 -08:00
Greg Kroah-Hartman 8590dbc79a Drivers: virtio: remove __dev* attributes.
CONFIG_HOTPLUG is going away as an option.  As a result, the __dev*
markings need to be removed.

This change removes the use of __devinit, __devexit_p, and __devexit
from these drivers.

Based on patches originally written by Bill Pemberton, but redone by me
in order to handle some of the coding style issues better, by hand.

Cc: Bill Pemberton <wfp5p@virginia.edu>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-03 15:57:01 -08:00
Linus Torvalds b7dfde956d Some nice cleanups, and even a patch my wife did as a "live" demo for
Latinoware 2012.
 
 There's a slightly non-trivial merge in virtio-net, as we cleaned up the
 virtio add_buf interface while DaveM accepted the mq virtio-net patches.
 
 You can see my solution in my pending-rebases branch, if that helps, but I
 know you love merging:
 
 https://git.kernel.org/?p=linux/kernel/git/rusty/linux.git;a=commit;h=12e4e64fa66a4c812e4855de32abdb4d819526fe
 
 Cheers,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJQz/vKAAoJENkgDmzRrbjx+eYQAK/egj9T8Nnth6mkzdbCFSO7
 Bciga2hDiudGCiGojTRGPRSc0VP9LgfvPbY2pxX+R9CfEqR+a8q/rRQhCS79ZwPB
 /mJy3HNiCx418HZxgwNtk6vPe0PjJm6SsjbXeB9hB+PQLCbdwA0BjpG6xjF/jitP
 noPqhhXreeQgYVxAKoFPvff/Byu2GlNnDdVMQxWRmo8hTKlTCzl0T/7BHRxthhJj
 iOrXTFzrT/osPT0zyqlngT03T4wlBvL2Bfw8d/kuRPEZ71dpIctWeH2KzdwXVCrz
 hFQGxAz4OWvW3xrNwj7c6O3SWj4VemUMjQqeA/PtRiOEI5gM0Y/Bit47dWL4wM/O
 OWUKFHzq4DFs8MmwXBgDDXl5xOjOBH9Ik4FZayn3Y7COT/B8CjFdOC2MdDGmZ9yd
 NInumg7FqP+u12g+9Vq8S/b0cfoQm4qFe8VHiPJu+jRmCZglyvLjk7oq/QwW8Gaq
 Pkzit1Ey0DWo2KvZ4D/nuXJCuhmzN/AJ10M48lLYZhtOIVg9gsa0xjhfgq4FnvSK
 xFCf3rcWnlGIXcOYh/hKU25WaCLzBuqMuSK35A72IujrQOL7OJTk4Oqote3Z3H9B
 08XJmyW6SOZdfw17X4Im1jbyuLek///xQJ9Jw/tya7j9lBt8zjJ+FmLPs4mLGEOm
 WJv9uZPs+QbIMNky2Lcb
 =myDR
 -----END PGP SIGNATURE-----

Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull virtio update from Rusty Russell:
 "Some nice cleanups, and even a patch my wife did as a "live" demo for
  Latinoware 2012.

  There's a slightly non-trivial merge in virtio-net, as we cleaned up
  the virtio add_buf interface while DaveM accepted the mq virtio-net
  patches."

* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (27 commits)
  virtio_console: Add support for remoteproc serial
  virtio_console: Merge struct buffer_token into struct port_buffer
  virtio: add drv_to_virtio to make code clearly
  virtio: use dev_to_virtio wrapper in virtio
  virtio-mmio: Fix irq parsing in command line parameter
  virtio_console: Free buffers from out-queue upon close
  virtio: Convert dev_printk(KERN_<LEVEL> to dev_<level>(
  virtio_console: Use kmalloc instead of kzalloc
  virtio_console: Free buffer if splice fails
  virtio: tools: make it clear that virtqueue_add_buf() no longer returns > 0
  virtio: scsi: make it clear that virtqueue_add_buf() no longer returns > 0
  virtio: rpmsg: make it clear that virtqueue_add_buf() no longer returns > 0
  virtio: net: make it clear that virtqueue_add_buf() no longer returns > 0
  virtio: console: make it clear that virtqueue_add_buf() no longer returns > 0
  virtio: make virtqueue_add_buf() returning 0 on success, not capacity.
  virtio: console: don't rely on virtqueue_add_buf() returning capacity.
  virtio_net: don't rely on virtqueue_add_buf() returning capacity.
  virtio-net: remove unused skb_vnet_hdr->num_sg field
  virtio-net: correct capacity math on ring full
  virtio: move queue_index and num_free fields into core struct virtqueue.
  ...
2012-12-20 08:37:05 -08:00
Wanlong Gao 9a2bdcc85d virtio: add drv_to_virtio to make code clearly
Add drv_to_virtio wrapper to get virtio_driver from device_driver.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-12-18 15:20:43 +10:30
Wanlong Gao 9bffdca8c6 virtio: use dev_to_virtio wrapper in virtio
Use dev_to_virtio wrapper in virtio to make code clearly.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-12-18 15:20:42 +10:30
Pawel Moll 40f9938c4c virtio-mmio: Fix irq parsing in command line parameter
When the resource_size_t is 64-bit long, the sscanf() on
the virtio device command line paramter string may return
wrong value because its format was defined as "%u". Fixed
by using an intermediate local value of a known length.

Also added cleaned up the resource creation and added extra
comments to make the parameters parsing easier to follow.

Reported-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-12-18 15:20:41 +10:30
Joe Perches 800ba5eabf virtio: Convert dev_printk(KERN_<LEVEL> to dev_<level>(
dev_<level> calls take less code than dev_printk(KERN_<LEVEL>
and reducing object size is good.
Convert if (printk_ratelimit()) dev_printk to dev_<level>_ratelimited.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-12-18 15:20:40 +10:30
Rusty Russell 98e8c6bc66 virtio: make virtqueue_add_buf() returning 0 on success, not capacity.
Now noone relies on this behavior, we simplify virtqueue_add_buf() so it
return 0 or -errno.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
2012-12-18 15:20:34 +10:30
Rusty Russell 06ca287dba virtio: move queue_index and num_free fields into core struct virtqueue.
They're generic concepts, so hoist them.  This also avoids accessor
functions (though kept around for merge with DaveM's net tree).

This goes even further than Jason Wang's 17bb6d4088 patch
("virtio-ring: move queue_index to vring_virtqueue") which moved the
queue_index from the specific transport.

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-12-18 15:20:31 +10:30
Wei Yongjun 1ce6853aa0 virtio-pci: use module_pci_driver to simplify the code
Use the module_pci_driver() macro to make the code simpler
by eliminating module_init and module_exit calls.

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-12-18 15:20:30 +10:30
Rafael Aquini e22504296d virtio_balloon: introduce migration primitives to balloon pages
Memory fragmentation introduced by ballooning might reduce significantly
the number of 2MB contiguous memory blocks that can be used within a guest,
thus imposing performance penalties associated with the reduced number of
transparent huge pages that could be used by the guest workload.

Besides making balloon pages movable at allocation time and introducing
the necessary primitives to perform balloon page migration/compaction,
this patch also introduces the following locking scheme, in order to
enhance the syncronization methods for accessing elements of struct
virtio_balloon, thus providing protection against concurrent access
introduced by parallel memory migration threads.

 - balloon_lock (mutex) : synchronizes the access demand to elements of
                          struct virtio_balloon and its queue operations;

[yongjun_wei@trendmicro.com.cn: fix missing unlock on error in fill_balloon()]
[akpm@linux-foundation.org: avoid having multiple return points in fill_balloon()]
[akpm@linux-foundation.org: fix printk warning]Signed-off-by: Rafael Aquini <aquini@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:27 -08:00
Cornelia Huck 237242bddc virtio: Don't access index after unregister.
Virtio wants to release used indices after the corresponding
virtio device has been unregistered. However, virtio does not
hold an extra reference, giving up its last reference with
device_unregister(), making accessing dev->index afterwards
invalid.

I actually saw problems when testing my (not-yet-merged)
virtio-ccw code:

- device_add virtio-net,id=xxx
-> creates device virtio<n> with n>0

- device_del xxx
-> deletes virtio<n>, but calls ida_simple_remove with an
   index of 0

- device_add virtio-net,id=xxx
-> tries to add virtio0, which is still in use...

So let's save the index we want to release before calling
device_unregister().

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Cc: stable@kernel.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-11-09 14:54:24 +10:30
Will Deacon b92b1b89a3 virtio: force vring descriptors to be allocated from lowmem
Virtio devices may attempt to add descriptors to a virtqueue from atomic
context using GFP_ATOMIC allocation. This is problematic because such
allocations can fall outside of the lowmem mapping, causing virt_to_phys
to report bogus physical addresses which are subsequently passed to
userspace via the buffers for the virtual device.

This patch masks out __GFP_HIGH and __GFP_HIGHMEM from the requested
flags when allocating descriptors for a virtqueue. If an atomic
allocation is requested and later fails, we will return -ENOSPC which
will be handled by the driver.

Cc: stable@kernel.org
Cc: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-10-22 18:19:49 +10:30
Brian Foley d78b519f6b virtio_mmio: Don't attempt to create empty virtqueues
If a virtio device reports a QueueNumMax of 0, vring_new_virtqueue()
doesn't check this, and thanks to an unsigned (i < num - 1) loop
guard, scribbles over memory when initialising the free list.

Avoid by not trying to create zero-descriptor queues, as there's no
way to do any I/O with one.

Signed-off-by: Brian Foley <brian.foley@arm.com>
Signed-off-by: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-09-28 15:05:16 +09:30
Brian Foley 3850d29fc4 virtio_mmio: fix off by one error allocating queue
vm_setup_vq fails to allow VirtQueues needing only 2 pages of
storage, as it should. Found with a kernel using 64kB pages, but
can be provoked if a virtio device reports QueueNumMax where the
descriptor table and available ring fit in one page, and the used
ring on the second (<= 227 descriptors with 4kB pages and <= 3640
with 64kB pages.)

Signed-off-by: Brian Foley <brian.foley@arm.com>
Signed-off-by: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-09-28 15:05:16 +09:30
Peter Senna Tschudin 74a74b376c drivers/virtio/virtio_pci.c: fix error return code
Convert a nonnegative error return code to a negative one, as returned
elsewhere in the function.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
 { ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
 return ret;
}
// </smpl>

Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-09-28 15:05:16 +09:30
Michael S. Tsirkin 5543a6ac31 virtio: don't crash when device is buggy
Because of a sanity check in virtio_dev_remove, a buggy device can crash
kernel.  And in case of rproc it's userspace so it's not a good idea.
We are unloading a driver so how bad can it be?
Be less aggressive in handling this error: if it's a driver bug,
warning once should be enough.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-09-28 15:05:16 +09:30
Rusty Russell eccbb05a64 virtio: remove CONFIG_VIRTIO_RING
Everyone who selects VIRTIO is also made to select VIRTIO_RING; just make
them synonymous, since we removed the indirection layer some time ago.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-09-28 15:05:15 +09:30
Rusty Russell 387daf1716 virtio: add help to CONFIG_VIRTIO option.
Trying to enable a virtio driver (eg CONFIG_VIRTIO_BLK) is painful
because it depends on CONFIG_VIRTIO.  CONFIG_VIRTIO doesn't tell you
how to turn it on (it's selected from anything which provides a virtio
bus).

This patch at least adds some documentation, visible in menuconfig, as
a hint.

Reported-by: Kent Overstreet <koverstreet@google.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-09-28 15:05:15 +09:30
Michael S. Tsirkin 6457f126c8 virtio: support reserved vqs
virtio network device multiqueue support reserves
vq 3 for future use (useful both for future extensions and to make it
pretty - this way receive vqs have even and transmit - odd numbers).
Make it possible to skip initialization for
specific vq numbers by specifying NULL for name.
Document this usage as well as (existing) NULL callback.

Drivers using this not coded up yet, so I simply tested
with virtio-pci and verified that this patch does
not break existing drivers.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-09-28 15:05:15 +09:30
Jason Wang 75a0a52be3 virtio: introduce an API to set affinity for a virtqueue
Sometimes, virtio device need to configure irq affinity hint to maximize the
performance. Instead of just exposing the irq of a virtqueue, this patch
introduce an API to set the affinity for a virtqueue.

The api is best-effort, the affinity hint may not be set as expected due to
platform support, irq sharing or irq type. Currently, only pci method were
implemented and we set the affinity according to:

- if device uses INTX, we just ignore the request
- if device has per vq vector, we force the affinity hint
- if the virtqueues share MSI, make the affinity OR over all affinities
  requested

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-09-28 15:05:15 +09:30
Jason Wang 17bb6d4088 virtio-ring: move queue_index to vring_virtqueue
Instead of storing the queue index in transport-specific virtio structs,
this patch moves them to vring_virtqueue and introduces an helper to get
the value.  This lets drivers simplify their management and tracing of
virtqueues.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-09-28 15:05:15 +09:30
Rusty Russell 7a23eb28fa virtio_balloon: not EXPERIMENTAL any more.
It is not experimental in any vaguely-sane sense.

Reported-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-09-28 15:05:14 +09:30
Michael S. Tsirkin 04679f34c1 virtio-balloon: dependency fix
Devices should depend on virtio, not select it.  It's supposed to be
selected by the particular driver, e.g. VIRTIO_PCI.
Make balloon depend on VIRTIO and EXPERIMENTAL
(to match description).

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-09-28 15:05:14 +09:30
Nicholas Bellinger 59057fbc37 [SCSI] virtio-scsi: Add vdrv->scan for post VIRTIO_CONFIG_S_DRIVER_OK LUN scanning
This patch changes virtio-scsi to use a new virtio_driver->scan() callback
so that scsi_scan_host() can be properly invoked once virtio_dev_probe() has
set add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK) to signal active virtio-ring
operation, instead of from within virtscsi_probe().

This fixes a bug where SCSI LUN scanning for both virtio-scsi-raw and
virtio-scsi/tcm_vhost setups was happening before VIRTIO_CONFIG_S_DRIVER_OK
had been set, causing VIRTIO_SCSI_S_BAD_TARGET to occur.  This fixes a bug
with virtio-scsi/tcm_vhost where LUN scan was not detecting LUNs.

Tested with virtio-scsi-raw + virtio-scsi/tcm_vhost w/ IBLOCK on 3.5-rc2 code.

Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-07-20 08:59:03 +01:00
Michael S. Tsirkin 9c378abc5c virtio-balloon: fix add/get API use
Since ee7cd8981e 'virtio: expose added
descriptors immediately.', in virtio balloon virtqueue_get_buf might
now run concurrently with virtqueue_kick.  I audited both and this
seems safe in practice but this is not guaranteed by the API.
Additionally, a spurious interrupt might in theory make
virtqueue_get_buf run in parallel with virtqueue_add_buf, which is
racy.

While we might try to protect against spurious callbacks it's
easier to fix the driver: balloon seems to be the only one
(mis)using the API like this, so let's just fix balloon.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (removed unused var)
2012-07-09 09:07:22 +09:30
Pawel Moll 81a054ce0b virtio-mmio: Devices parameter parsing
This patch adds an option to instantiate guest virtio-mmio devices
basing on a kernel command line (or module) parameter, for example:

	virtio_mmio.devices=0x100@0x100b0000:48

Signed-off-by: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-05-22 12:16:15 +09:30
Asias He 90e03207f4 virtio: Use ida to allocate virtio index
Current index allocation in virtio is based on a monotonically
increasing variable "index". This means we'll run out of numbers
after a while. E.g. someone crazy doing this in host side.

while(1) {
	hot-plug a virtio device
	hot-unplug the virito devcie
}

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-05-22 12:16:12 +09:30
Amit Shah c877bab507 virtio: balloon: separate out common code between remove and freeze functions
The remove and freeze functions have a lot of shared code; put it into a
common function that gets called by both.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-05-22 12:16:11 +09:30
Amit Shah c45b4166d9 virtio: balloon: drop restore_common()
restore_common() was used when there were different thaw and freeze PM
callbacks implemented.  We removed thaw in commit
f38f8387cb.

restore_common() can be removed and virtballoon_restore() can itself do
the restore ops.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-05-22 12:16:11 +09:30
Amit Shah b8ae0eb320 virtio: balloon: let host know of updated balloon size before module removal
When the balloon module is removed, we deflate the balloon, reclaiming
all the pages that were given to the host.  However, we don't update the
config values for the new balloon size, resulting in the host showing
outdated balloon values.

The size update is done after each leak and fill operation, only the
module removal case was left out.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2012-05-17 12:14:34 +03:00
Michael S. Tsirkin 3ccc9372ed virtio_balloon: fix handling of PAGE_SIZE != 4k
As reported by David Gibson, current code handles PAGE_SIZE != 4k
completely wrong which can lead to guest memory corruption errors:

- page_to_balloon_pfn is wrong: e.g. on system with 64K page size
 it gives the same pfn value for 16 different pages.

- we also need to convert back to linux pfns when we free.

- for each linux page we need to tell host about multiple balloon
  pages, but code only adds one pfn to the array.

This patch fixes all that, tested with a 64k ppc64 kernel.

Reported-by: David Gibson <david@gibson.dropbear.id.au>
Tested-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2012-04-15 11:51:06 +03:00
David Gibson 1a87228f5f virtio_balloon: Fix endian bug
Although virtio config space fields are usually in guest-native endian,
the spec for the virtio balloon device explicitly states that both fields
in its config space are little-endian.

However, the current virtio_balloon driver does not have a suitable endian
swap for the 'num_pages' field, although it does have one for the 'actual'
field.  This patch corrects the bug, adding sparse annotation while we're
at it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2012-04-15 11:51:05 +03:00
Amit Shah f878d0be22 virtio-pci: switch to PM ops macro to initialise PM functions
Use the SET_SYSTEM_SLEEP_PM_OPS macro to initialise the suspend/resume
functions in the new PM API.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
2012-03-31 08:09:51 +05:30
Amit Shah 04c2322bee virtio-pci: S3 support
There's no difference in supporting S3 and S4 for virtio devices: the
vqs have to be re-created as the device has to be assumed to be reset at
restore-time.  Since S4 already handles this situation, we can directly
use the same code and callbacks for S3 support.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
2012-03-31 08:09:51 +05:30
Amit Shah 0517fdd156 virtio-pci: drop restore_common()
restore_common() was shared between restore and thaw callbacks.  With
thaw gone, we don't need restore_common() anymore.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
2012-03-31 08:09:51 +05:30
Amit Shah f38f8387cb virtio: drop thaw PM operation
The thaw operation was used by the balloon driver, but after the last
commit there's no reason to have separate thaw and restore callbacks.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
2012-03-31 08:09:50 +05:30
Amit Shah e47d854e57 virtio: balloon: Allow stats update after restore from S4
There's no reason stats update after restore can't work.  If a host
requested for stats, and before servicing the request, the guest entered
S4, upon restore, the stats request can still be processed and sent off
to the host.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
2012-03-31 08:09:50 +05:30
David Howells 9ffc93f203 Remove all #inclusions of asm/system.h
Remove all #inclusions of asm/system.h preparatory to splitting and killing
it.  Performed with the following command:

perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *`

Signed-off-by: David Howells <dhowells@redhat.com>
2012-03-28 18:30:03 +01:00
Amit Shah 4eb05d562e virtio: balloon: leak / fill balloon across S4
commit e562966dba added support for S4 to
the balloon driver.  The freeze function did nothing to free the pages,
since reclaiming the pages from the host to immediately give them back
(if S4 was successful) seemed wasteful.  Also, if S4 wasn't successful,
the guest would have to re-fill the balloon.  On restore, the pages were
supposed to be marked freed and the free page counters were incremented
to reflect the balloon was totally deflated.

However, this wasn't done right.  The pages that were earlier taken away
from the guest during a balloon inflation operation were just shown as
used pages after a successful restore from S4.  Just a fancy way of
leaking lots of memory.

Instead of trying that, just leak the balloon on freeze and fill it on
restore/thaw paths.  This works properly now.  The optimisation to not
leak can be added later on after a bit of refactoring of the code.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-03-01 09:28:41 +10:30
Jason Wang a72caae218 virtio: correct the memory barrier in virtqueue_kick_prepare()
Use virtio_mb() to make sure the available index to be exposed before
checking the the avail event. Otherwise we may get stale value of
avail event in guest and never kick the host after.

Note: this fixes a bug introduced by ee7cd8981e.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org
2012-01-28 08:10:23 +10:30
Jason Wang 4dbc5d9f4f virtio: fix typos of memory barriers
Note: this fixes a bug introduced recently in
7b21e34fd1.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-28 08:10:22 +10:30
Amit Shah e562966dba virtio: balloon: Add freeze, restore handlers to support S4
Handling balloon hibernate / restore is tricky.  If the balloon was
inflated before going into the hibernation state, upon resume, the host
will not have any memory of that.  Any pages that were passed on to the
host earlier would most likely be invalid, and the host will have to
re-balloon to the previous value to get in the pre-hibernate state.

So the only sane thing for the guest to do here is to discard all the
pages that were put in the balloon.  When to discard the pages is the
next question.

One solution is to deflate the balloon just before writing the image to
the disk (in the freeze() PM callback).  However, asking for pages from
the host just to discard them immediately after seems wasteful of
resources.  Hence, it makes sense to do this by just fudging our
counters soon after wakeup.  This means we don't deflate the balloon
before sleep, and also don't put unnecessary pressure on the host.

This also helps in the thaw case: if the freeze fails for whatever
reason, the balloon should continue to remain in the inflated state.
This was tested by issuing 'swapoff -a' and trying to go into the S4
state.  That fails, and the balloon stays inflated, as expected.  Both
the host and the guest are happy.

Finally, in the restore() callback, we empty the list of pages that were
previously given off to the host, add the appropriate number of pages to
the totalram_pages counter, reset the num_pages counter to 0, and
all is fine.

As a last step, delete the vqs on the freeze callback to prepare for
hibernation, and re-create them in the restore and thaw callbacks to
resume normal operation.

The kthread doesn't race with any operations here, since it's frozen
before the freeze() call and is thawed after the thaw() and restore()
callbacks, so we're safe with that.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-12 15:44:47 +10:30
Amit Shah be91c33dd1 virtio: balloon: Move vq initialization into separate function
The probe and PM restore functions will share this code.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-12 15:44:46 +10:30
Amit Shah f0fe6f1150 virtio: pci: add PM notification handlers for restore, freeze, thaw, poweroff
Handle thaw, restore and freeze notifications from the PM core.  Expose
these to individual virtio drivers that can quiesce and resume vq
operations.  For drivers not implementing the thaw() method, use the
restore method instead.

These functions also save device-specific data so that the device can be
put in pre-suspend state after resume, and disable and enable the PCI
device in the freeze and resume functions, respectively.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-12 15:44:44 +10:30
Amit Shah d077536386 virtio: pci: switch to new PM API
The older PM API doesn't have a way to get notifications on hibernate
events.  Switch to the newer one that gives us those notifications.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-12 15:44:44 +10:30
Rusty Russell e93300b1af virtio: add debugging if driver doesn't kick.
Under the existing #ifdef DEBUG, check that they don't have more than
1/10 of a second between an add_buf() and a
virtqueue_notify()/virtqueue_kick_prepare() call.

We could get false positives on a really busy system, but good for
development.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-12 15:44:43 +10:30
Rusty Russell ee7cd8981e virtio: expose added descriptors immediately.
A virtio driver does virtqueue_add_buf() multiple times before finally
calling virtqueue_kick(); previously we only exposed the added buffers
in the virtqueue_kick() call.  This means we don't need a memory
barrier in virtqueue_add_buf(), but it reduces concurrency as the
device (ie. host) can't see the buffers until the kick.

In the unusual (but now possible) case where a driver does add_buf()
and get_buf() without doing a kick, we do need to insert one before
our counter wraps.  Otherwise we could wrap num_added, and later on
not realize that we have passed the marker where we should have
kicked.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-12 15:44:43 +10:30
Rusty Russell 3b720b8c86 virtio: avoid modulus operation.
Since we know vq->vring.num is a power of 2, modulus is lazy (it's asserted
in vring_new_virtqueue()).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-12 15:44:43 +10:30
Rusty Russell 41f0377f73 virtio: support unlocked queue kick
Based on patch by Christoph for virtio_blk speedup:

	Split virtqueue_kick to be able to do the actual notification
	outside the lock protecting the virtqueue.  This patch was
	originally done by Stefan Hajnoczi, but I can't find the
	original one anymore and had to recreated it from memory.
	Pointers to the original or corrections for the commit message
	are welcome.

Stefan's patch was here:

	a6d06644e3
	http://www.spinics.net/lists/linux-virtualization/msg14616.html

Third time's the charm!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-12 15:44:43 +10:30
Rusty Russell f96fde41f7 virtio: rename virtqueue_add_buf_gfp to virtqueue_add_buf
Remove wrapper functions. This makes the allocation type explicit in
all callers; I used GPF_KERNEL where it seemed obvious, left it at
GFP_ATOMIC otherwise.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2012-01-12 15:44:42 +10:30
Rusty Russell 5dfc17628d virtio: document functions better.
The old documentation is left over from when we used a structure with
strategy pointers.

And move the documentation to the C file as per kernel practice.
Though I disagree...

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2012-01-12 15:44:42 +10:30
Sasha Levin 1e214a5c1a virtio-balloon: Trivial cleanups
Trivial changes to remove forgotten junk, format comments, and correct names.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-12 15:44:42 +10:30
Rusty Russell 7b21e34fd1 virtio: harsher barriers for rpmsg.
We were cheating with our barriers; using the smp ones rather than the
real device ones.  That was fine, until rpmsg came along, which is
used to talk to a real device (a non-SMP CPU).

Unfortunately, just putting back the real barriers (reverting
d57ed95d) causes a performance regression on virtio-pci.  In
particular, Amos reports netbench's TCP_RR over virtio_net CPU
utilization increased up to 35% while throughput went down by up to
14%.

By comparison, this branch is in the noise.

Reference: https://lkml.org/lkml/2011/12/11/22

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-12 15:44:42 +10:30
David S. Miller b3613118eb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2011-12-02 13:49:21 -05:00
Michael S. Tsirkin e6af578c53 virtio-pci: make reset operation safer
virtio pci device reset actually just does an I/O
write, which in PCI is really posted, that is it
can complete on CPU before the device has received it.

Further, interrupts might have been pending on
another CPU, so device callback might get invoked after reset.

This conflicts with how drivers use reset, which is typically:
	reset
	unregister
a callback running after reset completed can race with
unregister, potentially leading to use after free bugs.

Fix by flushing out the write, and flushing pending interrupts.

This assumes that device is never reset from
its vq/config callbacks, or in parallel with being
added/removed, document this assumption.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-11-24 13:04:48 +10:30
Sasha Levin fe1a7fe2c4 virtio-mmio: Correct the name of the guest features selector
Guest features selector spelling mistake.

Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-11-24 13:04:47 +10:30
Heiko Carstens bd20817f73 virtio: add HAS_IOMEM dependency to MMIO platform bus driver
Fix this compile error on s390:

  CC [M]  drivers/virtio/virtio_mmio.o
drivers/virtio/virtio_mmio.c: In function 'vm_get_features':
drivers/virtio/virtio_mmio.c:107:2: error: implicit declaration of function 'writel'

Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-11-24 13:04:47 +10:30
David S. Miller efd0bf97de Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
The forcedeth changes had a conflict with the conversion over
to atomic u64 statistics in net-next.

The libertas cfg.c code had a conflict with the bss reference
counting fix by John Linville in net-next.

Conflicts:
	drivers/net/ethernet/nvidia/forcedeth.c
	drivers/net/wireless/libertas/cfg.c
2011-11-21 13:50:33 -05:00
Rick Jones 66846048f5 enable virtio_net to return bus_info in ethtool -i consistent with emulated NICs
Add a new .bus_name to virtio_config_ops then modify virtio_net to
call through to it in an ethtool .get_drvinfo routine to report
bus_info in ethtool -i output which is consistent with other
emulated NICs and the output of lspci.

Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-16 17:26:46 -05:00
Michael S. Tsirkin 72103bd128 virtio-pci: fix use after free
Commit 31a3ddda16 introduced
a use after free in virtio-pci. The main issue is
that the release method signals removal of the virtio device,
while remove signals removal of the pci device.

For example, on driver removal or hot-unplug,
virtio_pci_release_dev is called before virtio_pci_remove.
We then might get a crash as virtio_pci_remove tries to use the
device freed by virtio_pci_release_dev.

We allocate/free all resources together with the
pci device, so we can leave the release method empty.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org
2011-11-14 11:16:26 +10:30
Linus Torvalds 32aaeffbd4 Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
  Revert "tracing: Include module.h in define_trace.h"
  irq: don't put module.h into irq.h for tracking irqgen modules.
  bluetooth: macroize two small inlines to avoid module.h
  ip_vs.h: fix implicit use of module_get/module_put from module.h
  nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
  include: replace linux/module.h with "struct module" wherever possible
  include: convert various register fcns to macros to avoid include chaining
  crypto.h: remove unused crypto_tfm_alg_modname() inline
  uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
  pm_runtime.h: explicitly requires notifier.h
  linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
  miscdevice.h: fix up implicit use of lists and types
  stop_machine.h: fix implicit use of smp.h for smp_processor_id
  of: fix implicit use of errno.h in include/linux/of.h
  of_platform.h: delete needless include <linux/module.h>
  acpi: remove module.h include from platform/aclinux.h
  miscdevice.h: delete unnecessary inclusion of module.h
  device_cgroup.h: delete needless include <linux/module.h>
  net: sch_generic remove redundant use of <linux/module.h>
  net: inet_timewait_sock doesnt need <linux/module.h>
  ...

Fix up trivial conflicts (other header files, and  removal of the ab3550 mfd driver) in
 - drivers/media/dvb/frontends/dibx000_common.c
 - drivers/media/video/{mt9m111.c,ov6650.c}
 - drivers/mfd/ab3550-core.c
 - include/linux/dmaengine.h
2011-11-06 19:44:47 -08:00
Pawel Moll edfd52e636 virtio: Add platform bus driver for memory mapped virtio device
This patch, based on virtio PCI driver, adds support for memory
mapped (platform) virtio device. This should allow environments
like qemu to use virtio-based block & network devices even on
platforms without PCI support.

One can define and register a platform device which resources
will describe memory mapped control registers and "mailbox"
interrupt. Such device can be also instantiated using the Device
Tree node with compatible property equal "virtio,mmio".

Cc: Anthony Liguori <aliguori@us.ibm.com>
Cc: Michael S.Tsirkin <mst@redhat.com>
Signed-off-by: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-11-02 11:41:01 +10:30
Krishna Kumar 005b20a8e0 virtio: Dont add "config" to list for !per_vq_vector
For the MSI but non-per_vq_vector case, the config/change vq
also gets added to the list of vqs that need to process the
MSI interrupt. This is not needed as config has it's own
handler (vp_config_changed). In any case, vring_interrupt()
finds nothing needs to be done on this vq.

I tested this patch by testing the "Fallback:" and "Finally
fall back" cases in vp_find_vqs(). Please review.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-11-02 11:41:01 +10:30
Paul Gortmaker b5a2c4f199 virtio: Add module.h to drivers/virtio users.
Up to now, the module.h header was as hard to keep out as
sunlight.  But we are cleaning that up.  Fix the virtio users
who simply expect module.h to be there in every C file.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:32:14 -04:00
Rick Jones 8f9f4668b3 Add ethtool -g support to virtio_net
Add support for reporting ring sizes via ethtool -g to the virtio_net
driver.

Signed-off-by: Rick Jones <rick.jones2@hp.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-24 02:07:21 -04:00
Ohad Ben-Cohen e72542191c virtio: expose for non-virtualization users too
virtio has been so far used only in the context of virtualization,
and the virtio Kconfig was sourced directly by the relevant arch
Kconfigs when VIRTUALIZATION was selected.

Now that we start using virtio for inter-processor communications,
we need to source the virtio Kconfig outside of the virtualization
scope too.

Moreover, some architectures might use virtio for both virtualization
and inter-processor communications, so directly sourcing virtio
might yield unexpected results due to conflicting selections.

The simple solution offered by this patch is to always source virtio's
Kconfig in drivers/Kconfig, and remove it from the appropriate arch
Kconfigs. Additionally, a virtio menu entry has been added so virtio
drivers don't show up in the general drivers menu.

This way anyone can use virtio, though it's arguably less accessible
(and neat!) for virtualization users now.

Note: some architectures (mips and sh) seem to have a VIRTUALIZATION
menu merely for sourcing virtio's Kconfig, so that menu is removed too.

Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-07-23 16:20:30 +09:30
Michael S. Tsirkin 7ab358c23c virtio: add api for delayed callbacks
Add an API that tells the other side that callbacks
should be delayed until a lot of work has been done.
Implement using the new event_idx feature.

Note: it might seem advantageous to let the drivers
ask for a callback after a specific capacity has
been reached. However, as a single head can
free many entries in the descriptor table,
we don't really have a clue about capacity
until get_buf is called. The API is the simplest
to implement at the moment, we'll see what kind of
hints drivers can pass when there's more than one
user of the feature.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-05-30 11:14:16 +09:30
Michael S. Tsirkin a5c262c5fd virtio_ring: support event idx feature
Support for the new event idx feature:
1. When enabling interrupts, publish the current avail index
   value to the host to get interrupts on the next update.
2. Use the new avail_event feature to reduce the number
   of exits from the guest.

Simple test with the simulator:

[virtio]# time ./virtio_test
spurious wakeus: 0x7

real    0m0.169s
user    0m0.140s
sys     0m0.019s
[virtio]# time ./virtio_test --no-event-idx
spurious wakeus: 0x11

real    0m0.649s
user    0m0.295s
sys     0m0.335s

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-05-30 11:14:15 +09:30
Dave Hansen bf50e69f63 virtio balloon: kill tell-host-first logic
The virtio balloon driver has a VIRTIO_BALLOON_F_MUST_TELL_HOST
feature bit.  Whenever the bit is set, the guest kernel must
always tell the host before we free pages back to the allocator.
Without this feature, we might free a page (and have another
user touch it) while the hypervisor is unprepared for it.

But, if the bit is _not_ set, we are under no obligation to
reverse the order; we're under no obligation to do _anything_.
As of now, qemu-kvm defines the bit, but doesn't set it.

This patch makes the "tell host first" logic the only case.  This
should make everybody happy, and reduce the amount of untested or
untestable code in the kernel.

This _also_ means that we don't have to preserve a pfn list
after the pages are freed, which should let us get rid of some
temporary storage (vb->pfns) eventually.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-05-30 11:14:13 +09:30
Amit Shah 31a3ddda16 virtio_pci: Prevent double-free of pci regions after device hot-unplug
In the case where a virtio-console port is in use (opened by a program)
and a virtio-console device is removed, the port is kept around but all
the virtio-related state is assumed to be gone.

When the port is finally released (close() called), we call
device_destroy() on the port's device.  This results in the parent
device's structures to be freed as well.  This includes the PCI regions
for the virtio-console PCI device.

Once this is done, however, virtio_pci_release_dev() kicks in, as the
last ref to the virtio device is now gone, and attempts to do

     pci_iounmap(pci_dev, vp_dev->ioaddr);
     pci_release_regions(pci_dev);
     pci_disable_device(pci_dev);

which results in a double-free warning.

Move the code that releases regions, etc., to the virtio_pci_remove()
function, and all that's now left in release_dev is the final freeing of
the vp_dev.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-04-21 22:57:00 +09:30
Amit Shah b3258ff1d6 virtio: Decrement avail idx on buffer detach
When detaching a buffer from a vq, the avail.idx value should be
decremented as well.

This was noticed by hot-unplugging a virtio console port and then
plugging in a new one on the same number (re-using the vqs which were
just 'disowned').  qemu reported

   'Guest moved used index from 0 to 256'

when any IO was attempted on the new port.

CC: stable@kernel.org
Reported-by: juzhang <juzhang@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-04-21 22:57:00 +09:30
Milton Miller 8b3bb3ecf1 virtio: remove virtio-pci root device
We sometimes need to map between the virtio device and
the given pci device. One such use is OS installer that
gets the boot pci device from BIOS and needs to
find the relevant block device. Since it can't,
installation fails.

Instead of creating a top-level devices/virtio-pci
directory, create each device under the corresponding
pci device node.  Symlinks to all virtio-pci
devices can be found under the pci driver link in
bus/pci/drivers/virtio-pci/devices, and all virtio
devices under drivers/bus/virtio/devices.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Gleb Natapov <gleb@redhat.com>
Tested-by: "Daniel P. Berrange" <berrange@redhat.com>
Cc: stable@kernel.org
2011-01-20 21:37:30 +10:30
Stephen Hemminger be6528b2e5 virtio: fix format of sysfs driver/vendor files
The sysfs files for virtio produce the wrong format and are missing
the required newline. The output for virtio bus vendor/device should
have the same format as the corresponding entries for PCI devices.

Although this technically changes the ABI for sysfs, these files were
broken to start with!

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-11-24 15:21:12 +10:30
Michael S. Tsirkin 7ae4b866f8 virtio: return correct capacity to users
We can't rely on indirect buffers for capacity
calculations because they need a memory allocation
which might fail.  In particular, virtio_net can get
into this situation under stress, and it drops packets
and performs badly.

So return the number of buffers we can guarantee users.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reported-By: Krishna Kumar2 <krkumar2@in.ibm.com>
2010-11-24 15:21:11 +10:30
Michael S. Tsirkin 1fe9b6fef1 virtio: fix oops on OOM
virtio ring was changed to return an error code on OOM,
but one caller was missed and still checks for vq->vring.num.
The fix is just to check for <0 error code.

Long term it might make sense to change goto add_head to
just return an error on oom instead, but let's apply
a minimal fix for 2.6.35.

Reported-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Tested-by: Chris Mason <chris.mason@oracle.com>
Cc: stable@kernel.org # .34.x
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-07-26 08:05:31 -07:00
Michael S. Tsirkin b03214d559 virtio-pci: disable msi at startup
virtio-pci resets the device at startup by writing to the status
register, but this does not clear the pci config space,
specifically msi enable status which affects register
layout.

This breaks things like kdump when they try to use e.g. virtio-blk.

Fix by forcing msi off at startup. Since pci.c already has
a routine to do this, we export and use it instead of duplicating code.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: linux-pci@vger.kernel.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org
2010-06-23 22:49:07 +09:30
Michael S. Tsirkin 686d363786 virtio: return ENOMEM on out of memory
add_buf returns ring size on out of memory,
this is not what devices expect.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org # .34.x
2010-06-23 22:49:06 +09:30
Linus Torvalds 1756ac3d3c Merge branch 'virtio' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* 'virtio' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: (27 commits)
  drivers/char: Eliminate use after free
  virtio: console: Accept console size along with resize control message
  virtio: console: Store each console's size in the console structure
  virtio: console: Resize console port 0 on config intr only if multiport is off
  virtio: console: Add support for nonblocking write()s
  virtio: console: Rename wait_is_over() to will_read_block()
  virtio: console: Don't always create a port 0 if using multiport
  virtio: console: Use a control message to add ports
  virtio: console: Move code around for future patches
  virtio: console: Remove config work handler
  virtio: console: Don't call hvc_remove() on unplugging console ports
  virtio: console: Return -EPIPE to hvc_console if we lost the connection
  virtio: console: Let host know of port or device add failures
  virtio: console: Add a __send_control_msg() that can send messages without a valid port
  virtio: Revert "virtio: disable multiport console support."
  virtio: add_buf_gfp
  trans_virtio: use virtqueue_xxx wrappers
  virtio-rng: use virtqueue_xxx wrappers
  virtio_ring: remove a level of indirection
  virtio_net: use virtqueue_xxx wrappers
  ...

Fix up conflicts in drivers/net/virtio_net.c due to new virtqueue_xxx
wrappers changes conflicting with some other cleanups.
2010-05-21 17:22:52 -07:00
Michael S. Tsirkin bbd603efb4 virtio: add_buf_gfp
Add an add_buf variant that gets gfp parameter. Use that
to allocate indirect buffers.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-05-19 22:15:46 +09:30
Michael S. Tsirkin 7c5e9ed0c8 virtio_ring: remove a level of indirection
We have a single virtqueue_ops implementation,
and it seems unlikely we'll get another one
at this point. So let's remove an unnecessary
level of indirection: it would be very easy to
re-add it if another implementation surfaces.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-05-19 22:15:43 +09:30
Michael S. Tsirkin 946cfe0e05 virtio_balloon: use virtqueue_xxx wrappers
Switch virtio_balloon to new virtqueue_xxx wrappers.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-05-19 22:15:41 +09:30
Jiri Kosina 6c9468e9eb Merge branch 'master' into for-next 2010-04-23 02:08:44 +02:00
Balbir Singh 61fb06cc8e virtio: Fix GFP flags passed from the virtio balloon driver
The virtio balloon driver can dig into the reservation pools of the OS
to satisfy a balloon request.  This is not advisable and other balloon
drivers (drivers/xen/balloon.c) avoid this as well.

The patch also adds changes to avoid printing a warning if allocation
fails, since we retry after sometime anyway.

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: kvm <kvm@vger.kernel.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-22 07:34:05 -07:00
Tejun Heo 5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
Thomas Weber 8839316121 Fix typos in comments
[Ss]ytem => [Ss]ystem
udpate => update
paramters => parameters
orginal => original

Signed-off-by: Thomas Weber <swirl@gmx.li>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-03-16 11:47:56 +01:00
Michael S. Tsirkin bc505f3739 virtio: set pci bus master enable bit
As all virtio devices perform DMA, we
must enable bus mastering for them to be
spec compliant.

This patch fixes hotplug of virtio devices
with Linux guests and qemu 0.11-0.12.

Tested-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2010-03-02 13:41:14 +02:00
Michael S. Tsirkin 3119815912 virtio: fix out of range array access
I have observed the following error on virtio-net module unload:

------------[ cut here ]------------
WARNING: at kernel/irq/manage.c:858 __free_irq+0xa0/0x14c()
Hardware name: Bochs
Trying to free already-free IRQ 0
Modules linked in: virtio_net(-) virtio_blk virtio_pci virtio_ring
virtio af_packet e1000 shpchp aacraid uhci_hcd ohci_hcd ehci_hcd [last
unloaded: scsi_wait_scan]
Pid: 1957, comm: rmmod Not tainted 2.6.33-rc8-vhost #24
Call Trace:
 [<ffffffff8103e195>] warn_slowpath_common+0x7c/0x94
 [<ffffffff8103e204>] warn_slowpath_fmt+0x41/0x43
 [<ffffffff810a7a36>] ? __free_pages+0x5a/0x70
 [<ffffffff8107cc00>] __free_irq+0xa0/0x14c
 [<ffffffff8107cceb>] free_irq+0x3f/0x65
 [<ffffffffa0081424>] vp_del_vqs+0x81/0xb1 [virtio_pci]
 [<ffffffffa0091d29>] virtnet_remove+0xda/0x10b [virtio_net]
 [<ffffffffa0075200>] virtio_dev_remove+0x22/0x4a [virtio]
 [<ffffffff812709ee>] __device_release_driver+0x66/0xac
 [<ffffffff81270ab7>] driver_detach+0x83/0xa9
 [<ffffffff8126fc66>] bus_remove_driver+0x91/0xb4
 [<ffffffff81270fcf>] driver_unregister+0x6c/0x74
 [<ffffffffa0075418>] unregister_virtio_driver+0xe/0x10 [virtio]
 [<ffffffffa0091c4d>] fini+0x15/0x17 [virtio_net]
 [<ffffffff8106997b>] sys_delete_module+0x1c3/0x230
 [<ffffffff81007465>] ? old_ich_force_enable_hpet+0x117/0x164
 [<ffffffff813bb720>] ? do_page_fault+0x29c/0x2cc
 [<ffffffff81028e58>] sysenter_dispatch+0x7/0x27
---[ end trace 15e88e4c576cc62b ]---

The bug is in virtio-pci: we use msix_vector as array index to get irq
entry, but some vqs do not have a dedicated vector so this causes an out
of bounds access.  By chance, we seem to often get 0 value, which
results in this error.

Fix by verifying that vector is legal before using it as index.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Anthony Liguori <aliguori@us.ibm.com>
Acked-by: Shirley Ma <xma@us.ibm.com>
Acked-by: Amit Shah <amit.shah@redhat.com>
2010-02-28 20:39:11 +02:00
Amit Shah 3b8706240e virtio: Initialize vq->data entries to NULL
vq operations depend on vq->data[i] being NULL to figure out if the vq
entry is in use (since the previous patch).

We have to initialize them to NULL to ensure we don't work with junk
data and trigger false BUG_ONs.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Shirley Ma <xma@us.ibm.com>
2010-02-24 14:22:29 +10:30
Shirley Ma c021eac414 virtio: Add ability to detach unused buffers from vrings
There's currently no way for a virtio driver to ask for unused
buffers, so it has to keep a list itself to reclaim them at shutdown.
This is redundant, since virtio_ring stores that information.  So
add a new hook to do this.

Signed-off-by: Shirley Ma <xma@us.ibm.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-24 14:22:27 +10:30
Michael S. Tsirkin d57ed95da4 virtio: use smp_XX barriers on SMP
virtio is communicating with a virtual "device" that actually runs on
another host processor. Thus SMP barriers can be used to control
memory access ordering.

Where possible, we should use SMP barriers which are more lightweight than
mandatory barriers, because mandatory barriers also control MMIO effects on
accesses through relaxed memory I/O windows (which virtio does not use)
(compare specifically smp_rmb and rmb on x86_64).

We can't just use smp_mb and friends though, because
we must force memory ordering even if guest is UP since host could be
running on another CPU, but SMP barriers are defined to barrier() in
that configuration. So, for UP fall back to mandatory barriers instead.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-24 14:22:25 +10:30
Rusty Russell 97a545ab6c virtio: remove bogus barriers from DEBUG version of virtio_ring.c
With DEBUG defined, we add an ->in_use flag to detect if the caller
invokes two virtio methods in parallel.  The barriers attempt to ensure
timely update of the ->in_use flag.

But they're voodoo: if we need these barriers it implies that the
calling code doesn't have sufficient synchronization to ensure the
code paths aren't invoked at the same time anyway, and we want to
detect it.

Also, adding barriers changes timing, so turning on debug has more
chance of hiding real problems.

Thanks to MST for drawing my attention to this code...

CC: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-24 14:22:24 +10:30
Rusty Russell 169c246a30 virtio: fix balloon without VIRTIO_BALLOON_F_STATS_VQ
When running under qemu-kvm-0.11.0:

	BUG: unable to handle kernel paging request at 56e58955
	...
	Process vballoon (pid: 1297, ti=c7976000 task=c70a6ca0 task.ti=c7
	...
	Call Trace:
	 [<c88253a3>] ? balloon+0x1b3/0x440 [virtio_balloon]
	 [<c041c2d7>] ? schedule+0x327/0x9d0
	 [<c88251f0>] ? balloon+0x0/0x440 [virtio_balloon]
	 [<c014a2d4>] ? kthread+0x74/0x80
	 [<c014a260>] ? kthread+0x0/0x80
	 [<c0103b36>] ? kernel_thread_helper+0x6/0x30

need_stats_update should be zero-initialized.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Adam Litke <agl@us.ibm.com>
2010-02-24 14:22:17 +10:30
Adam Litke 1f34c71afe virtio: Fix scheduling while atomic in virtio_balloon stats
This is a fix for my earlier patch: "virtio: Add memory statistics reporting to
the balloon driver (V4)".

I discovered that all_vm_events() can sleep and therefore stats collection
cannot be done in interrupt context.  One solution is to handle the interrupt
by noting that stats need to be collected and waking the existing vballoon
kthread which will complete the work via stats_handle_request().  Rusty, is
this a saner way of doing business?

There is one issue that I would like a broader opinion on.  In stats_request, I
update vb->need_stats_update and then wake up the kthread.  The kthread uses
vb->need_stats_update as a condition variable.  Do I need a memory barrier
between the update and wake_up to ensure that my kthread sees the correct
value?  My testing suggests that it is not needed but I would like some
confirmation from the experts.

Signed-off-by: Adam Litke <agl@us.ibm.com>
To: Rusty Russell <rusty@rustcorp.com.au>
Cc: Anthony Liguori <aliguori@linux.vnet.ibm.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-24 14:22:14 +10:30
Adam Litke 9564e138b1 virtio: Add memory statistics reporting to the balloon driver (V4)
Changes since V3:
 - Do not do endian conversions as they will be done in the host
 - Report stats that reference a quantity of memory in bytes
 - Minor coding style updates

Changes since V2:
 - Increase stat field size to 64 bits
 - Report all sizes in kb (not pages)
 - Drop anon_pages stat and fix endianness conversion

Changes since V1:
 - Use a virtqueue instead of the device config space

When using ballooning to manage overcommitted memory on a host, a system for
guests to communicate their memory usage to the host can provide information
that will minimize the impact of ballooning on the guests.  The current method
employs a daemon running in each guest that communicates memory statistics to a
host daemon at a specified time interval.  The host daemon aggregates this
information and inflates and/or deflates balloons according to the level of
host memory pressure.  This approach is effective but overly complex since a
daemon must be installed inside each guest and coordinated to communicate with
the host.  A simpler approach is to collect memory statistics in the virtio
balloon driver and communicate them directly to the hypervisor.

This patch enables the guest-side support by adding stats collection and
reporting to the virtio balloon driver.

Signed-off-by: Adam Litke <agl@us.ibm.com>
Cc: Anthony Liguori <anthony@codemonkey.ws>
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (minor fixes)
2010-02-24 14:22:08 +10:30
Jamie Lokier 1f08b833dd Add __devexit_p around reference to virtio_pci_remove
This is needed to compile with CONFIG_VIRTIO_PCI=y,
because virtio_pci_remove is marked __devexit.

Signed-off-by: Jamie Lokier <jamie@shareable.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-24 14:22:04 +10:30
Jeff Mahoney d817cd5255 virtio: fix section mismatch warnings
Fix fixes the following warnings by renaming the driver structures to be
suffixed with _driver.

WARNING: drivers/virtio/virtio_balloon.o(.data+0x88): Section mismatch in reference from the variable virtio_balloon to the function .devexit.text:virtballoon_remove()

WARNING: drivers/char/hw_random/virtio-rng.o(.data+0x88): Section mismatch in reference from the variable virtio_rng to the function .devexit.text:virtrng_remove()

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-16 12:15:39 -08:00
Michael S. Tsirkin 2d61ba9503 virtio: order used ring after used index read
On SMP guests, reads from the ring might bypass used index reads. This
causes guest crashes because host writes to used index to signal ring
data readiness.  Fix this by inserting rmb before used ring reads.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org
2009-10-29 08:50:37 +10:30
Michael S. Tsirkin 0b22bd0ba0 virtio-pci: fix per-vq MSI-X request logic
Commit f68d24082e
in 2.6.32-rc1 broke requesting IRQs for per-VQ MSI-X vectors:
- vector number was used instead of the vector itself
- we try to request an IRQ for VQ which does not
  have a callback handler

This is a regression that causes warnings in kernel log,
potentially lower performance as we need to scan vq list,
and might cause system failure if the interrupt
requested is in fact needed by another system.

This was not noticed earlier because in most cases
we were falling back on shared interrupt for all vqs.

The warnings often look like this:

virtio-pci 0000:00:03.0: irq 26 for MSI/MSI-X
virtio-pci 0000:00:03.0: irq 27 for MSI/MSI-X
virtio-pci 0000:00:03.0: irq 28 for MSI/MSI-X
IRQ handler type mismatch for IRQ 1
current handler: i8042
Pid: 2400, comm: modprobe Tainted: G        W
2.6.32-rc3-11952-gf3ed8d8-dirty #1
Call Trace:
 [<ffffffff81072aed>] ? __setup_irq+0x299/0x304
 [<ffffffff81072ff3>] ? request_threaded_irq+0x144/0x1c1
 [<ffffffff813455af>] ? vring_interrupt+0x0/0x30
 [<ffffffff81346598>] ? vp_try_to_find_vqs+0x583/0x5c7
 [<ffffffffa0015188>] ? skb_recv_done+0x0/0x34 [virtio_net]
 [<ffffffff81346609>] ? vp_find_vqs+0x2d/0x83
 [<ffffffff81345d00>] ? vp_get+0x3c/0x4e
 [<ffffffffa0016373>] ? virtnet_probe+0x2f1/0x428 [virtio_net]
 [<ffffffffa0015188>] ? skb_recv_done+0x0/0x34 [virtio_net]
 [<ffffffffa00150d8>] ? skb_xmit_done+0x0/0x39 [virtio_net]
 [<ffffffff8110ab92>] ? sysfs_do_create_link+0xcb/0x116
 [<ffffffff81345cc2>] ? vp_get_status+0x14/0x16
 [<ffffffff81345464>] ? virtio_dev_probe+0xa9/0xc8
 [<ffffffff8122b11c>] ? driver_probe_device+0x8d/0x128
 [<ffffffff8122b206>] ? __driver_attach+0x4f/0x6f
 [<ffffffff8122b1b7>] ? __driver_attach+0x0/0x6f
 [<ffffffff8122a9f9>] ? bus_for_each_dev+0x43/0x74
 [<ffffffff8122a374>] ? bus_add_driver+0xea/0x22d
 [<ffffffff8122b4a3>] ? driver_register+0xa7/0x111
 [<ffffffffa001a000>] ? init+0x0/0xc [virtio_net]
 [<ffffffff81009051>] ? do_one_initcall+0x50/0x148
 [<ffffffff8106e117>] ? sys_init_module+0xc5/0x21a
 [<ffffffff8100af02>] ? system_call_fastpath+0x16/0x1b
virtio-pci 0000:00:03.0: irq 26 for MSI/MSI-X
virtio-pci 0000:00:03.0: irq 27 for MSI/MSI-X

Reported-by: Marcelo Tosatti <mtosatti@redhat.com>
Reported-by: Shirley Ma <xma@us.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-10-29 08:50:36 +10:30
Uwe Kleine-König 1e65175c2c move virtballoon_remove to .devexit.text
The function virtballoon_remove is used only wrapped by __devexit_p so
define it using __devexit.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-10-22 16:39:31 +10:30
Christian Borntraeger e95646c3ec virtio: let header files include virtio_ids.h
Rusty,

commit 3ca4f5ca73
    virtio: add virtio IDs file
moved all device IDs into a single file. While the change itself is
a very good one, it can break userspace applications. For example
if a userspace tool wanted to get the ID of virtio_net it used to
include virtio_net.h. This does no longer work, since virtio_net.h
does not include virtio_ids.h.
This patch moves all "#include <linux/virtio_ids.h>" from the C
files into the header files, making the header files compatible with
the old ones.

In addition, this patch exports virtio_ids.h to userspace.

CC: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-10-22 16:39:28 +10:30
Fernando Luis Vazquez Cao 3ca4f5ca73 virtio: add virtio IDs file
Virtio IDs are spread all over the tree which makes assigning new IDs
bothersome. Putting them together should make the process less error-prone.

Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-09-23 22:26:32 +09:30
Rusty Russell 3c1b27d504 virtio: make add_buf return capacity remaining
This API change means that virtio_net can tell how much capacity
remains for buffers.  It's necessarily fuzzy, since
VIRTIO_RING_F_INDIRECT_DESC means we can fit any number of descriptors
in one, *if* we can kmalloc.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dinesh Subhraveti <dineshs@us.ibm.com>
2009-09-23 22:26:31 +09:30
Rusty Russell f68d24082e virtio_pci: minor MSI-X cleanups
1) Rename vp_request_vectors to vp_request_msix_vectors, and take
   non-MSI-X case out to caller.
2) Comment weird pci_enable_msix API
3) Rename vp_find_vq to setup_vq.
4) Fix spaces to tabs
5) Make nvectors calc internal to vp_try_to_find_vqs()
6) Rename vector to msix_vector for more clarity.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
2009-09-23 22:26:31 +09:30
Michael S. Tsirkin e969fed542 virtio: refactor find_vqs
This refactors find_vqs, making it more readable and robust, and fixing
two regressions from 2.6.30:
- double free_irq causing BUG_ON on device removal
- probe failure when vq can't be assigned to msi-x vector
  (reported on old host kernels)

Tested-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-07-30 16:03:45 +09:30
Michael S. Tsirkin f6c8250703 virtio: delete vq from list
This makes delete vq the reverse of find vq.
This is required to make it possible to retry find_vqs
after a failure, otherwise the list gets corrupted.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-07-30 16:03:44 +09:30
Michael S. Tsirkin ff52c3fc71 virtio: fix memory leak on device removal
Make vp_free_vectors do the reverse of vq_request_vectors.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-07-30 16:03:44 +09:30
Mark McLoughlin 4b892e6582 virtio-pci: correctly unregister root device on error
If pci_register_driver() fails we're incorrectly unregistering the root
device with device_unregister() rather than root_device_unregister().

Reported-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-07-17 21:47:47 +09:30
Christian Borntraeger e335385373 virtio: enhance id_matching for virtio drivers
This patch allows a virtio driver to use VIRTIO_DEV_ANY_ID for the
device id. This will be used by a test module that can be bound to
any virtio device.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12 22:16:40 +09:30
Christian Borntraeger c89e80168b virtio: fix id_matching for virtio drivers
This bug never appeared, since all current virtio drivers use
VIRTIO_DEV_ANY_ID for the vendor field. If a real vendor would be used,
the check in virtio_id_match is wrong - it returns 0 if
id->vendor == dev->id.vendor.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12 22:16:40 +09:30
Mark McLoughlin 9fa29b9df3 virtio: indirect ring entries (VIRTIO_RING_F_INDIRECT_DESC)
Add a new feature flag for indirect ring entries. These are ring
entries which point to a table of buffer descriptors.

The idea here is to increase the ring capacity by allowing a larger
effective ring size whereby the ring size dictates the number of
requests that may be outstanding, rather than the size of those
requests.

This should be most effective in the case of block I/O where we can
potentially benefit by concurrently dispatching a large number of
large requests. Even in the simple case of single segment block
requests, this results in a threefold increase in ring capacity.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12 22:16:39 +09:30
Rusty Russell a92892825a virtio: expose features in sysfs
Each device negotiates feature bits; expose these in sysfs to help
diagnostics and debugging.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12 22:16:38 +09:30
Michael S. Tsirkin 82af8ce84e virtio_pci: optional MSI-X support
This implements optional MSI-X support in virtio_pci.
MSI-X is used whenever the host supports at least 2 MSI-X
vectors: 1 for configuration changes and 1 for virtqueues.
Per-virtqueue vectors are allocated if enough vectors
available.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (+ whitespace, style)
2009-06-12 22:16:37 +09:30
Michael S. Tsirkin 77cf524654 virtio_pci: split up vp_interrupt
This reorganizes virtio-pci code in vp_interrupt slightly, so that
it's easier to add per-vq MSI support on top.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12 22:16:37 +09:30
Michael S. Tsirkin d2a7ddda9f virtio: find_vqs/del_vqs virtio operations
This replaces find_vq/del_vq with find_vqs/del_vqs virtio operations,
and updates all drivers. This is needed for MSI support, because MSI
needs to know the total number of vectors upfront.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (+ lguest/9p compile fixes)
2009-06-12 22:16:36 +09:30
Rusty Russell 9499f5e7ed virtio: add names to virtqueue struct, mapping from devices to queues.
Add a linked list of all virtqueues for a virtio device: this helps for
debugging and is also needed for upcoming interface change.

Also, add a "name" field for clearer debug messages.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12 22:16:36 +09:30
Rusty Russell ef688e151c virtio: meet virtio spec by finalizing features before using device
Virtio devices are supposed to negotiate features before they start using
the device, but the current code doesn't do this.  This is because the
driver's probe() function invariably has to add buffers to a virtqueue,
or probe the disk (virtio_blk).

This currently doesn't matter since no existing backend is strict about
the feature negotiation.  But it's possible to imagine a future feature
which completely changes how a device operates: in this case, we'd need
to acknowledge it before using the device.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12 22:16:35 +09:30
Marcelo Tosatti 84a139a985 virtio: fix suspend when using virtio_balloon
Break out of wait_event_interruptible() if freezing has been requested,
in the vballoon thread. Without this change vballoon refuses to stop and
the system can't suspend.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org
2009-04-19 23:14:01 +09:30
Rusty Russell c5f841f178 virtio: more neatening of virtio_ring macros.
Impact: cleanup

Roel Kluin drew attention to these macros with his patch: here I
neaten them a little further:
1) Add a comment on what START_USE and END_USE are checking,
2) Brackets around _vq in BAD_RING,
3) Neaten formatting for START_USE so it's less than 80 cols.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-03-30 21:55:23 +10:30
Roel Kluin 3a35ce7dce virtio: fix BAD_RING, START_US and END_USE macros
Impact: cleanup

fix BAD_RING, START_US and END_USE macros

When these macros aren't called with a variable named vq as first
argument, this would result in a build failure.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-03-30 21:55:22 +10:30
Mark McLoughlin 3fff0179e3 virtio-pci: do not oops on config change if driver not loaded
The host really shouldn't be notifying us of config changes
before the device status is VIRTIO_CONFIG_S_DRIVER or
VIRTIO_CONFIG_S_DRIVER_OK.

However, if we do happen to be interrupted while we're not
attached to a driver, we really shouldn't oops. Prevent
this simply by checking that device->driver is non-NULL
before trying to notify the driver of config changes.

Problem observed by doing a "set_link virtio.0 down" with
QEMU before the net driver had been loaded.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-02 19:17:56 -08:00
Mark McLoughlin 63d1255670 virtio: do not statically allocate root device
We shouldn't be statically allocating the root device object,
so dynamically allocate it using root_device_register()
instead.

Also avoids this warning from 'rmmod virtio_pci':

  Device 'virtio-pci' does not have a release() function, it is broken and must be fixed

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Cc: Anthony Liguori <aliguori@us.ibm.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-06 10:44:34 -08:00
Mark McLoughlin 29f9f12ec7 virtio: add PCI device release() function
Add a release() function for virtio_pci devices so as to avoid:

  Device 'virtio0' does not have a release() function, it is broken and must be fixed

Move the code to free the resources associated with the device
from virtio_pci_remove() into this new function. virtio_pci_remove()
now merely unregisters the device which should cause the final
ref to be dropped and virtio_pci_release_dev() to be called.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Cc: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-12-30 09:26:10 +10:30
Hollis Blanchard 1b4aa2faec virtio: avoid implicit use of Linux page size in balloon interface
Make the balloon interface always use 4K pages, and convert Linux pfns if
necessary. This patch assumes that Linux's PAGE_SHIFT will never be less than
12.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (modified)
2008-12-30 09:26:04 +10:30
Rusty Russell 87c7d57c17 virtio: hand virtio ring alignment as argument to vring_new_virtqueue
This allows each virtio user to hand in the alignment appropriate to
their virtio_ring structures.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
2008-12-30 09:26:03 +10:30
Rusty Russell 498af14783 virtio: Don't use PAGE_SIZE for vring alignment in virtio_pci.
That doesn't work for non-4k guests which are now appearing.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-12-30 09:25:58 +10:30
Rusty Russell 480daab42c virtio: Don't use PAGE_SIZE in virtio_pci.c
The virtio PCI devices don't depend on the guest page size.  This matters
now PowerPC virtio is gaining ground (they like 64k pages).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-12-30 09:25:57 +10:30
Kay Sievers 99e0b6c8e3 virtio: struct device - replace bus_id with dev_name(), dev_set_name()
This patch is part of a larger patch series which will remove
the "char bus_id[20]" name string from struct device. The device
name is managed in the kobject anyway, and without any size
limitation, and just needlessly copied into "struct device".

To set and read the device name dev_name(dev) and dev_set_name(dev)
must be used. If your code uses static kobjects, which it shouldn't
do, "const char *init_name" can be used to statically provide the
name the registered device should have. At registration time, the
init_name field is cleared, to enforce the use of dev_name(dev) to
access the device name at a later time.

We need to get rid of all occurrences of bus_id in the entire tree
to be able to enable the new interface. Please apply this patch,
and possibly convert any remaining remaining occurrences of bus_id.

We want to submit a patch to -next, which will remove bus_id from
"struct device", to find the remaining pieces to convert, and finally
switch over to the new api, which will remove the 20 bytes array
and does no longer have a size limitation.

Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-12-30 09:25:56 +10:30
Hollis Blanchard 13b1eb333b virtio-pci queue allocation not page-aligned
kzalloc() does not guarantee page alignment, and in fact this broke when
I enabled CONFIG_SLUB_DEBUG_ON.

(Thanks to Anthony Liguori for spotting the missing kfree sub)

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (fixed kfree)
Tested-by: Anthony Liguori <aliguori@us.ibm.com>
2008-12-30 09:25:56 +10:30
Anthony Liguori 532a6086e3 virtio_balloon: fix towards_target when deflating balloon
Both v and vb->num_pages are u32 and unsigned int respectively.  If v is less
than vb->num_pages (and it is, when deflating the balloon), the result is a
very large 32-bit number.  Since we're returning a s64, instead of getting the
same negative number we desire, we get a very large positive number.

This handles the case where v < vb->num_pages and ensures we get a small,
negative, s64 as the result.

Rusty: please push this for 2.6.27-rc4.  It's probably appropriate for the
stable tree too as it will cause an unexpected OOM when ballooning.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (simplified)
2008-08-26 00:19:25 +10:00
Rusty Russell e34f872567 virtio: Add transport feature handling stub for virtio_ring.
To prepare for virtio_ring transport feature bits, hook in a call in
all the users to manipulate them.  This currently just clears all the
bits, since it doesn't understand any features.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-25 12:06:14 +10:00
Rusty Russell c624896e48 virtio: Rename set_features to finalize_features
Rather than explicitly handing the features to the lower-level, we just
hand the virtio_device and have it set the features.  This make it clear
that it has the chance to manipulate the features of the device at this
point (and that all feature negotiation is already done).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-25 12:06:12 +10:00
Rusty Russell dd7c7bc462 virtio: Formally reserve bits 28-31 to be 'transport' features.
We assign feature bits as required, but it makes sense to reserve some
for the particular transport, rather than the particular device.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-25 12:06:07 +10:00
Mark McLoughlin e962fa660d virtio: Use bus_type probe and remove methods
Hook up to the probe() and remove() methods in bus_type
rather than device_driver. The latter has been preferred
since 2.6.16.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-25 12:06:05 +10:00
Rusty Russell 44653eae14 virtio: don't always force a notification when ring is full
We force notification when the ring is full, even if the host has
indicated it doesn't want to know.  This seemed like a good idea at
the time: if we fill the transmit ring, we should tell the host
immediately.

Unfortunately this logic also applies to the receiving ring, which is
refilled constantly.  We should introduce real notification thesholds
to replace this logic.  Meanwhile, removing the logic altogether breaks
the heuristics which KVM uses, so we use a hack: only notify if there are
outgoing parts of the new buffer.

Here are the number of exits with lguest's crappy network implementation:
Before:
	network xmit 7859051 recv 236420
After:
	network xmit 7858610 recv 118136

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-25 12:06:04 +10:00
Mark McLoughlin b92dea67cc virtio: Complete feature negotation before updating status
lguest (in rusty's use-tun-ringfd patch) assumes that the
guest has updated its feature bits before setting its status
to VIRTIO_CONFIG_S_DRIVER_OK.

That's pretty reasonable, so let's make it so.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-06-15 13:46:16 -07:00
Rusty Russell b4f68be6c5 virtio: force callback on empty.
virtio allows drivers to suppress callbacks (ie. interrupts) for
efficiency (no locking, it's just an optimization).

There's a similar mechanism for the host to suppress notifications
coming from the guest: in that case, we ignore the suppression if the
ring is completely full.

It turns out that life is simpler if the host similarly ignores
callback suppression when the ring is completely empty: the network
driver wants to free up old packets in a timely manner, and otherwise
has to use a timer to poll.

We have to remove the code which ignores interrupts when the driver
has disabled them (again, it had no locking and hence was unreliable
anyway).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-05-30 15:09:46 +10:00
Christian Borntraeger 52a3a05f3a virtio_net: another race with virtio_net and enable_cb
Hello Rusty,

seems that we still have a problem with virtio_net and the enable_cb callback.
During a long running network stress tests with virtio and got the following
oops:

------------[ cut here ]------------
kernel BUG at drivers/virtio/virtio_ring.c:230!
illegal operation: 0001 [#1] SMP
Modules linked in:
CPU: 0 Not tainted 2.6.26-rc2-kvm-00436-gc94c08b-dirty #34
Process netserver (pid: 2582, task: 000000000fbc4c68, ksp: 000000000f42b990)
Krnl PSW : 0704c00180000000 00000000002d0ec8 (vring_enable_cb+0x1c/0x60)
           R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
Krnl GPRS: 0000000000000000 0000000000000000 000000000ef3d000 0000000010009800
           0000000000000000 0000000000419ce0 0000000000000080 000000000000007b
           000000000adb5538 000000000ef40900 000000000ef40000 000000000ef40920
           0000000000000000 0000000000000005 000000000029c1b0 000000000fea7d18
Krnl Code: 00000000002d0ebc: a7110001           tmll    %r1,1
           00000000002d0ec0: a7740004           brc     7,2d0ec8
           00000000002d0ec4: a7f40001           brc     15,2d0ec6
          >00000000002d0ec8: a517fffe           nill    %r1,65534
           00000000002d0ecc: 40103000           sth     %r1,0(%r3)
           00000000002d0ed0: 07f0               bcr     15,%r0
           00000000002d0ed2: e31020380004       lg      %r1,56(%r2)
           00000000002d0ed8: a7480000           lhi     %r4,0
Call Trace:
([<000000000029c0fc>] virtnet_poll+0x290/0x3b8)
 [<0000000000333fb8>] net_rx_action+0x9c/0x1b8
 [<00000000001394bc>] __do_softirq+0x74/0x108
 [<000000000010d16a>] do_softirq+0x92/0xac
 [<0000000000139826>] irq_exit+0x72/0xc8
 [<000000000010a7b6>] do_extint+0xe2/0x104
 [<0000000000110508>] ext_no_vtime+0x16/0x1a
Last Breaking-Event-Address:
 [<00000000002d0ec4>] vring_enable_cb+0x18/0x60

I looked into the virtio_net code for some time and I think the following
scenario happened. Please look at virtnet_poll:
[...]
        /* Out of packets? */
        if (received < budget) {
                netif_rx_complete(vi->dev, napi);
                if (unlikely(!vi->rvq->vq_ops->enable_cb(vi->rvq))
                    && napi_schedule_prep(napi)) {
                        vi->rvq->vq_ops->disable_cb(vi->rvq);
                        __netif_rx_schedule(vi->dev, napi);
                        goto again;
                }
        }

If an interrupt arrives after netif_rx_complete, a second poll routine can run
on a different cpu. The second check for napi_schedule_prep would prevent any
harm in the network stack, but we have called enable_cb possibly after the
disable_cb in skb_recv_done.

static void skb_recv_done(struct virtqueue *rvq)
{
        struct virtnet_info *vi = rvq->vdev->priv;
        /* Schedule NAPI, Suppress further interrupts if successful. */
        if (netif_rx_schedule_prep(vi->dev, &vi->napi)) {
                rvq->vq_ops->disable_cb(rvq);
                __netif_rx_schedule(vi->dev, &vi->napi);
        }
}

That means that the second poll routine runs with interrupts enabled, which is
ok, since we can handle additional interrupts. The problem is now that the
second poll routine might also call enable_cb, triggering the BUG.

The only solution I can come up with, is to remove the BUG statement in
enable_cb - similar to disable_cb. Opinions or better ideas where the oops
could come from?

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-05-30 15:09:45 +10:00
Rusty Russell b769f57908 virtio: set device index in common code.
Anthony Liguori points out that three different transports use the virtio code,
but each one keeps its own counter to set the virtio_device's index field.  In
theory (though not in current practice) this means that names could be
duplicated, and that risk grows as more transports are created.

So we move the selection of the unique virtio_device.index into the common code
in virtio.c, which has the side-benefit of removing duplicate code.

The only complexity is that lguest and S/390 use the index to uniquely identify
the device in case of catastrophic failure before register_virtio_device() is
called: now we use the offset within the descriptor page as a unique identifier
for the printks.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chris Lalancette <clalance@redhat.com>
Cc: Anthony Liguori <anthony@codemonkey.ws>
2008-05-30 15:09:42 +10:00
Rusty Russell 5610bd1524 virtio: virtio_pci should not set bus_id.
The common virtio code sets the bus_id, overriding anything virtio_pci
sets anyway.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chris Lalancette <clalance@redhat.com>
Cc: Anthony Liguori <anthony@codemonkey.ws>
2008-05-30 15:09:42 +10:00
Rusty Russell 2ad3cfbac5 virtio: bus_id for devices should contain 'virtio'
Chris Lalancette <clalance@redhat.com> points out that virtio.c sets all device
names to '0', '1', etc, which looks silly in /proc/interrupts.  We change this
from '%d' to 'virtio%d'.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chris Lalancette <clalance@redhat.com>
Cc: Anthony Liguori <anthony@codemonkey.ws>
2008-05-30 15:09:42 +10:00
Rusty Russell c45a6816c1 virtio: explicit advertisement of driver features
A recent proposed feature addition to the virtio block driver revealed
some flaws in the API: in particular, we assume that feature
negotiation is complete once a driver's probe function returns.

There is nothing in the API to require this, however, and even I
didn't notice when it was violated.

So instead, we require the driver to specify what features it supports
in a table, we can then move the feature negotiation into the virtio
core.  The intersection of device and driver features are presented in
a new 'features' bitmap in the struct virtio_device.

Note that this highlights the difference between Linux unsigned-long
bitmaps where each unsigned long is in native endian, and a
straight-forward little-endian array of bytes.

Drivers can still remove feature bits in their probe routine if they
really have to.

API changes:
- dev->config->feature() no longer gets and acks a feature.
- drivers should advertise their features in the 'feature_table' field
- use virtio_has_feature() for extra sanity when checking feature bits

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-05-02 21:50:50 +10:00
Rusty Russell 72e61eb40b virtio: change config to guest endian.
A recent proposed feature addition to the virtio block driver revealed
some flaws in the API, in particular how easy it is to break big
endian machines.

The virtio config space was originally chosen to be little-endian,
because we thought the config might be part of the PCI config space
for virtio_pci.  It's actually a separate mmio region, so that
argument holds little water; as only x86 is currently using the virtio
mechanism, we can change this (but must do so now, before the
impending s390 merge).

API changes:
- __virtio_config_val() just becomes a striaght vdev->config_get() call.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-05-02 21:50:50 +10:00
Harvey Harrison 597d56e4b5 virtio: fix sparse return void-valued expression warnings
drivers/virtio/virtio_pci.c:148:2: warning: returning void-valued expression
drivers/virtio/virtio_pci.c:155:2: warning: returning void-valued expression

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-05-02 21:50:44 +10:00
Rusty Russell 5ef827526f virtio: ignore corrupted virtqueues rather than spinning.
A corrupt virtqueue (caused by the other end screwing up) can have
strange results such as a driver spinning: just bail when we try to
get a buffer from a known-broken queue.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-05-02 21:50:43 +10:00
Rusty Russell 2557a933b7 virtio: remove overzealous BUG_ON.
The 'disable_cb' callback is designed as an optimization to tell the host
we don't need callbacks now.  As it is not reliable, the debug check is
overzealous: it can happen on two CPUs at the same time.  Document this.

Even if it were reliable, the virtio_net driver doesn't disable
callbacks on transmit so the START_USE/END_USE debugging reentrance
protection can be easily tripped even on UP.

Thanks to Balaji Rao for the bug report and testing.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
CC: Balaji Rao <balajirrao@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-07 13:14:22 -07:00
Al Viro 97968358ab virtio_pci iomem annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-30 14:20:23 -07:00
Anthony Liguori bd6c26900b virtio_pci: unregister virtio device at device remove
Make sure to call unregister_virtio_device() when a virtio device is removed.
Otherwise, virtio_pci.ko cannot be rmmod'd.

This was spotted by Marcelo Tosatti.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-03-28 11:05:51 +11:00
Christian Borntraeger 4265f161b6 virtio: fix race in enable_cb
There is a race in virtio_net, dealing with disabling/enabling the callback.
I saw the following oops:

kernel BUG at /space/kvm/drivers/virtio/virtio_ring.c:218!
illegal operation: 0001 [#1] SMP
Modules linked in: sunrpc dm_mod
CPU: 2 Not tainted 2.6.25-rc1zlive-host-10623-gd358142-dirty #99
Process swapper (pid: 0, task: 000000000f85a610, ksp: 000000000f873c60)
Krnl PSW : 0404300180000000 00000000002b81a6 (vring_disable_cb+0x16/0x20)
           R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:3 PM:0 EA:3
Krnl GPRS: 0000000000000001 0000000000000001 0000000010005800 0000000000000001
           000000000f3a0900 000000000f85a610 0000000000000000 0000000000000000
           0000000000000000 000000000f870000 0000000000000000 0000000000001237
           000000000f3a0920 000000000010ff74 00000000002846f6 000000000fa0bcd8
Krnl Code: 00000000002b819a: a7110001           tmll    %r1,1
           00000000002b819e: a7840004           brc     8,2b81a6
           00000000002b81a2: a7f40001           brc     15,2b81a4
          >00000000002b81a6: a51b0001           oill    %r1,1
           00000000002b81aa: 40102000           sth     %r1,0(%r2)
           00000000002b81ae: 07fe               bcr     15,%r14
           00000000002b81b0: eb7ff0380024       stmg    %r7,%r15,56(%r15)
           00000000002b81b6: a7f13e00           tmll    %r15,15872
Call Trace:
([<000000000fa0bcd0>] 0xfa0bcd0)
 [<00000000002b8350>] vring_interrupt+0x5c/0x6c
 [<000000000010ab08>] do_extint+0xb8/0xf0
 [<0000000000110716>] ext_no_vtime+0x16/0x1a
 [<0000000000107e72>] cpu_idle+0x1c2/0x1e0

The problem can be triggered with a high amount of host->guest traffic.
I think its the following race:

poll says netif_rx_complete
poll calls enable_cb
enable_cb opens the interrupt mask
a new packet comes, an interrupt is triggered----\
enable_cb sees that there is more work           |
enable_cb disables the interrupt                 |
       .                                         V
       .                            interrupt is delivered
       .                            skb_recv_done does atomic napi test, ok
 some waiting                       disable_cb is called->check fails->bang!
       .
poll would do napi check
poll would do disable_cb

The fix is to let enable_cb not disable the interrupt again, but expect the
caller to do the cleanup if it returns false. In that case, the interrupt is
only disabled, if the napi test_set_bit was successful.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (cleaned up doco)
2008-03-17 22:58:21 +11:00
Rusty Russell bdc1681cdf virtio: handle > 2 billion page balloon targets
If the host asks for a huge target towards_target() can overflow, and
we up oops as we try to release more pages than we have.  The simple
fix is to use a 64-bit value.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-03-17 22:58:19 +11:00
Anthony Liguori 27ebe308af virtio: Use spin_lock_irqsave/restore for virtio-pci
virtio-pci acquires its spin lock in an interrupt context so it's necessary
to use spin_lock_irqsave/restore variants.  This patch fixes guest SMP when
using virtio devices in KVM.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-03-17 22:58:13 +11:00
Johann Felix Soden 6659a0f0bb virtio: add missing #include <linux/delay.h>
Include linux/delay.h to fix compiler error:

drivers/virtio/virtio_balloon.c: In function 'fill_balloon':
drivers/virtio/virtio_balloon.c:98: error: implicit declaration of function 'msleep'

Signed-off-by: Johann Felix Soden <johfel@users.sourceforge.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:21 -08:00
Rusty Russell 6b35e40767 virtio: balloon driver
After discussions with Anthony Liguori, it seems that the virtio
balloon can be made even simpler.  Here's my attempt.

The device configuration tells the driver how much memory it should
take from the guest (ie. balloon size).  The guest feeds the page
numbers it has taken via one virtqueue.

A second virtqueue feeds the page numbers the driver wants back: if
the device has the VIRTIO_BALLOON_F_MUST_TELL_HOST bit, then this
queue is compulsory, otherwise it's advisory (and the guest can simply
fault the pages back in).

This driver can be enhanced later to deflate the balloon via a
shrinker, oom callback or we could even go for a complete set of
in-guest regulators.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:13 +11:00
Anthony Liguori 55a7c06604 virtio: Use PCI revision field to indicate virtio PCI ABI version
As Avi pointed out, as we continue to massage the virtio PCI ABI, we can make
things a little more friendly to users by utilizing the PCI revision field to
indicate which version of the ABI we're using.  This is a hard ABI version
and incrementing it will cause the guest driver to break.

This is the necessary changes to virtio_pci to support this.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:12 +11:00
Anthony Liguori 3343660d8c virtio: PCI device
This is a PCI device that implements a transport for virtio.  It allows virtio
devices to be used by QEMU based VMMs like KVM or Xen.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:11 +11:00
Rusty Russell c6fd47011b virtio: Allow virtio to be modular and used by modules
This is needed for the virtio PCI device to be compiled as a module.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:06 +11:00
Rusty Russell 15f9c8903c virtio: Use the sg_phys convenience function.
Simple cleanup.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:05 +11:00
Rusty Russell 81a8deab1c virtio: handle interrupts after callbacks turned off
Anthony Liguori found double interrupt suppression in the virtio_net
driver, triggered by two skb_recv_done's in a row.  This is because
virtio_ring's interrupt suppression is a best-effort optimization: it
contains no synchronization so the host can miss it and still send
interrupts.

But it's certainly nicer for virtio users if calling disable_cb
actually disables callbacks, so we check for the race in the interrupt
routine.

Note: SMP guests might require syncronization here, but since
disable_cb is actually called from interrupt context, there has to be
some form of synchronization before the next same interrupt handler is
called (Linux guarantees that the same device's irq handler will never
run simultanously on multiple CPUs).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:04 +11:00
Rusty Russell 6e5aa7efb2 virtio: reset function
A reset function solves three problems:

1) It allows us to renegotiate features, eg. if we want to upgrade a
   guest driver without rebooting the guest.

2) It gives us a clean way of shutting down virtqueues: after a reset,
   we know that the buffers won't be used by the host, and

3) It helps the guest recover from messed-up drivers.

So we remove the ->shutdown hook, and the only way we now remove
feature bits is via reset.

We leave it to the driver to do the reset before it deletes queues:
the balloon driver, for example, needs to chat to the host in its
remove function.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:03 +11:00
Rusty Russell 426e3e0af5 virtio: clarify NO_NOTIFY flag usage
The other side (host) can set the NO_NOTIFY flag as an optimization,
to say "no need to kick me when you add things".  Make it clear that
this is advisory only; especially that we should always notify when
the ring is full.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:00 +11:00
Rusty Russell 18445c4d50 virtio: explicit enable_cb/disable_cb rather than callback return.
It seems that virtio_net wants to disable callbacks (interrupts) before
calling netif_rx_schedule(), so we can't use the return value to do so.

Rename "restart" to "cb_enable" and introduce "cb_disable" hook: callback
now returns void, rather than a boolean.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:49:58 +11:00
Rusty Russell a586d4f601 virtio: simplify config mechanism.
Previously we used a type/len pair within the config space, but this
seems overkill.  We now simply define a structure which represents the
layout in the config space: the config space can now only be extended
at the end.

The main driver-visible changes:
1) We indicate what fields are present with an explicit feature bit.
2) Virtqueues are explicitly numbered, and not in the config space.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:49:57 +11:00
Rusty Russell 74b2553f1d virtio: fix module/device unloading
The virtio code never hooked through the ->remove callback.  Although
noone supports device removal at the moment, this code is already
needed for module unloading.

This of course also revealed bugs in virtio_blk, virtio_net and lguest
unloading paths.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-11-19 11:20:42 +11:00
Rusty Russell 42b36cc0ce virtio: Force use of power-of-two for descriptor ring sizes
The virtio descriptor rings of size N-1 were nicely set up to be
aligned to an N-byte boundary.  But as Anthony Liguori points out, the
free-running indices used by virtio require that the sizes be a power
of 2, otherwise we get problems on wrap (demonstrated with lguest).

So we replace the clever "2^n-1" scheme with a simple "align to page
boundary" scheme: this means that all virtio rings take at least two
pages, but it's safer than guessing cache alignment.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-11-12 13:59:40 +11:00
Anthony Liguori 1bc4953ed4 virtio: Fix used_idx wrap-around
The more_used() function compares the vq->vring.used->idx with last_used_idx.
Since vq->vring.used->idx is a 16-bit integer, and last_used_idx is an
unsigned int, this results in unpredictable behavior when vq->vring.used->idx
wraps around.

This patch corrects this by changing last_used_idx to the correct type.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-11-12 13:59:09 +11:00
Rusty Russell 0a8a69dd77 Virtio helper routines for a descriptor ringbuffer implementation
These helper routines supply most of the virtqueue_ops for hypervisors
which want to use a ring for virtio.  Unlike the previous lguest
implementation:

1) The rings are variable sized (2^n-1 elements).
2) They have an unfortunate limit of 65535 bytes per sg element.
3) The page numbers are always 64 bit (PAE anyone?)
4) They no longer place used[] on a separate page, just a separate
   cacheline.
5) We do a modulo on a variable.  We could be tricky if we cared.
6) Interrupts and notifies are suppressed using flags within the rings.

Users need only get the ring pages and provide a notify hook (KVM
wants the guest to allocate the rings, lguest does it sanely).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dor Laor <dor.laor@qumranet.com>
2007-10-23 15:49:55 +10:00
Rusty Russell b01d9f2863 Module autoprobing support for virtio drivers.
This adds the logic to convert the virtio ids into module aliases, and
includes a modalias entry in sysfs and the env var to make probing work.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23 15:49:55 +10:00
Rusty Russell ec3d41c4db Virtio interface
This attempts to implement a "virtual I/O" layer which should allow
common drivers to be efficiently used across most virtual I/O
mechanisms.  It will no-doubt need further enhancement.

The virtio drivers add buffers to virtio queues; as the buffers are consumed
the driver "interrupt" callbacks are invoked.

There is also a generic implementation of config space which drivers can query
to get setup information from the host.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dor Laor <dor.laor@qumranet.com>
Cc: Arnd Bergmann <arnd@arndb.de>
2007-10-23 15:49:54 +10:00