OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Linus Torvalds	0e6e58f941	One cc: stable commit, the rest are a series of minor cleanups which have been sitting in MST's tree during my vacation. I changed a function name and made one trivial change, then they spent two days in linux-next. Thanks, Rusty. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJUQFBQAAoJENkgDmzRrbjxJRIP/1yCQRElQewxURSmJelyqCdU 0mHYB0R9Mf3tfre1xnofqs2lWeSMc/4ptKHsVR6pupoztSwnz7HsLHfEFvFJh4mj KsaqYElxkNxTcfyHwLjyJS0/J6tG1tYypXGiimTBS0bvFHL3XZdimVgJ6WvX+gO7 YSaDEX8/EqCERafslS5+gKJlz3drDOnCZCe9y4BDSmsvl2k7bkpSxIn8vsR6jIC0 c5JpUy6QVF+3XA/J932M7yRs+xpqxNoUWiyY3ar9o3CtQAaQB0ZAetSxY6hTfvVc GlNFzCifdsaQwsl2SVsE2h6tWaRhtMtcGWQuhHThIPyIf8XxhYyBRY2FLo70LMz1 eqtwy6F/Bg/nzUsdee4PZBMeoKHlAEL12RpsEKgfUoLzj16Aqa8ll+Agbglbkw8G f3d2FwzKAlpY5NwHETC1wYy52PJ3efqksRWuhokmYpxNSbHJS/lsiJOE7272/4Qr MtXuvRmo22tf34XFd5y7zqWjgZ58eeFOqQWi/K+6ZgpqVOvikjrXXKEuiVdjO0ZD kTVR/sQKiR+79rzENk80XBhWaMveECNXF1TiZ/3MmURkmEOBRQMxRQ20BX3exvna AJ/WVA5DcfXZc1yyqknE1NLGrvSBMJENH13x2QPwrqNWAryOOKuF1VKKIwWlDw5j vtx5nXiJa8YYdxI2TJCN =JK6x -----END PGP SIGNATURE----- Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull virtio updates from Rusty Russell: "One cc: stable commit, the rest are a series of minor cleanups which have been sitting in MST's tree during my vacation. I changed a function name and made one trivial change, then they spent two days in linux-next" * tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (25 commits) virtio-rng: refactor probe error handling virtio_scsi: drop scan callback virtio_balloon: enable VQs early on restore virtio_scsi: fix race on device removal virito_scsi: use freezable WQ for events virtio_net: enable VQs early on restore virtio_console: enable VQs early on restore virtio_scsi: enable VQs early on restore virtio_blk: enable VQs early on restore virtio_scsi: move kick event out from virtscsi_init virtio_net: fix use after free on allocation failure 9p/trans_virtio: enable VQs early virtio_console: enable VQs early virtio_blk: enable VQs early virtio_net: enable VQs early virtio: add API to enable VQs early virtio_net: minor cleanup virtio-net: drop config_mutex virtio_net: drop config_enable virtio-blk: drop config_mutex ...	2014-10-18 10:25:09 -07:00
Michael S. Tsirkin	4b7fd2e688	virtio_net: fix use after free commit `0b725a2ca6` net: Remove ndo_xmit_flush netdev operation, use signalling instead. added code that looks at skb->xmit_more after the skb has been put in TX VQ. Since some paths process the ring and free the skb immediately, this can cause use after free. Fix by storing xmit_more in a local variable. Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-15 16:47:45 -04:00
Michael S. Tsirkin	e53fbd11e9	virtio_net: enable VQs early on restore virtio spec requires drivers to set DRIVER_OK before using VQs. This is set automatically after restore returns, virtio net violated this rule by using receive VQs within restore. To fix, call virtio_device_ready before using VQs. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2014-10-15 10:25:10 +10:30
Michael S. Tsirkin	0246555550	virtio_net: fix use after free on allocation failure In the extremely unlikely event that driver initialization fails after RX buffers are added, virtio net frees RX buffers while VQs are still active, potentially causing device to use a freed buffer. To fix, reset device first - same as we do on device removal. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2014-10-15 10:25:05 +10:30
Michael S. Tsirkin	4baf1e33d0	virtio_net: enable VQs early virtio spec requires drivers to set DRIVER_OK before using VQs. This is set automatically after probe returns, virtio net violated this rule by using receive VQs within probe. To fix, call virtio_device_ready before using VQs. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2014-10-15 10:25:02 +10:30
Michael S. Tsirkin	507613bf31	virtio_net: minor cleanup goto done; done: return; is ugly, it was put there to make diff review easier. replace by open-coded return. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2014-10-15 10:25:00 +10:30
Michael S. Tsirkin	080c637373	virtio-net: drop config_mutex config_mutex served two purposes: prevent multiple concurrent config change handlers, and synchronize access to config_enable flag. Since commit `dbf2576e37` workqueue: make all workqueues non-reentrant all workqueues are non-reentrant, and config_enable is now gone. Get rid of the unnecessary lock. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2014-10-15 10:24:59 +10:30
Michael S. Tsirkin	102a2786c9	virtio_net: drop config_enable Now that virtio core ensures config changes don't arrive during probing, drop config_enable flag in virtio net. On removal, flush is now sufficient to guarantee that no change work is queued. This help simplify the driver, and will allow setting DRIVER_OK earlier without losing config change notifications. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2014-10-15 10:24:58 +10:30
Rusty Russell	a58354409a	virtio_net: pass well-formed sgs to virtqueue_add_() This is the only driver which doesn't hand virtqueue_add_inbuf and virtqueue_add_outbuf a well-formed, well-terminated sg. Fix it, so we can make virtio_add_ simpler. pktgen results: modprobe pktgen echo 'add_device eth0' > /proc/net/pktgen/kpktgend_0 echo nowait 1 > /proc/net/pktgen/eth0 echo count 1000000 > /proc/net/pktgen/eth0 echo clone_skb 100000 > /proc/net/pktgen/eth0 echo dst_mac 4e:14:25:a9:30:ac > /proc/net/pktgen/eth0 echo dst 192.168.1.2 > /proc/net/pktgen/eth0 for i in `seq 20`; do echo start > /proc/net/pktgen/pgctrl; tail -n1 /proc/net/pktgen/eth0; done Before: 746547-793084(786421+/-9.6e+03)pps 346-367(364.4+/-4.4)Mb/sec (346397808-367990976(3.649e+08+/-4.5e+06)bps) errors: 0 After: 767390-792966(785159+/-6.5e+03)pps 356-367(363.75+/-2.9)Mb/sec (356068960-367936224(3.64314e+08+/-3e+06)bps) errors: 0 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-13 12:50:46 -04:00
David S. Miller	c89fcfd42c	virtio_net: flush when in xmit_more mode and under descriptor pressure Mirror the changes made to ixgbe in commit `2367a17390` ("ixgbe: flush when in xmit_more mode and under descriptor pressure") Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-28 01:39:49 -07:00
David S. Miller	0b725a2ca6	net: Remove ndo_xmit_flush netdev operation, use signalling instead. As reported by Jesper Dangaard Brouer, for high packet rates the overhead of having another indirect call in the TX path is non-trivial. There is the indirect call itself, and then there is all of the reloading of the state to refetch the tail pointer value and then write the device register. Move to a more passive scheme, which requires very light modifications to the device drivers. The signal is a new skb->xmit_more value, if it is non-zero it means that more SKBs are pending to be transmitted on the same queue as the current SKB. And therefore, the driver may elide the tail pointer update. Right now skb->xmit_more is always zero. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-25 16:29:42 -07:00
David S. Miller	c223a078cb	virtio_net: Support netdev_ops->ndo_xmit_flush() Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-24 23:02:45 -07:00
Jason Wang	91815639d8	virtio-net: rx busy polling support Add basic support for rx busy polling. Instead of introducing new states and spinlock to synchronize between NAPI and polling method, this patch just reuse NAPI state to avoid extra overhead for fast path and simplified the codes. Test was done between a kvm guest and an external host. Two hosts were connected through 40gb mlx4 cards. With both busy_poll and busy_read are set to 50 in guest, 1 byte netperf tcp_rr shows 127% improvement: transaction rate was increased from 8353.33 to 18966.87. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Vlad Yasevich <vyasevic@redhat.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-23 15:12:02 -07:00
Jason Wang	2ffa75988f	virtio-net: introduce virtnet_receive() Move common receive logic to a new helper virtnet_receive(). It will also be used by rx busy polling method. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Vlad Yasevich <vyasevic@redhat.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-23 15:12:01 -07:00
Wilfried Klaebe	7ad24ea4bf	net: get rid of SET_ETHTOOL_OPS net: get rid of SET_ETHTOOL_OPS Dave Miller mentioned he'd like to see SET_ETHTOOL_OPS gone. This does that. Mostly done via coccinelle script: @@ struct ethtool_ops ops; struct net_device dev; @@ - SET_ETHTOOL_OPS(dev, ops); + dev->ethtool_ops = ops; Compile tested only, but I'd seriously wonder if this broke anything. Suggested-by: Dave Miller <davem@davemloft.net> Signed-off-by: Wilfried Klaebe <w-lkml@lebenslange-mailadresse.de> Acked-by: Felipe Balbi <balbi@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-13 17:43:20 -04:00
$Zhangjie $HZ$$ Zhangjie $HZ$	6ebbc1a638	virtio-net: Set needed_headroom for virtio-net when VIRTIO_F_ANY_LAYOUT is true This is a small supplement for commit `e7428e95a0` ("virtio-net: put virtio-net header inline with data"). TCP packages have enough room to put virtio-net header in, but UDP packages do not. By setting dev->needed_headroom for virtio-net device, UDP packages could have enough room. For UDP packages, sk_buff is alloced in fun __ip_append_data. The size is "alloclen + hh_len + 15", and "hh_len = LL_RESERVED_SPACE(rt-dst.dev);". The Macro is defined as follows: #define LL_RESERVED_SPACE(dev) \ ((((dev)->hard_header_len+(dev)->needed_headroom)\ &~(HH_DATA_MOD - 1)) + HH_DATA_MOD) By default, for UDP packages, after skb is allocated, only 16 bytes reserved. And 2 bytes remained after mac header is set. That is not enough to put virtio-net header in. If we set dev->needed_headroom to 12 or 10 (according to mergeable_rx_bufs is on or off ), more room can be reserved. Then there is enough room for UDP packages to put the header in. test result list as below: guest and host: suse11sp3, netperf, intel 2.4GHz +-------+---------+---------+---------+---------+ \| \| old \| new \| +-------+---------+---------+---------+---------+ \| UDP \| Gbit/s \| pps \| Gbit/s \| pps \| \| 64 \| 0.57 \| 692232 \| 0.61 \| 742420 \| \| 256 \| 1.60 \| 686860 \| 1.71 \| 733331 \| \| 512 \| 2.92 \| 674576 \| 3.07 \| 710446 \| \| 1024 \| 4.99 \| 598977 \| 5.17 \| 620821 \| \| 1460 \| 5.68 \| 483757 \| 7.16 \| 610519 \| \| 4096 \| 6.98 \| 637468 \| 7.21 \| 658471 \| +-------+---------+---------+---------+---------+ Signed-off-by: Zhang Jie <zhangjie14@huawei.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-30 13:31:26 -04:00
Amos Kong	c18e9cd623	virtio_net: zero is an invald queue_pairs number Execute "ethtool -L eth0 combined 0" in guest, if multiqueue is enabled, virtnet_send_command() will return -EINVAL error, there is a validation in QEMU. But if multiqueue is disabled, virtnet_set_queues() will just return zero (success). We should return error for this situation. Signed-off-by: Amos Kong <akong@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 16:01:35 -04:00
Linus Torvalds	cd6362befe	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking updates from David Miller: "Here is my initial pull request for the networking subsystem during this merge window: 1) Support for ESN in AH (RFC 4302) from Fan Du. 2) Add full kernel doc for ethtool command structures, from Ben Hutchings. 3) Add BCM7xxx PHY driver, from Florian Fainelli. 4) Export computed TCP rate information in netlink socket dumps, from Eric Dumazet. 5) Allow IPSEC SA to be dumped partially using a filter, from Nicolas Dichtel. 6) Convert many drivers to pci_enable_msix_range(), from Alexander Gordeev. 7) Record SKB timestamps more efficiently, from Eric Dumazet. 8) Switch to microsecond resolution for TCP round trip times, also from Eric Dumazet. 9) Clean up and fix 6lowpan fragmentation handling by making use of the existing inet_frag api for it's implementation. 10) Add TX grant mapping to xen-netback driver, from Zoltan Kiss. 11) Auto size SKB lengths when composing netlink messages based upon past message sizes used, from Eric Dumazet. 12) qdisc dumps can take a long time, add a cond_resched(), From Eric Dumazet. 13) Sanitize netpoll core and drivers wrt. SKB handling semantics. Get rid of never-used-in-tree netpoll RX handling. From Eric W Biederman. 14) Support inter-address-family and namespace changing in VTI tunnel driver(s). From Steffen Klassert. 15) Add Altera TSE driver, from Vince Bridgers. 16) Optimizing csum_replace2() so that it doesn't adjust the checksum by checksumming the entire header, from Eric Dumazet. 17) Expand BPF internal implementation for faster interpreting, more direct translations into JIT'd code, and much cleaner uses of BPF filtering in non-socket ocntexts. From Daniel Borkmann and Alexei Starovoitov" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1976 commits) netpoll: Use skb_irq_freeable to make zap_completion_queue safe. net: Add a test to see if a skb is freeable in irq context qlcnic: Fix build failure due to undefined reference to `vxlan_get_rx_port' net: ptp: move PTP classifier in its own file net: sxgbe: make "core_ops" static net: sxgbe: fix logical vs bitwise operation net: sxgbe: sxgbe_mdio_register() frees the bus Call efx_set_channels() before efx->type->dimension_resources() xen-netback: disable rogue vif in kthread context net/mlx4: Set proper build dependancy with vxlan be2net: fix build dependency on VxLAN mac802154: make csma/cca parameters per-wpan mac802154: allow only one WPAN to be up at any given time net: filter: minor: fix kdoc in __sk_run_filter netlink: don't compare the nul-termination in nla_strcmp can: c_can: Avoid led toggling for every packet. can: c_can: Simplify TX interrupt cleanup can: c_can: Store dlc private can: c_can: Reduce register access can: c_can: Make the code readable ...	2014-04-02 20:53:45 -07:00
Linus Torvalds	64056a9425	Nothing exciting: virtio-blk users might see a bit of a boost from the doubling of the default queue length though. Cheers, Rusty. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) iQIcBAABAgAGBQJTOirvAAoJENkgDmzRrbjxpqAP/3114WOubf4MNoi24eXNkU2x ZC++orq408A72g8dB7A0XAhatKc8Ay0bDP5hOZNAgTHm2hjYxmhpC9UlKEzzElpW yjwR0wYBXA0RoJuZqCp9MNgOtkU54QoQZ0c4EZblagslUZBmKQPUDE7XdgkaqO6o A3azfCjAFDu523Azep9Npj1sk+H+VH3OIXyMGY+uyHBw1a2rnmIhn4lQCQ+pX+YO 3wpqxEragpjBAizs1CAAB9wWm2O8zVACxkoUXVYI8Tu60n99lwr9Abxlc0oHSIig E7kBnyzQVHfkDrPXR3EdwTi3Hwd/BaOiW4dPvQ3IJKvPOzoiS4H3IpJCnCV5PfRb VHl3q//SzPQ+GXH7WH2Fhb9JxoLxBRzFcy3kdIR1wBHYahAOiQjcLgapOO5mVq3X PJy9CDs2L9rjbtxvQWtnl62V3JFGw+ZdhhG/BjeC5Who/aSh/mDoss7/qdfrxKJx z5IWYSlJw7ighOuF0dPdCKAX9WiWqENvga31Q2svrH4Hxlx6eGumEmX+YQw0iRAf ddMYA+1qLT4myPTN0nORFM+T/mkZHNkNMCr0qylRFH0j6hSiDxwWqQG0eXA661By W6nIkW++sj2Vkk4knMGCXSyMmy9Nrv+1R+8unQJCXixYevotP5JEY0DoCQwlGuuq xa0UR+2q9Htnbytu8S0K =AoMS -----END PGP SIGNATURE----- Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull virtio updates from Rusty Russell: "Nothing exciting: virtio-blk users might see a bit of a boost from the doubling of the default queue length though" * tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: virtio-blk: base queue-depth on virtqueue ringsize or module param Revert a02bbb1ccfe8: MAINTAINERS: add virtio-dev ML for virtio virtio: fail adding buffer on broken queues. virtio-rng: don't crash if virtqueue is broken. virtio_balloon: don't crash if virtqueue is broken. virtio_blk: don't crash, report error if virtqueue is broken. virtio_net: don't crash if virtqueue is broken. virtio_balloon: don't softlockup on huge balloon changes. virtio: Use pci_enable_msix_exact() instead of pci_enable_msix() MAINTAINERS: virtio-dev is subscribers only tools/virtio: add a missing ) tools/virtio: fix missing kmemleak_ignore symbol tools/virtio: update internal copies of headers	2014-04-02 14:43:17 -07:00
David S. Miller	64c27237a0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/marvell/mvneta.c The mvneta.c conflict is a case of overlapping changes, a conversion to devm_ioremap_resource() vs. a conversion to netdev_alloc_pcpu_stats. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-29 18:48:54 -04:00
Jason Wang	681daee244	virtio-net: correct error handling of virtqueue_kick() Current error handling of virtqueue_kick() was wrong in two places: - The skb were freed immediately when virtqueue_kick() fail during xmit. This may lead double free since the skb was not detached from the virtqueue. - try_fill_recv() returns false when virtqueue_kick() fail. This will lead unnecessary rescheduling of refill work. Actually, it's safe to just ignore the kick failure in those two places. So this patch fixes this by partially revert commit `6797590118`. Fixes `6797590118` (virtio_net: verify if virtqueue_kick() succeeded). Cc: Heinz Graalfs <graalfs@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-27 13:13:21 -04:00
Eric W. Biederman	85e9452539	virtio_net: Call dev_kfree_skb_any instead of dev_kfree_skb. Replace dev_kfree_skb with dev_kfree_skb_any in start_xmit which can be called in hard irq and other contexts. start_xmit only frees skbs that it is dropping. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2014-03-24 21:19:25 -07:00
Eric W. Biederman	57a7744e09	net: Replace u64_stats_fetch_begin_bh to u64_stats_fetch_begin_irq Replace the bh safe variant with the hard irq safe variant. We need a hard irq safe variant to deal with netpoll transmitting packets from hard irq context, and we need it in most if not all of the places using the bh safe variant. Except on 32bit uni-processor the code is exactly the same so don't bother with a bh variant, just have a hard irq safe variant that everyone can use. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-14 22:41:36 -04:00
Rusty Russell	a7c58146cf	virtio_net: don't crash if virtqueue is broken. A bad implementation of virtio might cause us to mark the virtqueue broken: we'll dev_err() in that case, and the device is useless, but let's not BUG_ON(). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2014-03-13 11:27:55 +10:30
Jason Wang	0e7ede80d9	virtio-net: alloc big buffers also when guest can receive UFO We should alloc big buffers also when guest can receive UFO packets to let the big packets fit into guest rx buffer. Fixes `5c5167515d` (virtio-net: Allow UFO feature to be set and advertised.) Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-24 18:19:52 -05:00
Michael Dalton	fbf28d78f5	virtio-net: initial rx sysfs support, export mergeable rx buffer size Add initial support for per-rx queue sysfs attributes to virtio-net. If mergeable packet buffers are enabled, adds a read-only mergeable packet buffer size sysfs attribute for each RX queue. Suggested-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael Dalton <mwdalton@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-16 23:46:07 -08:00
Michael Dalton	ab7db91705	virtio-net: auto-tune mergeable rx buffer size for improved performance Commit `2613af0ed1` ("virtio_net: migrate mergeable rx buffers to page frag allocators") changed the mergeable receive buffer size from PAGE_SIZE to MTU-size, introducing a single-stream regression for benchmarks with large average packet size. There is no single optimal buffer size for all workloads. For workloads with packet size <= MTU bytes, MTU + virtio-net header-sized buffers are preferred as larger buffers reduce the TCP window due to SKB truesize. However, single-stream workloads with large average packet sizes have higher throughput if larger (e.g., PAGE_SIZE) buffers are used. This commit auto-tunes the mergeable receiver buffer packet size by choosing the packet buffer size based on an EWMA of the recent packet sizes for the receive queue. Packet buffer sizes range from MTU_SIZE + virtio-net header len to PAGE_SIZE. This improves throughput for large packet workloads, as any workload with average packet size >= PAGE_SIZE will use PAGE_SIZE buffers. These optimizations interact positively with recent commit `ba27524103` ("virtio-net: coalesce rx frags when possible during rx"), which coalesces adjacent RX SKB fragments in virtio_net. The coalescing optimizations benefit buffers of any size. Benchmarks taken from an average of 5 netperf 30-second TCP_STREAM runs between two QEMU VMs on a single physical machine. Each VM has two VCPUs with all offloads & vhost enabled. All VMs and vhost threads run in a single 4 CPU cgroup cpuset, using cgroups to ensure that other processes in the system will not be scheduled on the benchmark CPUs. Trunk includes SKB rx frag coalescing. net-next w/ virtio_net before `2613af0ed1` (PAGE_SIZE bufs): 14642.85Gb/s net-next (MTU-size bufs): 13170.01Gb/s net-next + auto-tune: 14555.94Gb/s Jason Wang also reported a throughput increase on mlx4 from 22Gb/s using MTU-sized buffers to about 26Gb/s using auto-tuning. Signed-off-by: Michael Dalton <mwdalton@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-16 23:46:06 -08:00
Michael Dalton	fb51879dbc	virtio-net: use per-receive queue page frag alloc for mergeable bufs The virtio-net driver currently uses netdev_alloc_frag() for GFP_ATOMIC mergeable rx buffer allocations. This commit migrates virtio-net to use per-receive queue page frags for GFP_ATOMIC allocation. This change unifies mergeable rx buffer memory allocation, which now will use skb_refill_frag() for both atomic and GFP-WAIT buffer allocations. To address fragmentation concerns, if after buffer allocation there is too little space left in the page frag to allocate a subsequent buffer, the remaining space is added to the current allocated buffer so that the remaining space can be used to store packet data. Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael Dalton <mwdalton@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-16 23:46:06 -08:00
Jason Wang	be121f46af	virtio-net: drop rq->max and rq->num It looks like there's no need for those two fields: - Unless there's a failure for the first refill try, rq->max should be always equal to the vring size. - rq->num is only used to determine the condition that we need to do the refill, we could check vq->num_free instead. - rq->num was required to be increased or decreased explicitly after each get/put which results a bad API. So this patch removes them both to make the code simpler. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-16 17:30:42 -08:00
David S. Miller	56a4342dfe	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c net/ipv6/ip6_tunnel.c net/ipv6/ip6_vti.c ipv6 tunnel statistic bug fixes conflicting with consolidation into generic sw per-cpu net stats. qlogic conflict between queue counting bug fix and the addition of multiple MAC address support. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-06 17:37:45 -05:00
Jason Wang	6cd4ce0099	virtio-net: fix refill races during restore During restoring, try_fill_recv() was called with neither napi lock nor napi disabled. This can lead two try_fill_recv() was called in the same time. Fix this by refilling before trying to enable napi. Fixes `0741bcb558` (virtio: net: Add freeze, restore handlers to support S4). Cc: Amit Shah <amit.shah@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-02 19:23:03 -05:00
stephen hemminger	788a8b6dd3	virtio_net: spelling fixes Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-10 22:28:06 -05:00
stephen hemminger	d24bae32fa	virtio_net: remove unused parameter to send_command All the code passes NULL for the last sg list (in). Simplify by just removing it. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-10 22:28:06 -05:00
David S. Miller	34f9f43710	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Merge 'net' into 'net-next' to get the AF_PACKET bug fix that Daniel's direct transmit changes depend upon. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-09 20:20:14 -05:00
Michael Dalton	98bfd23cdb	virtio-net: free bufs correctly on invalid packet length When a packet with invalid length arrives, ensure that the packet is freed correctly if mergeable packet buffers and big packets (GUEST_TSO4) are both enabled. Signed-off-by: Michael Dalton <mwdalton@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Andrew Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-06 16:31:43 -05:00
Andrey Vagin	d4fb84eefe	virtio: delete napi structures from netdev before releasing memory free_netdev calls netif_napi_del too, but it's too late, because napi structures are placed on vi->rq. netif_napi_add() is called from virtnet_alloc_queues. general protection fault: 0000 [#1] SMP Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: ip6table_filter ip6_tables iptable_filter ip_tables virtio_balloon pcspkr virtio_net(-) i2c_pii CPU: 1 PID: 347 Comm: rmmod Not tainted 3.13.0-rc2+ #171 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 task: ffff8800b779c420 ti: ffff8800379e0000 task.ti: ffff8800379e0000 RIP: 0010:[<ffffffff81322e19>] [<ffffffff81322e19>] __list_del_entry+0x29/0xd0 RSP: 0018:ffff8800379e1dd0 EFLAGS: 00010a83 RAX: 6b6b6b6b6b6b6b6b RBX: ffff8800379c2fd0 RCX: dead000000200200 RDX: 6b6b6b6b6b6b6b6b RSI: 0000000000000001 RDI: ffff8800379c2fd0 RBP: ffff8800379e1dd0 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: ffff8800379c2f90 R13: ffff880037839160 R14: 0000000000000000 R15: 00000000013352f0 FS: 00007f1400e34740(0000) GS:ffff8800bfb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007f464124c763 CR3: 00000000b68cf000 CR4: 00000000000006e0 Stack: ffff8800379e1df0 ffffffff8155beab 6b6b6b6b6b6b6b2b ffff8800378391c0 ffff8800379e1e18 ffffffff8156499b ffff880037839be0 ffff880037839d20 ffff88003779d3f0 ffff8800379e1e38 ffffffffa003477c ffff88003779d388 Call Trace: [<ffffffff8155beab>] netif_napi_del+0x1b/0x80 [<ffffffff8156499b>] free_netdev+0x8b/0x110 [<ffffffffa003477c>] virtnet_remove+0x7c/0x90 [virtio_net] [<ffffffff813ae323>] virtio_dev_remove+0x23/0x80 [<ffffffff813f62ef>] __device_release_driver+0x7f/0xf0 [<ffffffff813f6ca0>] driver_detach+0xc0/0xd0 [<ffffffff813f5f28>] bus_remove_driver+0x58/0xd0 [<ffffffff813f72ec>] driver_unregister+0x2c/0x50 [<ffffffff813ae65e>] unregister_virtio_driver+0xe/0x10 [<ffffffffa0036942>] virtio_net_driver_exit+0x10/0x6ce [virtio_net] [<ffffffff810d7cf2>] SyS_delete_module+0x172/0x220 [<ffffffff810a732d>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff810f5d4c>] ? __audit_syscall_entry+0x9c/0xf0 [<ffffffff81677f69>] system_call_fastpath+0x16/0x1b Code: 00 00 55 48 8b 17 48 b9 00 01 10 00 00 00 ad de 48 8b 47 08 48 89 e5 48 39 ca 74 29 48 b9 00 02 20 00 00 00 RIP [<ffffffff81322e19>] __list_del_entry+0x29/0xd0 RSP <ffff8800379e1dd0> ---[ end trace d5931cd3f87c9763 ]--- Fixes: `986a4f4d45` (virtio_net: multiqueue support) Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Signed-off-by: Andrey Vagin <avagin@openvz.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-06 15:28:29 -05:00
Andrey Vagin	fa9fac1725	virtio-net: determine type of bufs correctly free_unused_bufs must check vi->mergeable_rx_bufs before vi->big_packets, because we use this sequence in other places. Otherwise we allocate buffer of one type, then free it as another type. general protection fault: 0000 [#1] SMP Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: ip6table_filter ip6_tables iptable_filter ip_tables pcspkr virtio_balloon virtio_net(-) i2c_pii CPU: 0 PID: 400 Comm: rmmod Not tainted 3.13.0-rc2+ #170 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 task: ffff8800b6d2a210 ti: ffff8800aed32000 task.ti: ffff8800aed32000 RIP: 0010:[<ffffffffa00345f3>] [<ffffffffa00345f3>] free_unused_bufs+0xc3/0x190 [virtio_net] RSP: 0018:ffff8800aed33dd8 EFLAGS: 00010202 RAX: ffff8800b1fe2c00 RBX: ffff8800b66a7240 RCX: 6b6b6b6b6b6b6b6b RDX: 6b6b6b6b6b6b6b6b RSI: ffff8800b8419a68 RDI: ffff8800b66a1148 RBP: ffff8800aed33e00 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 R13: ffff8800b66a1148 R14: 0000000000000000 R15: 000077ff80000000 FS: 00007fc4f9c4e740(0000) GS:ffff8800bfa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007f63f432f000 CR3: 00000000b6538000 CR4: 00000000000006f0 Stack: ffff8800b66a7240 ffff8800b66a7380 ffff8800377bd3f0 0000000000000000 00000000023302f0 ffff8800aed33e18 ffffffffa00346e2 ffff8800b66a7240 ffff8800aed33e38 ffffffffa003474d ffff8800377bd388 ffff8800377bd390 Call Trace: [<ffffffffa00346e2>] remove_vq_common+0x22/0x40 [virtio_net] [<ffffffffa003474d>] virtnet_remove+0x4d/0x90 [virtio_net] [<ffffffff813ae303>] virtio_dev_remove+0x23/0x80 [<ffffffff813f62cf>] __device_release_driver+0x7f/0xf0 [<ffffffff813f6c80>] driver_detach+0xc0/0xd0 [<ffffffff813f5f08>] bus_remove_driver+0x58/0xd0 [<ffffffff813f72cc>] driver_unregister+0x2c/0x50 [<ffffffff813ae63e>] unregister_virtio_driver+0xe/0x10 [<ffffffffa0036852>] virtio_net_driver_exit+0x10/0x7be [virtio_net] [<ffffffff810d7cf2>] SyS_delete_module+0x172/0x220 [<ffffffff810a732d>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff810f5d4c>] ? __audit_syscall_entry+0x9c/0xf0 [<ffffffff81677f69>] system_call_fastpath+0x16/0x1b Code: c0 74 55 0f 1f 44 00 00 80 7b 30 00 74 7a 48 8b 50 30 4c 89 e6 48 03 73 20 48 85 d2 0f 84 bb 00 00 00 66 0f RIP [<ffffffffa00345f3>] free_unused_bufs+0xc3/0x190 [virtio_net] RSP <ffff8800aed33dd8> ---[ end trace edb570ea923cce9c ]--- Fixes: `2613af0ed1` (virtio_net: migrate mergeable rx buffers to page frag allocators) Cc: Michael Dalton <mwdalton@google.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Signed-off-by: Andrey Vagin <avagin@openvz.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-06 15:28:29 -05:00
Jeff Kirsher	adf8d3ff6e	drivers/net/*: Fix FSF address in file headers Several files refer to an old address for the Free Software Foundation in the file header comment. Resolve by replacing the address with the URL <http://www.gnu.org/licenses/> so that we do not have to keep updating the header comments anytime the address changes. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Veaceslav Falico <vfalico@redhat.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: "K. Y. Srinivasan" <kys@microsoft.com> CC: Paul Mackerras <paulus@samba.org> CC: Ian Campbell <ian.campbell@citrix.com> CC: Wei Liu <wei.liu2@citrix.com> CC: Rusty Russell <rusty@rustcorp.com.au> CC: "Michael S. Tsirkin" <mst@redhat.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-06 12:37:55 -05:00
Michael S. Tsirkin	f121159d72	virtio_net: make all RX paths handle erors consistently receive mergeable now handles errors internally. Do same for big and small packet paths, otherwise the logic is too hard to follow. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-01 20:27:16 -05:00
Michael S. Tsirkin	8fc3b9e9a2	virtio_net: fix error handling for mergeable buffers Eric Dumazet noticed that if we encounter an error when processing a mergeable buffer, we don't dequeue all of the buffers from this packet, the result is almost sure to be loss of networking. Jason Wang noticed that we also leak a page and that we don't decrement the rq buf count, so we won't repost buffers (a resource leak). Fix both issues. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael Dalton <mwdalton@google.com> Reported-by: Eric Dumazet <edumazet@google.com> Reported-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-01 20:27:16 -05:00
Thomas Huth	99e872ae1e	virtio_net: Fixed a trivial typo (fitler --> filter) "MAC filter" sounds more reasonable than "MAC fitler". Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-11-30 16:14:24 -05:00
Linus Torvalds	1ee2dcc224	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: "Mostly these are fixes for fallout due to merge window changes, as well as cures for problems that have been with us for a much longer period of time" 1) Johannes Berg noticed two major deficiencies in our genetlink registration. Some genetlink protocols we passing in constant counts for their ops array rather than something like ARRAY_SIZE(ops) or similar. Also, some genetlink protocols were using fixed IDs for their multicast groups. We have to retain these fixed IDs to keep existing userland tools working, but reserve them so that other multicast groups used by other protocols can not possibly conflict. In dealing with these two problems, we actually now use less state management for genetlink operations and multicast groups. 2) When configuring interface hardware timestamping, fix several drivers that simply do not validate that the hwtstamp_config value is one the driver actually supports. From Ben Hutchings. 3) Invalid memory references in mwifiex driver, from Amitkumar Karwar. 4) In dev_forward_skb(), set the skb->protocol in the right order relative to skb_scrub_packet(). From Alexei Starovoitov. 5) Bridge erroneously fails to use the proper wrapper functions to make calls to netdev_ops->ndo_vlan_rx_{add,kill}_vid. Fix from Toshiaki Makita. 6) When detaching a bridge port, make sure to flush all VLAN IDs to prevent them from leaking, also from Toshiaki Makita. 7) Put in a compromise for TCP Small Queues so that deep queued devices that delay TX reclaim non-trivially don't have such a performance decrease. One particularly problematic area is 802.11 AMPDU in wireless. From Eric Dumazet. 8) Fix crashes in tcp_fastopen_cache_get(), we can see NULL socket dsts here. Fix from Eric Dumzaet, reported by Dave Jones. 9) Fix use after free in ipv6 SIT driver, from Willem de Bruijn. 10) When computing mergeable buffer sizes, virtio-net fails to take the virtio-net header into account. From Michael Dalton. 11) Fix seqlock deadlock in ip4_datagram_connect() wrt. statistic bumping, this one has been with us for a while. From Eric Dumazet. 12) Fix NULL deref in the new TIPC fragmentation handling, from Erik Hugne. 13) 6lowpan bit used for traffic classification was wrong, from Jukka Rissanen. 14) macvlan has the same issue as normal vlans did wrt. propagating LRO disabling down to the real device, fix it the same way. From Michal Kubecek. 15) CPSW driver needs to soft reset all slaves during suspend, from Daniel Mack. 16) Fix small frame pacing in FQ packet scheduler, from Eric Dumazet. 17) The xen-netfront RX buffer refill timer isn't properly scheduled on partial RX allocation success, from Ma JieYue. 18) When ipv6 ping protocol support was added, the AF_INET6 protocol initialization cleanup path on failure was borked a little. Fix from Vlad Yasevich. 19) If a socket disconnects during a read/recvmsg/recvfrom/etc that blocks we can do the wrong thing with the msg_name we write back to userspace. From Hannes Frederic Sowa. There is another fix in the works from Hannes which will prevent future problems of this nature. 20) Fix route leak in VTI tunnel transmit, from Fan Du. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits) genetlink: make multicast groups const, prevent abuse genetlink: pass family to functions using groups genetlink: add and use genl_set_err() genetlink: remove family pointer from genl_multicast_group genetlink: remove genl_unregister_mc_group() hsr: don't call genl_unregister_mc_group() quota/genetlink: use proper genetlink multicast APIs drop_monitor/genetlink: use proper genetlink multicast APIs genetlink: only pass array to genl_register_family_with_ops() tcp: don't update snd_nxt, when a socket is switched from repair mode atm: idt77252: fix dev refcnt leak xfrm: Release dst if this dst is improper for vti tunnel netlink: fix documentation typo in netlink_set_err() be2net: Delete secondary unicast MAC addresses during be_close be2net: Fix unconditional enabling of Rx interface options net, virtio_net: replace the magic value ping: prevent NULL pointer dereference on write to msg_name bnx2x: Prevent "timeout waiting for state X" bnx2x: prevent CFC attention bnx2x: Prevent panic during DMAE timeout ...	2013-11-19 15:50:47 -08:00
Zhi Yong Wu	0f13b66b01	net, virtio_net: replace the magic value It is more appropriate to use # of queue pairs currently used by the driver instead of a magic value. Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-11-18 16:23:09 -05:00
Linus Torvalds	b746f9c794	Nothing really exciting: some groundwork for changing virtio endian, and some robustness fixes for broken virtio devices, plus minor tweaks. [vs last pull request: added the virtio-scsi broken vq escape patch, which I somehow lost.] Cheers, Rusty. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) iQIcBAABAgAGBQJSgDsJAAoJENkgDmzRrbjxEE4P/jXqZHS/HdlxW9k0BjKKlEIF PdtCoP3UhWTdskXvy2pD8m6nYn214MEJYUIa4HFlIEZsdxhuexzQHY19Ynkjagyv 57sRsUUm5fYQLIL7IUh2DUD1VU38hUFinno/y333szzvCj9qITDA/QABsiWxK8NO dq+Lmeixgrhc5yN9iryW+gZV+hekJIZ4LsU5ejSaJucKblzXUH8qIbmSthG7RTYJ tr4J7xTTXbhxY4CoC5Dpx2hvsFkvzaAIvI4Nr1mDjfq5cR8BaYvnC89U1IbhdAey p1AbZE58JLrY+Z8K8LBRGV2KjO8qSZ6R47hbZ9nAnodJYB7sZLyj6jUe1q+/htuC Dh9Xm9O4eW2xNaFk20dYeIF4UU5/HzdsbvG/IlH8x4sm8/K706ocYyAOHlzYUg2T k7gltrgDzDokMgb2R44gwnr4oaJ2q8Gne6JXswlPEv2eRs6vNnA5Xhc0rEHGkU6C gYn1vNFN6yx0vf2syG/Ce5pZtMxGpefKQkHzzWdq8FKr1B9s54dDuf2hls7J8A9t OQT1gE33yURSelf4Kh4k9zWXaWk/Ohv9l2R1cqpALnJ4/+q0fP5t7HdK500S7aax DxLeFeqvsBw7nlWgsGxQmt+fjITQFHhcDiwst0ehnt6RbDEW7XPIguz0K/gyhxYG +UNbl/5Gr64jnUX3YCzm =vY2L -----END PGP SIGNATURE----- Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull virtio updates from Rusty Russell: "Nothing really exciting: some groundwork for changing virtio endian, and some robustness fixes for broken virtio devices, plus minor tweaks" * tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: virtio_scsi: verify if queue is broken after virtqueue_get_buf() x86, asmlinkage, lguest: Pass in globals into assembler statement virtio: mmio: fix signature checking for BE guests virtio_ring: adapt to notify() returning bool virtio_net: verify if queue is broken after virtqueue_get_buf() virtio_console: verify if queue is broken after virtqueue_get_buf() virtio_blk: verify if queue is broken after virtqueue_get_buf() virtio_ring: add new function virtqueue_is_broken() virtio_test: verify if virtqueue_kick() succeeded virtio_net: verify if virtqueue_kick() succeeded virtio_ring: let virtqueue_{kick()/notify()} return a bool virtio_ring: change host notification API virtio_config: remove virtio_config_val virtio: use size-based config accessors. virtio_config: introduce size-based accessors. virtio_ring: plug kmemleak false positive. virtio: pm: use CONFIG_PM_SLEEP instead of CONFIG_PM	2013-11-15 13:28:47 +09:00
Michael Dalton	5061de3666	virtio-net: mergeable buffer size should include virtio-net header Commit `2613af0ed1` ("virtio_net: migrate mergeable rx buffers to page frag allocators") changed the mergeable receive buffer size from PAGE_SIZE to MTU-size. However, the merge buffer size does not take into account the size of the virtio-net header. Consequently, packets that are MTU-size will take two buffers intead of one (to store the virtio-net header), substantially decreasing the throughput of MTU-size traffic due to TCP window / SKB truesize effects. This commit changes the mergeable buffer size to include the virtio-net header. The buffer size is cacheline-aligned because skb_page_frag_refill will not automatically align the requested size. Benchmarks taken from an average of 5 netperf 30-second TCP_STREAM runs between two QEMU VMs on a single physical machine. Each VM has two VCPUs and vhost enabled. All VMs and vhost threads run in a single 4 CPU cgroup cpuset, using cgroups to ensure that other processes in the system will not be scheduled on the benchmark CPUs. Transmit offloads and mergeable receive buffers are enabled, but guest_tso4 / guest_csum are explicitly disabled to force MTU-sized packets on the receiver. next-net trunk before `2613af0ed1` (PAGE_SIZE buf): 3861.08Gb/s net-next trunk (MTU 1500- packet uses two buf due to size bug): 4076.62Gb/s net-next trunk (MTU 1480- packet fits in one buf): 6301.34Gb/s net-next trunk w/ size fix (MTU 1500 - packet fits in one buf): 6445.44Gb/s Suggested-by: Eric Northup <digitaleric@google.com> Signed-off-by: Michael Dalton <mwdalton@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-11-14 17:22:56 -05:00
Linus Torvalds	5e30025a31	Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core locking changes from Ingo Molnar: "The biggest changes: - add lockdep support for seqcount/seqlocks structures, this unearthed both bugs and required extra annotation. - move the various kernel locking primitives to the new kernel/locking/ directory" * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) block: Use u64_stats_init() to initialize seqcounts locking/lockdep: Mark __lockdep_count_forward_deps() as static lockdep/proc: Fix lock-time avg computation locking/doc: Update references to kernel/mutex.c ipv6: Fix possible ipv6 seqlock deadlock cpuset: Fix potential deadlock w/ set_mems_allowed seqcount: Add lockdep functionality to seqcount/seqlock structures net: Explicitly initialize u64_stats_sync structures for lockdep locking: Move the percpu-rwsem code to kernel/locking/ locking: Move the lglocks code to kernel/locking/ locking: Move the rwsem code to kernel/locking/ locking: Move the rtmutex code to kernel/locking/ locking: Move the semaphore core to kernel/locking/ locking: Move the spinlock code to kernel/locking/ locking: Move the lockdep code to kernel/locking/ locking: Move the mutex code to kernel/locking/ hung_task debugging: Add tracepoint to report the hang x86/locking/kconfig: Update paravirt spinlock Kconfig description lockstat: Report avg wait and hold times lockdep, x86/alternatives: Drop ancient lockdep fixup message ...	2013-11-14 16:30:30 +09:00
John Stultz	827da44c61	net: Explicitly initialize u64_stats_sync structures for lockdep In order to enable lockdep on seqcount/seqlock structures, we must explicitly initialize any locks. The u64_stats_sync structure, uses a seqcount, and thus we need to introduce a u64_stats_init() function and use it to initialize the structure. This unfortunately adds a lot of fairly trivial initialization code to a number of drivers. But the benefit of ensuring correctness makes this worth while. Because these changes are required for lockdep to be enabled, and the changes are quite trivial, I've not yet split this patch out into 30-some separate patches, as I figured it would be better to get the various maintainers thoughts on how to best merge this change along with the seqcount lockdep enablement. Feedback would be appreciated! Signed-off-by: John Stultz <john.stultz@linaro.org> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: James Morris <jmorris@namei.org> Cc: Jesse Gross <jesse@nicira.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Mirko Lindner <mlindner@marvell.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Roger Luethi <rl@hellgate.ch> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Simon Horman <horms@verge.net.au> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: Wensong Zhang <wensong@linux-vs.org> Cc: netdev@vger.kernel.org Link: http://lkml.kernel.org/r/1381186321-4906-2-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-11-06 12:40:25 +01:00
Jason Wang	9bb8ca8607	virtio-net: switch to use XPS to choose txq We used to use a percpu structure vq_index to record the cpu to queue mapping, this is suboptimal since it duplicates the work of XPS and loses all other XPS functionality such as allowing user to configure their own transmission steering strategy. So this patch switches to use XPS and suggest a default mapping when the number of cpus is equal to the number of queues. With XPS support, there's no need for keeping per-cpu vq_index and .ndo_select_queue(), so they were removed also. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael S. Tsirkin <mst@redhat.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-11-05 22:20:29 -05:00
Jason Wang	ba27524103	virtio-net: coalesce rx frags when possible during rx Commit `2613af0ed1` (virtio_net: migrate mergeable rx buffers to page frag allocators) try to increase the payload/truesize for MTU-sized traffic. But this will introduce the extra overhead for GSO packets received because of the frag list. This commit tries to reduce this issue by coalesce the possible rx frags when possible during rx. Test result shows the about 15% improvement on full size GSO packet receiving (and even better than before commit `2613af0ed1`). Before this commit: ./netperf -H 192.168.100.4 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.4 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.00 20303.87 After this commit: ./netperf -H 192.168.100.4 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.4 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.00 23841.26 Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Michael Dalton <mwdalton@google.com> Cc: Eric Dumazet <edumazet@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-11-04 20:03:52 -05:00
David S. Miller	394efd19d5	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/emulex/benet/be.h drivers/net/netconsole.c net/bridge/br_private.h Three mostly trivial conflicts. The net/bridge/br_private.h conflict was a function signature (argument addition) change overlapping with the extern removals from Joe Perches. In drivers/net/netconsole.c we had one change adjusting a printk message whilst another changed "printk(KERN_INFO" into "pr_info(". Lastly, the emulex change was a new inline function addition overlapping with Joe Perches's extern removals. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-11-04 13:48:30 -05:00

1 2 3 4 5 ...

258 Commits