OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Asias He	04b59babc0	tcm_vhost: Enable VIRTIO_SCSI_F_HOTPLUG Everything for hotplug is ready. Let's enable the feature bit. Signed-off-by: Asias He <asias@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-04-25 01:07:58 -07:00
Asias He	11c6341839	tcm_vhost: Add ioctl to get and set events missed flag Signed-off-by: Asias He <asias@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-04-25 01:07:45 -07:00
Asias He	a6c9af8736	tcm_vhost: Add hotplug/hotunplug support In commit `365a715009` ([SCSI] virtio-scsi: hotplug support for virtio-scsi), hotplug support is added to virtio-scsi. This patch adds hotplug and hotunplug support to tcm_vhost. You can create or delete a LUN in targetcli to hotplug or hotunplug a LUN in guest. Signed-off-by: Asias He <asias@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-04-25 01:06:41 -07:00
Asias He	f2b7daf5b1	tcm_vhost: Refactor the lock nesting rule We want to use tcm_vhost_mutex to make sure hotplug/hotunplug will not happen when set_endpoint/clear_endpoint is in process. Signed-off-by: Asias He <asias@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-04-25 01:05:52 -07:00
David S. Miller	6e0895c2ea	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/emulex/benet/be_main.c drivers/net/ethernet/intel/igb/igb_main.c drivers/net/wireless/brcm80211/brcmsmac/mac80211_if.c include/net/scm.h net/batman-adv/routing.c net/ipv4/tcp_input.c The e{uid,gid} --> {uid,gid} credentials fix conflicted with the cleanup in net-next to now pass cred structs around. The be2net driver had a bug fix in 'net' that overlapped with the VLAN interface changes by Patrick McHardy in net-next. An IGB conflict existed because in 'net' the build_skb() support was reverted, and in 'net-next' there was a comment style fix within that code. Several batman-adv conflicts were resolved by making sure that all calls to batadv_is_my_mac() are changed to have a new bat_priv first argument. Eric Dumazet's TS ECR fix in TCP in 'net' conflicted with the F-RTO rewrite in 'net-next', mostly overlapping changes. Thanks to Stephen Rothwell and Antonio Quartulli for help with several of these merge resolutions. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-22 20:32:51 -04:00
Linus Torvalds	bf81710c4b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull SCSI target fixes from Nicholas Bellinger: "Here are remaining target-pending items for v3.9-rc7 code. The tcm_vhost patches are more than I'd usually include in a -rc7 pull, but are changes required for v3.9 to work correctly with the pending vhost-scsi-pci QEMU upstream series merge. (Paolo CC'ed) Plus Asias's conversion to use vhost_virtqueue->private_data + RCU for managing vhost-scsi endpoints has gotten alot of review + testing over the past weeks, and MST has ACKed the full series. Also, there is a target patch to fix a long-standing bug within control CDB handling with Standby/Offline/Transition ALUA port access states, that had been incorrectly rejecting the control CDBs required for LUN scan to work during these port group states. CC'ing to stable." * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: target: Fix incorrect fallthrough of ALUA Standby/Offline/Transition CDBs tcm_vhost: Send bad target to guest when cmd fails tcm_vhost: Add vhost_scsi_send_bad_target() helper tcm_vhost: Fix tv_cmd leak in vhost_scsi_handle_vq tcm_vhost: Remove double check of response tcm_vhost: Initialize vq->last_used_idx when set endpoint tcm_vhost: Use vq->private_data to indicate if the endpoint is setup tcm_vhost: Use ACCESS_ONCE for vs->vs_tpg[target] access	2013-04-12 15:26:42 -07:00
Jason Wang	70181d5120	vhost_net: remove tx polling state After commit `2b8b328b61` (vhost_net: handle polling errors when setting backend), we in fact track the polling state through poll->wqh, so there's no need to duplicate the work with an extra vhost_net_polling_state. So this patch removes this and make the code simpler. This patch also removes the all tx starting/stopping code in tx path according to Michael's suggestion. Netperf test shows almost the same result in stream test, but gets improvements on TCP_RR tests (both zerocopy or copy) especially on low load cases. Tested between multiqueue kvm guest and external host with two direct connected 82599s. zerocopy disabled: sessions\|transaction rates\|normalize\| before/after/+improvements 1 \| 9510.24/11727.29/+23.3% \| 693.54/887.68/+28.0% \| 25\| 192931.50/241729.87/+25.3% \| 2376.80/2771.70/+16.6% \| 50\| 277634.64/291905.76/+5% \| 3118.36/3230.11/+3.6% \| zerocopy enabled: sessions\|transaction rates\|normalize\| before/after/+improvements 1 \| 7318.33/11929.76/+63.0% \| 521.86/843.30/+61.6% \| 25\| 167264.88/242422.15/+44.9% \| 2181.60/2788.16/+27.8% \| 50\| 272181.02/294347.04/+8.1% \| 3071.56/3257.85/+6.1% \| Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-11 16:16:22 -04:00
Asias He	055f648c49	tcm_vhost: Send bad target to guest when cmd fails Send bad target to guest in case: 1) we can not allocate the cmd 2) fail to submit the cmd Signed-off-by: Asias He <asias@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-04-11 01:48:42 -07:00
Asias He	637ab21e28	tcm_vhost: Add vhost_scsi_send_bad_target() helper Share the send bad target code with other use cases. Signed-off-by: Asias He <asias@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-04-11 01:48:35 -07:00
Asias He	7ea206cf3b	tcm_vhost: Fix tv_cmd leak in vhost_scsi_handle_vq If we fail to submit the allocated tv_vmd to tcm_vhost_submission_work, we will leak the tv_vmd. Free tv_vmd on fail path. Signed-off-by: Asias He <asias@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-04-11 01:48:27 -07:00
Asias He	f6da51c3ef	tcm_vhost: Remove double check of response We did the length of response check twice. Signed-off-by: Asias He <asias@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-04-11 01:48:11 -07:00
Asias He	dfd5d5692c	tcm_vhost: Initialize vq->last_used_idx when set endpoint This patch fixes guest hang when booting seabios and guest. [ 0.576238] scsi0 : Virtio SCSI HBA [ 0.616754] virtio_scsi virtio1: request:id 0 is not a head! vq->last_used_idx is initialized only when /dev/vhost-scsi is opened or closed. vhost_scsi_open -> vhost_dev_init() -> vhost_vq_reset() vhost_scsi_release() -> vhost_dev_cleanup -> vhost_vq_reset() So, when guest talks to tcm_vhost after seabios does, vq->last_used_idx still contains the old valule for seabios. This confuses guest. Fix this by calling vhost_init_used() to init vq->last_used_idx when we set endpoint. Signed-off-by: Asias He <asias@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-04-08 14:09:54 -07:00
Asias He	4f7f46d32c	tcm_vhost: Use vq->private_data to indicate if the endpoint is setup Currently, vs->vs_endpoint is used indicate if the endpoint is setup or not. It is set or cleared in vhost_scsi_set_endpoint() or vhost_scsi_clear_endpoint() under the vs->dev.mutex lock. However, when we check it in vhost_scsi_handle_vq(), we ignored the lock. Instead of using the vs->vs_endpoint and the vs->dev.mutex lock to indicate the status of the endpoint, we use per virtqueue vq->private_data to indicate it. In this way, we can only take the vq->mutex lock which is per queue and make the concurrent multiqueue process having less lock contention. Further, in the read side of vq->private_data, we can even do not take the lock if it is accessed in the vhost worker thread, because it is protected by "vhost rcu". (nab: Do s/VHOST_FEATURES/~VHOST_SCSI_FEATURES) Signed-off-by: Asias He <asias@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-04-08 14:07:00 -07:00
Asias He	af0d9187f6	tcm_vhost: Use ACCESS_ONCE for vs->vs_tpg[target] access In vhost_scsi_handle_vq: tv_tpg = vs->vs_tpg[target]; if (!tv_tpg) { .... return } tv_cmd = vhost_scsi_allocate_cmd(tv_tpg, &v_req, 1) vs->vs_tpg[target] might change after the NULL check and 2) the above line might access tv_tpg from vs->vs_tpg[target]. To prevent 2), use ACCESS_ONCE. Thanks mst for catching this up! Signed-off-by: Asias He <asias@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-04-02 16:43:34 -07:00
Linus Torvalds	13d2080db3	Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull SCSI target fixes from Nicholas Bellinger: "This includes the bug-fix for a >= v3.8-rc1 regression specific to iscsi-target persistent reservation conflict handling (CC'ed to stable), and a tcm_vhost patch to drop VIRTIO_RING_F_EVENT_IDX usage so that in-flight qemu vhost-scsi-pci device code can detect the proper vhost feature bits. Also, there are two more tcm_vhost patches still being discussed by MST and Asias for v3.9 that will be required for the in-flight qemu vhost-scsi-pci device patch to function properly, and that should (hopefully) be the last target fixes for this round." * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: target: Fix RESERVATION_CONFLICT status regression for iscsi-target special case tcm_vhost: Avoid VIRTIO_RING_F_EVENT_IDX feature bit	2013-03-30 13:13:05 -07:00
Nicholas Bellinger	5dade71050	tcm_vhost: Avoid VIRTIO_RING_F_EVENT_IDX feature bit This patch adds a VHOST_SCSI_FEATURES mask minus VIRTIO_RING_F_EVENT_IDX so that vhost-scsi-pci userspace will strip this feature bit once GET_FEATURES reports it as being unsupported on the host. This is to avoid a bug where ->handle_kicks() are missed when EVENT_IDX is enabled by default in userspace code. (mst: Rename to VHOST_SCSI_FEATURES + add comment) Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Asias He <asias@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-03-28 23:42:47 -07:00
Linus Torvalds	a607a1143a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull SCSI target fixes from Nicholas Bellinger: "These are mostly minor fixes this time around. The iscsi-target CHAP big-endian bugfix and bump FD_MAX_SECTORS=2048 default patch to allow 1MB sized I/Os for FILEIO backends on >= v3.5 code are both CC'ed to stable. Also, there is a persistent reservations regression that has recently been reported for >= v3.8.x code, that is currently being tracked down for v3.9." * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: target/pscsi: Reject cross page boundary case in pscsi_map_sg target/file: Bump FD_MAX_SECTORS to 2048 to handle 1M sized I/Os tcm_vhost: Flush vhost_work in vhost_scsi_flush() tcm_vhost: Add missed lock in vhost_scsi_clear_endpoint() target: fix possible memory leak in core_tpg_register() target/iscsi: Fix mutual CHAP auth on big-endian arches target_core_sbc: use noop for SYNCHRONIZE_CACHE	2013-03-23 16:51:55 -07:00
Rusty Russell	f87d0fbb57	vringh: host-side implementation of virtio rings. Getting use of virtio rings correct is tricky, and a recent patch saw an implementation of in-kernel rings (as separate from userspace). This abstracts the business of dealing with the virtio ring layout from the access (userspace or direct); to do this, we use function pointers, which gcc inlines correctly. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Michael S. Tsirkin <mst@redhat.com>	2013-03-20 14:05:33 +10:30
Michael S. Tsirkin	73640c991e	tools/virtio: fix build for 3.8 Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2013-03-20 14:00:41 +10:30
Asias He	72c77539fd	tcm_vhost: Flush vhost_work in vhost_scsi_flush() We also need to flush the vhost_works. It is the completion vhost_work currently. Signed-off-by: Asias He <asias@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-03-18 14:34:49 -07:00
Asias He	038e0af4a4	tcm_vhost: Add missed lock in vhost_scsi_clear_endpoint() tv_tpg->tv_tpg_vhost_count should be protected by tv_tpg->tv_tpg_mutex. Signed-off-by: Asias He <asias@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-03-18 14:34:35 -07:00
Michael S. Tsirkin	46aa92d1ba	vhost/net: fix heads usage of ubuf_info ubuf info allocator uses guest controlled head as an index, so a malicious guest could put the same head entry in the ring twice, and we will get two callbacks on the same value. To fix use upend_idx which is guaranteed to be unique. Reported-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Cc: stable@kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-17 14:28:54 -04:00
Linus Torvalds	ecc88efbe7	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull scsi target updates from Nicholas Bellinger: "The highlights in this series include: - Improve sg_table lookup scalability in RAMDISK_MCP (martin) - Add device attribute to expose config name for INQUIRY model (tregaron) - Convert tcm_vhost to use lock-less list for cmd completion (asias) - Add tcm_vhost support for multiple target's per endpoint (asias) - Add tcm_vhost support for multiple queues per vhost (asias) - Add missing mapped_lun bounds checking during make_mappedlun setup in generic fabric configfs code (jan engelhardt + nab) - Enforce individual iscsi-target network portal export once per TargetName endpoint (grover + nab) - Add WRITE_SAME w/ UNMAP=0 emulation to FILEIO backend (nab) Things have been mostly quiet this round, with majority of the work being done on the iser-target WIP driver + associated iscsi-target refactoring patches currently in flight for v3.10 code. At this point there is one patch series left outstanding from Asias to add support for UNMAP + WRITE_SAME w/ UNMAP=1 to FILEIO awaiting feedback from hch & Co, that will likely be included in a post v3.9-rc1 PULL request if there are no objections. Also, there is a regression bug recently reported off-list that seems to be effecting v3.5 and v3.6 kernels with MSFT iSCSI initiators that is still being tracked down. No word if this effects >= v3.7 just yet, but if so there will likely another PULL request coming your way.." * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (26 commits) target: Rename spc_get_write_same_sectors -> sbc_get_write_same_sectors target/file: Add WRITE_SAME w/ UNMAP=0 emulation support iscsi-target: Enforce individual network portal export once per TargetName iscsi-target: Refactor iscsit_get_np sockaddr matching into iscsit_check_np_match target: Add missing mapped_lun bounds checking during make_mappedlun setup target: Fix lookup of dynamic NodeACLs during cached demo-mode operation target: Fix parameter list length checking in MODE SELECT target: Fix error checking for UNMAP commands target: Fix sense data for out-of-bounds IO operations target_core_rd: break out unterminated loop during copy tcm_vhost: Multi-queue support tcm_vhost: Multi-target support target: Add device attribute to expose config_item_name for INQUIRY model target: don't truncate the fail intr address target: don't always say "ipv6" as address type target/iblock: Use backend REQ_FLUSH hint for WriteCacheEnabled status iscsi-target: make some temporary buffers larger tcm_vhost: Optimize gup in vhost_scsi_map_to_sgl tcm_vhost: Use iov_num_pages to calculate sgl_count tcm_vhost: Introduce iov_num_pages ...	2013-02-26 11:42:23 -08:00
Linus Torvalds	06991c28f3	Driver core patches for 3.9-rc1 Here is the big driver core merge for 3.9-rc1 There are two major series here, both of which touch lots of drivers all over the kernel, and will cause you some merge conflicts: - add a new function called devm_ioremap_resource() to properly be able to check return values. - remove CONFIG_EXPERIMENTAL If you need me to provide a merged tree to handle these resolutions, please let me know. Other than those patches, there's not much here, some minor fixes and updates. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iEYEABECAAYFAlEmV0cACgkQMUfUDdst+yncCQCfbmnQZju7kzWXk6PjdFuKspT9 weAAoMCzcAtEzzc4LXuUxxG/sXBVBCjW =yWAQ -----END PGP SIGNATURE----- Merge tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core patches from Greg Kroah-Hartman: "Here is the big driver core merge for 3.9-rc1 There are two major series here, both of which touch lots of drivers all over the kernel, and will cause you some merge conflicts: - add a new function called devm_ioremap_resource() to properly be able to check return values. - remove CONFIG_EXPERIMENTAL Other than those patches, there's not much here, some minor fixes and updates" Fix up trivial conflicts * tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (221 commits) base: memory: fix soft/hard_offline_page permissions drivercore: Fix ordering between deferred_probe and exiting initcalls backlight: fix class_find_device() arguments TTY: mark tty_get_device call with the proper const values driver-core: constify data for class_find_device() firmware: Ignore abort check when no user-helper is used firmware: Reduce ifdef CONFIG_FW_LOADER_USER_HELPER firmware: Make user-mode helper optional firmware: Refactoring for splitting user-mode helper code Driver core: treat unregistered bus_types as having no devices watchdog: Convert to devm_ioremap_resource() thermal: Convert to devm_ioremap_resource() spi: Convert to devm_ioremap_resource() power: Convert to devm_ioremap_resource() mtd: Convert to devm_ioremap_resource() mmc: Convert to devm_ioremap_resource() mfd: Convert to devm_ioremap_resource() media: Convert to devm_ioremap_resource() iommu: Convert to devm_ioremap_resource() drm: Convert to devm_ioremap_resource() ...	2013-02-21 12:05:51 -08:00
Asias He	1b7f390eb3	tcm_vhost: Multi-queue support This adds virtio-scsi multi-queue support to tcm_vhost. In order to use multi-queue, guest side multi-queue support is need. It can be found here: https://lkml.org/lkml/2012/12/18/166 Currently, only one thread is created by vhost core code for each vhost_scsi instance. Even if there are multi-queues, all the handling of guest kick (vhost_scsi_handle_kick) are processed in one thread. This is not optimal. Luckily, most of the work is offloaded to the tcm_vhost workqueue. Some initial perf numbers: 1 queue, 4 targets, 1 lun per target 4K request size, 50% randread + 50% randwrite: 127K/127k IOPS 4 queues, 4 targets, 1 lun per target 4K request size, 50% randread + 50% randwrite: 181K/181k IOPS Signed-off-by: Asias He <asias@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-02-13 11:30:14 -08:00
Asias He	67e18cf9ab	tcm_vhost: Multi-target support In order to take advantages of Paolo's multi-queue virito-scsi, we need multi-target support in tcm_vhost first. Otherwise all the requests go to one queue and other queues are idle. This patch makes: 1. All the targets under the wwpn is seen and can be used by guest. 2. No need to pass the tpgt number in struct vhost_scsi_target to tcm_vhost.ko. Only wwpn is needed. 3. We can always pass max_target = 255 to guest now, since we abort the request who's target id does not exist. Changes in v2: - Handle non-contiguous tpgt Changes in v3: - Simplfy lock in vhost_scsi_set_endpoint - Return -EEXIST when does not match Signed-off-by: Asias He <asias@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-02-13 11:29:53 -08:00
Asias He	1810053e8d	tcm_vhost: Optimize gup in vhost_scsi_map_to_sgl We can get all the pages in one time instead of calling gup N times. Signed-off-by: Asias He <asias@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-02-13 11:27:41 -08:00
Asias He	f3158f362c	tcm_vhost: Use iov_num_pages to calculate sgl_count Signed-off-by: Asias He <asias@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-02-13 11:27:41 -08:00
Asias He	765b34fdc1	tcm_vhost: Introduce iov_num_pages Add a helper to calculate the number of pages needed for a iov entry. (nab: Drop unnecessary inline) Signed-off-by: Asias He <asias@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-02-13 11:27:40 -08:00
Asias He	9d6064a347	tcm_vhost: Use llist for cmd completion list This drops the cmd completion list spin lock and makes the cmd completion queue lock-less. Signed-off-by: Asias He <asias@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-02-13 11:27:31 -08:00
Linus Torvalds	e06b84052a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Revert iwlwifi reclaimed packet tracking, it causes problems for a bunch of folks. From Emmanuel Grumbach. 2) Work limiting code in brcmsmac wifi driver can clear tx status without processing the event. From Arend van Spriel. 3) rtlwifi USB driver processes wrong SKB, fix from Larry Finger. 4) l2tp tunnel delete can race with close, fix from Tom Parkin. 5) pktgen_add_device() failures are not checked at all, fix from Cong Wang. 6) Fix unintentional removal of carrier off from tun_detach(), otherwise we confuse userspace, from Michael S. Tsirkin. 7) Don't leak socket reference counts and ubufs in vhost-net driver, from Jason Wang. 8) vmxnet3 driver gets it's initial carrier state wrong, fix from Neil Horman. 9) Protect against USB networking devices which spam the host with 0 length frames, from Bjørn Mork. 10) Prevent neighbour overflows in ipv6 for locally destined routes, from Marcelo Ricardo. This is the best short-term fix for this, a longer term fix has been implemented in net-next. 11) L2TP uses ipv4 datagram routines in it's ipv6 code, whoops. This mistake is largely because the ipv6 functions don't even have some kind of prefix in their names to suggest they are ipv6 specific. From Tom Parkin. 12) Check SYN packet drops properly in tcp_rcv_fastopen_synack(), from Yuchung Cheng. 13) Fix races and TX skb freeing bugs in via-rhine's NAPI support, from Francois Romieu and your's truly. 14) Fix infinite loops and divides by zero in TCP congestion window handling, from Eric Dumazet, Neal Cardwell, and Ilpo Järvinen. 15) AF_PACKET tx ring handling can leak kernel memory to userspace, fix from Phil Sutter. 16) Fix error handling in ipv6 GRE tunnel transmit, from Tommi Rantala. 17) Protect XEN netback driver against hostile frontend putting garbage into the rings, don't leak pages in TX GOP checking, and add proper resource releasing in error path of xen_netbk_get_requests(). From Ian Campbell. 18) SCTP authentication keys should be cleared out and released with kzfree(), from Daniel Borkmann. 19) L2TP is a bit too clever trying to maintain skb->truesize, and ends up corrupting socket memory accounting to the point where packet sending is halted indefinitely. Just remove the adjustments entirely, they aren't really needed. From Eric Dumazet. 20) ATM Iphase driver uses a data type with the same name as the S390 headers, rename to fix the build. From Heiko Carstens. 21) Fix a typo in copying the inner network header offset from one SKB to another, from Pravin B Shelar. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (56 commits) net: sctp: sctp_endpoint_free: zero out secret key data net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfree atm/iphase: rename fregt_t -> ffreg_t net: usb: fix regression from FLAG_NOARP code l2tp: dont play with skb->truesize net: sctp: sctp_auth_key_put: use kzfree instead of kfree netback: correct netbk_tx_err to handle wrap around. xen/netback: free already allocated memory on failure in xen_netbk_get_requests xen/netback: don't leak pages on failure in xen_netbk_tx_check_gop. xen/netback: shutdown the ring if it contains garbage. net: qmi_wwan: add more Huawei devices, including E320 net: cdc_ncm: add another Huawei vendor specific device ipv6/ip6_gre: fix error case handling in ip6gre_tunnel_xmit() tcp: fix for zero packets_in_flight was too broad brcmsmac: rework of mac80211 .flush() callback operation ssb: unregister gpios before unloading ssb bcma: unregister gpios before unloading bcma rtlwifi: Fix scheduling while atomic bug net: usbnet: fix tx_dropped statistics tcp: ipv6: Update MIB counters for drops ...	2013-02-09 07:55:24 +11:00
Michael S. Tsirkin	71f1e45aa9	tcm_vhost: fix pr_err on early kick It's OK to get kick before backend is set or after it is cleared, we can just ignore it. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-01-31 13:06:06 -08:00
Jason Wang	2b8b328b61	vhost_net: handle polling errors when setting backend Currently, the polling errors were ignored, which can lead following issues: - vhost remove itself unconditionally from waitqueue when stopping the poll, this may crash the kernel since the previous attempt of starting may fail to add itself to the waitqueue - userspace may think the backend were successfully set even when the polling failed. Solve this by: - check poll->wqh before trying to remove from waitqueue - report polling errors in vhost_poll_start(), tx_poll_start(), the return value will be checked and returned when userspace want to set the backend After this fix, there still could be a polling failure after backend is set, it will addressed by the next patch. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-29 15:43:03 -05:00
Jason Wang	692a998b90	vhost_net: correct error handling in vhost_net_set_backend() Currently, when vhost_init_used() fails the sock refcnt and ubufs were leaked. Correct this by calling vhost_init_used() before assign ubufs and restore the oldsock when it fails. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-29 15:43:03 -05:00
Kees Cook	43893cbefc	drivers/vhost: remove depends on CONFIG_EXPERIMENTAL The CONFIG_EXPERIMENTAL config item has not carried much meaning for a while now and is almost always enabled by default. As agreed during the Linux kernel summit, remove it from any "depends on" lines in Kconfigs. Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-01-21 14:52:45 -08:00
Linus Torvalds	5bd665f28d	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull target updates from Nicholas Bellinger: "It has been a very busy development cycle this time around in target land, with the highlights including: - Kill struct se_subsystem_dev, in favor of direct se_device usage (hch) - Simplify reservations code by combining SPC-3 + SCSI-2 support for virtual backends only (hch) - Simplify ALUA code for virtual only backends, and remove left over abstractions (hch) - Pass sense_reason_t as return value for I/O submission path (hch) - Refactor MODE_SENSE emulation to allow for easier addition of new mode pages. (roland) - Add emulation of MODE_SELECT (roland) - Fix bug in handling of ExpStatSN wrap-around (steve) - Fix bug in TMR ABORT_TASK lookup in qla2xxx target (steve) - Add WRITE_SAME w/ UNMAP=0 support for IBLOCK backends (nab) - Convert ib_srpt to use modern target_submit_cmd caller + drop legacy ioctx->kref usage (nab) - Convert ib_srpt to use modern target_submit_tmr caller (nab) - Add link_magic for fabric allow_link destination target_items for symlinks within target_core_fabric_configfs.c code (nab) - Allocate pointers in instead of full structs for config_group->default_groups (sebastian) - Fix 32-bit highmem breakage for FILEIO (sebastian) All told, hch was able to shave off another ~1K LOC by killing the se_subsystem_dev abstraction, along with a number of PR + ALUA simplifications. Also, a nice patch by Roland is the refactoring of MODE_SENSE handling, along with the addition of initial MODE_SELECT emulation support for virtual backends. Sebastian found a long-standing issue wrt to allocation of full config_group instead of pointers for config_group->default_group[] setup in a number of areas, which ends up saving memory with big configurations. He also managed to fix another long-standing BUG wrt to broken 32-bit highmem support within the FILEIO backend driver. Thank you again to everyone who contributed this round!" * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (50 commits) target/iscsi_target: Add NodeACL tags for initiator group support target/tcm_fc: fix the lockdep warning due to inconsistent lock state sbp-target: fix error path in sbp_make_tpg() sbp-target: use simple assignment in tgt_agent_rw_agent_state() iscsi-target: use kstrdup() for iscsi_param target/file: merge fd_do_readv() and fd_do_writev() target/file: Fix 32-bit highmem breakage for SGL -> iovec mapping target: Add link_magic for fabric allow_link destination target_items ib_srpt: Convert TMR path to target_submit_tmr ib_srpt: Convert I/O path to target_submit_cmd + drop legacy ioctx->kref target: Make spc_get_write_same_sectors return sector_t target/configfs: use kmalloc() instead of kzalloc() for default groups target/configfs: allocate only 6 slots for dev_cg->default_groups target/configfs: allocate pointers instead of full struct for default_groups target: update error handling for sbc_setup_write_same() iscsit: use GFP_ATOMIC under spin lock iscsi_target: Remove redundant null check before kfree target/iblock: Forward declare bio helpers target: Clean up flow in transport_check_aborted_status() target: Clean up logic in transport_put_cmd() ...	2012-12-15 14:25:10 -08:00
Linus Torvalds	a2013a13e6	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial branch from Jiri Kosina: "Usual stuff -- comment/printk typo fixes, documentation updates, dead code elimination." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits) HOWTO: fix double words typo x86 mtrr: fix comment typo in mtrr_bp_init propagate name change to comments in kernel source doc: Update the name of profiling based on sysfs treewide: Fix typos in various drivers treewide: Fix typos in various Kconfig wireless: mwifiex: Fix typo in wireless/mwifiex driver messages: i2o: Fix typo in messages/i2o scripts/kernel-doc: check that non-void fcts describe their return value Kernel-doc: Convention: Use a "Return" section to describe return values radeon: Fix typo and copy/paste error in comments doc: Remove unnecessary declarations from Documentation/accounting/getdelays.c various: Fix spelling of "asynchronous" in comments. Fix misspellings of "whether" in comments. eisa: Fix spelling of "asynchronous". various: Fix spelling of "registered" in comments. doc: fix quite a few typos within Documentation target: iscsi: fix comment typos in target/iscsi drivers treewide: fix typo of "suport" in various comments and Kconfig treewide: fix typo of "suppport" in various comments ...	2012-12-13 12:00:02 -08:00
Wei Yongjun	405d55c99d	tcm_vhost: remove unused variable in vhost_scsi_allocate_cmd() The variable se_sess is initialized but never used otherwise, so remove the unused variable. dpatch engine is used to auto generate this patch. (https://github.com/weiyj/dpatch) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2012-12-06 17:09:19 +02:00
Michael S. Tsirkin	f9611c43ab	vhost-net: enable zerocopy tx by default Zero copy TX has been around for a while now. We seem to be down to eliminating theoretical bugs and performance tuning at this point: it's probably time to enable it by default so that most users get the benefit. Keep the flag around meanwhile so users can experiment with disabling this if they experience regressions. I expect that we will remove it in the future. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2012-12-06 17:09:18 +02:00
Michael S. Tsirkin	cedb9bdce0	vhost-net: skip head management if no outstanding For short packets zerocopy mode adds overhead of managing heads which isn't necessary: we could simly update used ring directly same as with zerocopy disabled. Things seem to run a bit faster if we detect and bypass head management when zcopy isn't used. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2012-12-06 17:09:18 +02:00
Michael S. Tsirkin	1280c27f8e	vhost-net: flush outstanding DMAs on memory change When memory map changes, we need to flush outstanding DMAs as they might in theory reference old memory addresses. To do this simply stop initiating new DMAs and wait for ubufs ref count to drop to 0. Afterwards reset the count back to 1. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2012-12-06 17:09:18 +02:00
Michael S. Tsirkin	935cdee7ee	vhost: avoid backend flush on vring ops vring changes already do a flush internally where appropriate, so we do not need a second flush. It's currently not very expensive but a follow-up patch makes flush more heavy-weight, so remove the extra flush here to avoid regressing performance if call or kick fds are changed on data path. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2012-12-06 17:09:18 +02:00
Michael S. Tsirkin	64e9a9b8a0	vhost-net: initialize zcopy packet counters These packet counters are used to drive the zercopy selection heuristic so nothing too bad happens if they are off a bit - and they are also reset once in a while. But it's cleaner to clear them when backend is set so that we start in a known state. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-03 13:42:17 -05:00
David S. Miller	8a2cf062b2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-29 12:51:17 -05:00
Michael S. Tsirkin	bd97120fc3	vhost: fix length for cross region descriptor If a single descriptor crosses a region, the second chunk length should be decremented by size translated so far, instead it includes the full descriptor length. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-28 11:27:01 -05:00
Sachin Kamat	91aa76518d	vhost: Remove duplicate inclusion of linux/vhost.h linux/vhost.h was included twice. Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-19 14:51:40 -05:00
Masanari Iida	744627e91c	treewide: fix printk typo in multiple drivers Correct spelling typo in multiple drivers. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2012-11-19 11:08:17 +01:00
David S. Miller	d4185bbf62	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c Minor conflict between the BCM_CNIC define removal in net-next and a bug fix added to net. Based upon a conflict resolution patch posted by Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-10 18:32:51 -05:00
Christoph Hellwig	de103c93af	target: pass sense_reason as a return value Pass the sense reason as an explicit return value from the I/O submission path instead of storing it in struct se_cmd and using negative return values. This cleans up a lot of the code pathes, and with the sparse annotations for the new sense_reason_t type allows for much better error checking. (nab: Convert spc_emulate_modesense + spc_emulate_modeselect to use sense_reason_t with Roland's MODE SELECT changes) Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Roland Dreier <roland@purestorage.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2012-11-06 20:55:46 -08:00
Michael S. Tsirkin	24eb21a148	vhost-net: reduce vq polling on tx zerocopy It seems that to avoid deadlocks it is enough to poll vq before we are going to use the last buffer. This is faster than `c70aa540c7`. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-02 21:29:58 -04:00
Michael S. Tsirkin	eaae8132ef	vhost-net: select tx zero copy dynamically Even when vhost-net is in zero-copy transmit mode, net core might still decide to copy the skb later which is somewhat slower than a copy in user context: data copy overhead is added to the cost of page pin/unpin. The result is that enabling tx zero copy option leads to higher CPU utilization for guest to guest and guest to host traffic. To fix this, suppress zero copy tx after a given number of packets triggered late data copy. Re-enable periodically to detect workload changes. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-02 21:29:58 -04:00
Michael S. Tsirkin	b211616d71	vhost: move -net specific code out Zerocopy handling code is vhost-net specific. Move it from vhost.c/vhost.h out to net.c Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-02 21:29:58 -04:00
Michael S. Tsirkin	c4fcb586c3	vhost: track zero copy failures using DMA length This will be used to disable zerocopy when error rate is high. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-02 21:29:57 -04:00
Michael S. Tsirkin	70e4cb9aaf	vhost-net: cleanup macros for DMA status tracking Better document macros for DMA tracking. Add an explicit one for DMA in progress instead of relying on user supplying len != 1. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-02 21:29:57 -04:00
Michael S. Tsirkin	e19d6763cc	skb: report completion status for zero copy skbs Even if skb is marked for zero copy, net core might still decide to copy it later which is somewhat slower than a copy in user context: besides copying the data we need to pin/unpin the pages. Add a parameter reporting such cases through zero copy callback: if this happens a lot, device can take this into account and switch to copying in user context. This patch updates all users but ignores the passed value for now: it will be used by follow-up patches. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-02 21:29:57 -04:00
Michael S. Tsirkin	910a578f7e	vhost: fix mergeable bufs on BE hosts We copy head count to a 16 bit field, this works by chance on LE but on BE guest gets 0. Fix it up. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Alexander Graf <agraf@suse.de> Cc: stable@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-24 23:19:30 -04:00
Linus Torvalds	a188e7e93a	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull scsi target updates from Nicholas Bellinger: "Things have been calm for the most part with no new fabric drivers in flight for v3.7 (we're up to eight now !), so this update is primarily focused on addressing a few long-standing items within target-core and iscsi-target fabric code. The highlights include: - target: Simplify fabric sense data length handling (roland) - qla2xxx: Fix endianness of task management response code (roland) - target: fix truncation of mode data, support zero allocation length (paolo) - target: Properly support zero-length commands in normal processing path (paolo) - iscsi-target: Correctly set 0xffffffff field within ISCSI_OP_REJECT PDU (ronnie + nab) - iscsi-target: Add explicit set of cache_dynamic_acls=1 for TPG demo-mode (ronnie + nab) - target/file: Re-enable optional fd_buffered_io=1 operation (nab + hch) - iscsi-target: Add MaxXmitDataSegmenthLength forr target -> initiator MDRSL declaration (nab) - target: Add target_submit_cmd_map_sgls for SGL fabric memory passthrough (nab + hch) - tcm_loop: Convert I/O path to use target_submit_cmd_map_sgls (hch + nab) - tcm_vhost: Convert I/O path to use target_submit_cmd_map_sgls (nab + hch) The last series for adding a new target_submit_cmd_map_sgls() fabric caller (as requested by hch) that accepts pre-allocated SGL memory (using existing logic), along with converting tcm_loop + tcm_vhost has only been in -next for the last days, but has gotten enough review +testing and is clear enough a mechanical change that I think it's reasonable to merge for -rc1 code. Thanks again to everyone who contributed this round! Extra special thanks to Roland (PureStorage) for tracking down the qla2xxx target TMR response code endian issue, and to Paolo (Redhat) for resolving the long standing zero-length CDB issues within target-core between virtual and pSCSI backends." * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (44 commits) iscsi-target: Bump defaults for nopin_timeout + nopin_response_timeout values iscsit: proper endianess conversions iscsit: use the itt_t abstract type iscsit: add missing endianess conversion in iscsit_check_inaddr_any iscsit: remove incorrect unlock in iscsit_build_sendtargets_resp iscsit: mark various functions static target/iscsi: precedence bug in iscsit_set_dataout_sequence_values() target/usb-gadget: strlen() doesn't count the terminator target/usb-gadget: remove duplicate initialization tcm_vhost: Convert I/O path to use target_submit_cmd_map_sgls target: Add control CDB READ payload zero work-around tcm_loop: Convert I/O path to use target_submit_cmd_map_sgls target: Add target_submit_cmd_map_sgls for SGL fabric memory passthrough iscsi-target: Add explicit set of cache_dynamic_acls=1 for TPG demo-mode iscsi-target: Change iscsi_target_seq_pdu_list.c to honor MaxXmitDataSegmentLength iscsi-target: Add MaxXmitDataSegmentLength connection recovery check iscsi-target: Convert incoming PDU payload checks to MaxXmitDataSegmentLength iscsi-target: Enable MaxXmitDataSegmentLength operation in login path iscsi-target: Add base MaxXmitDataSegmentLength code target/file: Re-enable optional fd_buffered_io=1 operation ...	2012-10-10 19:52:19 +09:00
Nicholas Bellinger	9f0abc1554	tcm_vhost: Convert I/O path to use target_submit_cmd_map_sgls This patch converts tcm_vhost to use target_submit_cmd_map_sgls() for I/O submission and mapping of pre-allocated SGL memory from incoming virtio-scsi SGL memory -> se_cmd descriptors. This includes removing the original open-coded fabric uses of target core callers to support transport_generic_map_mem_to_cmd() between target_setup_cmd_from_cdb() and transport_handle_cdb_direct() logic. It also includes adding a handful of new tcm_vhost_cmnd member + assignments in vhost_scsi_allocate_cmd() used from cmwq process context I/O submission within tcm_vhost_submission_work() (v2: Use renamed target_submit_cmd_map_sgls) Reported-by: Christoph Hellwig <hch@lst.de> Cc: Christoph Hellwig <hch@lst.de> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Stefan Hajnoczi <stefanha@gmail.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2012-10-02 14:16:20 -07:00
Al Viro	cecb46f194	vhost_set_vring(): turn pollstart/pollstop into bool Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-09-26 21:10:12 -04:00
Roland Dreier	9c58b7ddd7	target: Simplify fabric sense data length handling Every fabric driver has to supply a se_tfo->set_fabric_sense_len() method, just so iSCSI can return an offset of 2. However, every fabric driver is already allocating a sense buffer and passing it into the target core, either via transport_init_se_cmd() or target_submit_cmd(). So instead of having iSCSI pass the start of its sense buffer into the core and then later tell the core to skip the first 2 bytes, it seems easier for iSCSI just to do the offset of 2 when it passes the sense buffer into the core. Then we can drop the se_tfo->set_fabric_sense_len() everywhere, and just add a couple of lines of code to iSCSI to set the sense data length to the beginning of the buffer right before it sends it over the network. (nab: Remove .set_fabric_sense_len usage from tcm_qla2xxx_npiv_ops + change transport_get_sense_buffer to follow v3.6-rc6 code w/o ->set_fabric_sense_len usage) Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2012-09-17 17:12:58 -07:00
Roland Dreier	2ed772b7b9	target: Remove unused target_core_fabric_ops.get_fabric_sense_len method There are no callers of se_tfo->get_fabric_sense_len(), so we should stop having every fabric driver implement it. Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2012-09-17 16:15:47 -07:00
Michael S. Tsirkin	6de7145ca3	tcm_vhost: Fix vhost_scsi_target structure alignment Here TRANSPORT_IQN_LEN is 224, which is a multiple of 4. Since vhost_tpgt is 2 bytes and abi_version is 4, the total size would be 230. But gcc needs struct size be aligned to first field size, which is 4 bytes, so it pads the structure by extra 2 bytes to the total of 232. This padding is very undesirable in an ABI: - it can not be initialized easily - it can not be checked easily - it can leak information between kernel and userspace Simplest solution is probably just to make the padding explicit. (v2: Add check for zero'ed backend->reserved field for VHOST_SCSI_SET_ENDPOINT and VHOST_SCSI_CLEAR_ENDPOINT ops as requested by MST) Reported-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2012-08-20 14:52:11 -07:00
Nicholas Bellinger	5b7517f814	tcm_vhost: Change vhost_scsi_target->vhost_wwpn to char * This patch changes the vhost_scsi_target->vhost_wwpn[] type used by VHOST_SCSI_* ioctls to 'char *' as requested by Blue Swirl in order to match the latest QEMU vhost-scsi RFC-v3 userspace code. Queuing this up into target-pending/master for a -rc3 PULL. Reported-by: Blue Swirl <blauwirbel@gmail.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2012-08-16 19:03:13 -07:00
Nicholas Bellinger	101998f6fc	tcm_vhost: Post-merge review changes requested by MST This patch contains the post RFC-v5 (post-merge) changes, this includes: - Add locking comment - Move vhost_scsi_complete_cmd ahead of TFO callbacks in order to drop forward declarations - Drop extra '!= NULL' usage in vhost_scsi_complete_cmd_work() - Change vhost_scsi_*_handle_kick() to use pr_debug - Fix possible race in vhost_scsi_set_endpoint() for vs->vs_tpg checking + assignment. - Convert tv_tpg->tpg_vhost_count + ->tv_tpg_port_count from atomic_t -> int, and make sure reference is protected by ->tv_tpg_mutex. - Drop unnecessary vhost_scsi->vhost_ref_cnt - Add 'err:' label for exception path in vhost_scsi_clear_endpoint() - Add enum for VQ numbers, add usage in vhost_scsi_open() - Add vhost_scsi_flush() + vhost_scsi_flush_vq() following drivers/vhost/net.c - Add smp_wmb() + vhost_scsi_flush() call during vhost_scsi_set_features() - Drop unnecessary copy_from_user() usage with GET_ABI_VERSION ioctl - Add missing vhost_scsi_compat_ioctl() caller for vhost_scsi_fops - Fix function parameter definition first line to follow existing vhost code style - Change 'vHost' usage -> 'vhost' in handful of locations - Change -EPERM -> -EBUSY usage for two failures in tcm_vhost_drop_nexus() - Add comment for tcm_vhost_workqueue in tcm_vhost_init() - Make GET_ABI_VERSION return 'int' + add comment in tcm_vhost.h Reported-by: Michael S. Tsirkin <mst@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Cc: Anthony Liguori <aliguori@us.ibm.com> Cc: Zhi Yong Wu <wuzhy@cn.ibm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2012-08-16 17:33:40 -07:00
Fengguang Wu	f0e0e9bba5	tcm_vhost: Fix incorrect IS_ERR() usage in vhost_scsi_map_iov_to_sgl Fix up a new coccinelle warnings reported by Fengguang Wu + Intel 0-DAY kernel build testing backend: drivers/vhost/tcm_vhost.c:537:23-29: ERROR: allocation function on line 533 returns NULL not ERR_PTR on failure vim +537 drivers/vhost/tcm_vhost.c 534 if (!sg) 535 return -ENOMEM; 536 pr_debug("%s sg %p sgl_count %u is_err %ld\n", __func__, > 537 sg, sgl_count, IS_ERR(sg)); 538 sg_init_table(sg, sgl_count); 539 540 tv_cmd->tvc_sgl = sg; Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2012-08-16 17:33:28 -07:00
Nicholas Bellinger	057cbf49a1	tcm_vhost: Initial merge for vhost level target fabric driver This patch adds the initial code for tcm_vhost, a Vhost level TCM fabric driver for virtio SCSI initiators into KVM guest. This code is currently up and running on v3.5-rc2 host+guest from target-pending/for-next-merge. Using tcm_vhost requires Zhi's -> Stefan -> nab's qemu vhost-scsi tree here: http://git.kernel.org/?p=virt/kvm/nab/qemu-kvm.git;a=shortlog;h=refs/heads/vhost-scsi -- Changelog v4 -> v5: Expose ABI version via VHOST_SCSI_GET_ABI_VERSION + use Rev 0 as starting point for v3.6-rc code (Stefan + ALiguori + nab) Convert vhost_scsi_handle_vq() to vq_err() (nab + MST) Minor style fixes from checkpatch (nab) Changelog v3 -> v4: Rename vhost_vring_target -> vhost_scsi_target (mst + nab) Use TRANSPORT_IQN_LEN in vhost_scsi_target->vhost_wwpn[] def (nab) Move back to drivers/vhost/, and just use drivers/vhost/Kconfig.tcm (mst) Move TCM_VHOST related ioctl defines from include/linux/vhost.h -> drivers/vhost/tcm_vhost.h as requested by MST (nab) Move Kbuild.tcm include from drivers/staging -> drivers/vhost/, and just use 'if STAGING' around 'source drivers/vhost/Kbuild.tcm' Changelog v2 -> v3: Unlock on error in tcm_vhost_drop_nexus() (DanC) Fix strlen() doesn't count the terminator (DanC) Call kfree() on an error path (DanC) Convert tcm_vhost_write_pending to use target_execute_cmd (hch + nab) Fix another strlen() off by one in tcm_vhost_make_tport (DanC) Add option under drivers/staging/Kconfig, and move to drivers/vhost/tcm/ as requested by MST (nab) Changelog v1 -> v2: Fix tv_cmd completion -> release SGL memory leak (nab) Fix sparse warnings for static variable usage ((Fengguang Wu) Fix sparse warnings for min() typing + printk format specs (Fengguang Wu) Convert to cmwq submission for I/O dispatch (nab + hch) Changelog v0 -> v1: Merge into single source + header file, and move to drivers/vhost/ Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Cc: Anthony Liguori <aliguori@us.ibm.com> Cc: Zhi Yong Wu <wuzhy@cn.ibm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2012-07-29 13:49:10 -07:00
Stefan Hajnoczi	163049aefd	vhost: make vhost work queue visible The vhost work queue allows processing to be done in vhost worker thread context, which uses the owner process mm. Access to the vring and guest memory is typically only possible from vhost worker context so it is useful to allow work to be queued directly by users. Currently vhost_net only uses the poll wrappers which do not expose the work queue functions. However, for tcm_vhost (vhost_scsi) it will be necessary to queue custom work. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Cc: Zhi Yong Wu <wuzhy@cn.ibm.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2012-07-22 01:22:23 +03:00
Stefan Hajnoczi	0dd05a3b60	vhost: Separate vhost-net features from vhost features In order for other vhost devices to use the VHOST_FEATURES bits the vhost-net specific bits need to be moved to their own VHOST_NET_FEATURES constant. (Asias: Update drivers/vhost/test.c to use VHOST_NET_FEATURES) Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Cc: Zhi Yong Wu <wuzhy@cn.ibm.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Asias He <asias@redhat.com> Signed-off-by: Nicholas A. Bellinger <nab@risingtidesystems.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2012-07-22 01:21:53 +03:00
Jens Freimann	d7ffde35e3	vhost: use USER_DS in vhost_worker thread On some architectures address spaces are set up in a way that this is not necessary to work properly but on some others (like s390) it is. Make sure we operate on the user address space to allow copy_xxx_user() from the vhost_worker() thread by setting it explicitly before calling use_mm() and revert it after unuse_mm(). Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-26 21:10:56 -07:00
David S. Miller	028940342a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2012-05-16 22:17:37 -04:00
Basil Gor	c53cff5e42	vhost-net: fix handle_rx buffer size Take vlan header length into account, when vlan id is stored as vlan_tci. Otherwise tagged packets coming from macvtap will be truncated. Signed-off-by: Basil Gor <basil.gor@gmail.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-11 18:16:57 -04:00
Jason Wang	c70aa540c7	vhost: zerocopy: poll vq in zerocopy callback We add used and signal guest in worker thread but did not poll the virtqueue during the zero copy callback. This may lead the missing of adding and signalling during zerocopy. Solve this by polling the virtqueue and let it wakeup the worker during callback. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2012-05-02 18:22:25 +03:00
Jason Wang	c8fb217af5	vhost_net: zerocopy: adding and signalling immediately when fully copied When a packet were fully copied in zerocopy, we don't wait for the DMA done to mark the done flag, so after the packet were passed to lower device, we need to add used and signal guest immediately. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2012-05-02 18:22:24 +03:00
Jason Wang	dbf34207c6	vhost_net: re-poll only on EAGAIN or ENOBUFS Currently, we restart tx polling unconditionally when sendmsg() fails. This would cause unnecessary wakeups of vhost wokers and waste cpu utlization when evil userspace(guest driver) is able to hit EFAULT or EINVAL. The polling is only needed when the socket send buffer were exceeded or not enough memory. So fix this by restarting polling only when sendmsg() returns EAGAIN/ENOBUFS. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2012-05-02 18:22:24 +03:00
Jason Wang	c460f05739	vhost_net: zerocopy: fix possible NULL pointer dereference of vq->bufs When we want to disable vhost_net backend while there's a tx work, a possible NULL pointer defernece may happen we we try to deference the vq->bufs after vhost_net_set_backend() assign a NULL to it. As suggested by Michael, fix this by checking the vq->bufs instead of vhost_sock_zcopy(). Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2012-05-02 18:22:22 +03:00
Linus Torvalds	7e29629543	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Fix namespace init and cleanup in phonet to fix some oopses, from Eric W. Biederman. 2) Missing kfree_skb() in AF_KEY, from Julia Lawall. 3) Refcount leak and source address handling fix in l2tp from James Chapman. 4) Memory leak fix in CAIF from Tomasz Gregorek. 5) When routes are cloned from ipv6 addrconf routes, we don't process expirations properly. Fix from Gao Feng. 6) Fix panic on DMA errors in atl1 driver, from Tony Zelenoff. 7) Only enable interrupts in 8139cp driver after we've registered the IRQ handler. From Jason Wang. 8) Fix too many reads of KS_CIDER register in ks8851 during probe, fixing crashes on spurious interrupts. From Matt Renzelmann. 9) Missing include in ath5k driver and missing iounmap on probe failure, from Jonathan Bither. 10) Fix RX packet handling in smsc911x driver, from Will Deacon. 11) Fix ixgbe WoL on fiber by leaving the laser on during shutdown. 12) ks8851 needs MAX_RECV_FRAMES increased otherwise the internal MAC buffers are easily overflown. Fix from Davide Cimingahi. 13) Fix memory leaks in peak_usb CAN driver, from Jesper Juhl. 14) gred packet scheduler can dump in WRED more when doing a netlink dump. Fix from David Ward. 15) Fix MTU in USB smsc75xx driver, from Stephane Fillod. 16) Dummy device needs ->ndo_uninit handler to properly handle ->ndo_init failures. From Hiroaki SHIMODA. 17) Fix TX fragmentation in ath9k driver, from Sujith Manoharan. 18) Missing RTNL lock in ixgbe PM resume, from Benjamin Poirier. 19) Missing iounmap in farsync WAN driver, from Julia Lawall. 20) With LRO/GRO, tcp_grow_window() is easily tricked into not growing the receive window properly, and this hurts performance. Fix from Eric Dumazet. 21) Network namespace init failure can leak net_generic data, fix from Julian Anastasov. 22) Fix skb_over_panic due to mis-accounting in TCP for partially ACK'd SKBs. From Eric Dumazet. 23) New IDs for qmi_wwan driver, from Bjørn Mork. 24) Fix races in ax25_exit(), from Eric W. Biederman. 25) IPV6 TCP doesn't handle TCP_MAXSEG socket option properly, copy over logic from the IPV4 side. From Neal Cardwell. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (59 commits) tcp: fix TCP_MAXSEG for established IPv6 passive sockets drivers/net: Do not free an IRQ if its request failed drop_monitor: allow more events per second ks8851: Fix request_irq/free_irq mismatch net/hyperv: Adding cancellation to ensure rndis filter is closed ks8851: Fix mutex deadlock in ks8851_net_stop() net ax25: Reorder ax25_exit to remove races. icplus: fix interrupt for IC+ 101A/G and 1001LF net: qmi_wwan: support Sierra Wireless MC77xx devices in QMI mode bnx2x: off by one in bnx2x_ets_e3b0_sp_pri_to_cos_set() ksz884x: don't copy too much in netdev_set_mac_address() tcp: fix retransmit of partially acked frames netns: do not leak net_generic data on failed init net/sock.h: fix sk_peek_off kernel-doc warning tcp: fix tcp_grow_window() for large incoming frames drivers/net/wan/farsync.c: add missing iounmap davinci_mdio: Fix MDIO timeout check ipv6: clean up rt6_clean_expires ipv6: fix rt6_update_expires arcnet: rimi: Fix device name in debug output ...	2012-04-22 21:02:57 -07:00
Michael S. Tsirkin	ca8f4fb21d	skbuff: struct ubuf_info callback type safety The skb struct ubuf_info callback gets passed struct ubuf_info itself, not the arg value as the field name and the function signature seem to imply. Rename the arg field to ctx to match usage, add documentation and change the callback argument type to make usage clear and to have compiler check correctness. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-13 13:09:19 -04:00
Zhi Yong Wu	5e7045b010	tools/virtio: fix up vhost/test module build commit `ea5d404655` broke build for the vhost test module used by tools/virtio. Fix it up. Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2012-04-12 10:35:42 +03:00
David S. Miller	f1e84eb3bb	Merge branch 'vhost-net' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost	2012-03-23 14:46:48 -04:00
Cong Wang	c6daa7ffa8	vhost: remove the second argument of k[un]map_atomic() Signed-off-by: Cong Wang <amwang@redhat.com>	2012-03-20 21:48:21 +08:00
Michael S. Tsirkin	ea5d404655	vhost: fix release path lockdep checks We shouldn't hold any locks on release path. Pass a flag to vhost_dev_cleanup to use the lockdep info correctly. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Sasha Levin <levinsasha928@gmail.com>	2012-02-28 09:13:22 +02:00
Nadav Har'El	d550dda192	vhost: don't forget to schedule() This is a tiny, but important, patch to vhost. Vhost's worker thread only called schedule() when it had no work to do, and it wanted to go to sleep. But if there's always work to do, e.g., the guest is running a network-intensive program like netperf with small message sizes, schedule() was never called. This had several negative implications (on non-preemptive kernels): 1. Passing time was not properly accounted to the "vhost" process (ps and top would wrongly show it using zero CPU time). 2. Sometimes error messages about RCU timeouts would be printed, if the core running the vhost thread didn't schedule() for a very long time. 3. Worst of all, a vhost thread would "hog" the core. If several vhost threads need to share the same core, typically one would get most of the CPU time (and its associated guest most of the performance), while the others hardly get any work done. The trivial solution is to add if (need_resched()) schedule(); After doing every piece of work. This will not do the heavy schedule() all the time, just when the timer interrupt decided a reschedule is warranted (so need_resched returns true). Thanks to Abel Gordon for this patch. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2012-02-28 09:13:19 +02:00
stephen hemminger	7c7c7f01cc	vhost-net: add module alias (v2.1) By adding some module aliases, programs (or users) won't have to explicitly call modprobe. Vhost-net will always be available if built into the kernel. It does require assigning a permanent minor number for depmod to work. Also: - use C99 style initialization. - add missing entry in documentation for loop-control Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-By: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-01-13 10:12:23 -08:00
Arun Sharma	60063497a9	atomic: use <linux/atomic.h> This allows us to move duplicated code in <asm/atomic.h> (atomic_inc_not_zero() for now) to <linux/atomic.h> Signed-off-by: Arun Sharma <asharma@fb.com> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-26 16:49:47 -07:00
Shirley Ma	9e380825ab	vhost: handle wrap around in # of bufs math The meth for calculating the # of outstanding buffers gives incorrect results when vq->upend_idx wraps around zero. Fix that. Signed-off-by: Shirley Ma <xma@us.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2011-07-21 10:48:27 +03:00
Michael S. Tsirkin	c047e5f317	vhost-net: update used ring on backend change On backend change, we flushed out outstanding skbs but forgot to update the used ring, so that done entries were left in the ubuf_info ring. As a result we lose heads or complete incorrect ones, crashing the guest or leaking memory. Fix by updating the used ring. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2011-07-21 10:23:31 +03:00
Michael S. Tsirkin	b834226b04	vhost: optimize interrupt enable/disable As we now only update used ring after enabling the backend, we can write flags with __put_user: as that's done on data path, it matters. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2011-07-19 17:17:28 +03:00
Michael S. Tsirkin	75fd9edc10	vhost: fix zcopy reference counting Fix get/put refcount imbalance with zero copy, which caused qemu to hang forever on guest driver unload. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2011-07-19 13:28:34 +03:00
Jason Wang	2723feaa8e	vhost: set log when updating used flags or avail event We need to log writes when updating used flags and avail event fields. Otherwise the guest may see a stale value after migration and miss notifying the host. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2011-07-19 13:28:34 +03:00
Jason Wang	f59281dafb	vhost: init used ring after backend was set Move the used ring initialization after backend was set. This makes it possible to disable the backend and tweak the used ring, then restart. This will also make it possible to log the used ring write correctly. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2011-07-19 13:28:34 +03:00
Michael S. Tsirkin	bab632d69e	vhost: vhost TX zero-copy support >From: Shirley Ma <mashirle@us.ibm.com> This adds experimental zero copy support in vhost-net, disabled by default. To enable, set experimental_zcopytx module option to 1. This patch maintains the outstanding userspace buffers in the sequence it is delivered to vhost. The outstanding userspace buffers will be marked as done once the lower device buffers DMA has finished. This is monitored through last reference of kfree_skb callback. Two buffer indices are used for this purpose. The vhost-net device passes the userspace buffers info to lower device skb through message control. DMA done status check and guest notification are handled by handle_tx: in the worst case is all buffers in the vq are in pending/done status, so we need to notify guest to release DMA done buffers first before we get any new buffers from the vq. One known problem is that if the guest stops submitting buffers, buffers might never get used until some further action, e.g. device reset. This does not seem to affect linux guests. Signed-off-by: Shirley <xma@us.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-07-18 10:42:32 -07:00
Michael S. Tsirkin	8ea8cf89e1	vhost: support event index Support the new event index feature. When acked, utilize it to reduce the # of interrupts sent to the guest. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2011-05-30 11:14:15 +09:30
Rob Landley	6151658751	Correct occurrences of - Documentation/kvm/ to Documentation/virtual/kvm - Documentation/uml/ to Documentation/virtual/uml - Documentation/lguest/ to Documentation/virtual/lguest throughout the kernel source tree. Signed-off-by: Rob Landley <rob@landley.net> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>	2011-05-06 09:27:55 -07:00
Michael S. Tsirkin	de4d768a42	vhost-net: remove unlocked use of receive_queue Use of skb_queue_empty(&sock->sk->sk_receive_queue) without taking the sk_receive_queue.lock is unsafe or useless. Take it out. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2011-03-13 23:08:19 +02:00
Jason Wang	783e398854	vhost: lock receive queue, not the socket vhost takes a sock lock to try and prevent the skb from being pulled from the receive queue after skb_peek. However this is not the right lock to use for that, sk_receive_queue.lock is. Fix that up. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2011-03-13 23:08:04 +02:00
Jason Wang	94249369e9	vhost-net: Unify the code of mergeable and big buffer handling Codes duplication were found between the handling of mergeable and big buffers, so this patch tries to unify them. This could be easily done by adding a quota to the get_rx_bufs() which is used to limit the number of buffers it returns (for mergeable buffer, the quota is simply UIO_MAXIOV, for big buffers, the quota is just 1), and then the previous handle_rx_mergeable() could be resued also for big buffers. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2011-03-13 17:00:10 +02:00
Jason Wang	cfbdab9513	vhost-net: check the support of mergeable buffer outside the receive loop No need to check the support of mergeable buffer inside the recevie loop as the whole handle_rx()_xx is in the read critical region. So this patch move it ahead of the receiving loop. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2011-03-13 16:57:30 +02:00
Michael S. Tsirkin	fcc042a280	vhost: copy_from_user -> __copy_from_user copy_from_user is pretty high on perf top profile, replacing it with __copy_from_user helps. It's also safe because we do access_ok checks during setup. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2011-03-08 18:03:05 +02:00
Krishna Kumar	d47effe1be	vhost: Cleanup vhost.c and net.c Minor cleanup of vhost.c and net.c to match coding style. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2011-03-08 18:02:47 +02:00
Michael S. Tsirkin	5e18247b02	vhost: rcu annotation fixup When built with rcu checks enabled, vhost triggers bogus warnings as vhost features are read without dev->mutex sometimes, and private pointer is read with our kind of rcu where work serves as a read side critical section. Fixing it properly is not trivial. Disable the warnings by stubbing out the checks for now. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2011-02-01 16:48:46 +02:00
Michael S. Tsirkin	0174b0c30a	vhost: fix signed/unsigned comparison To detect that a sequence number is done, we are doing math on unsigned integers so the result is unsigned too. Not what was intended for the <= comparison. The result is user stuck forever in flush call. Convert to int to fix this. Further, get rid of ({}) to make code clearer. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2011-01-10 10:03:39 +02:00
David S. Miller	9fe146aef4	Merge branch 'vhost-net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost	2010-12-14 11:33:23 -08:00
Michael S. Tsirkin	71ccc212e5	vhost test module This adds a test module for vhost infrastructure. Intentionally not tied to kbuild to prevent people from installing and loading it accidentally. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2010-12-09 16:00:21 +02:00
Michael S. Tsirkin	28831ee60b	vhost: better variable name in logging We really store a page offset in write_address, so rename it write_page to avoid confusion. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2010-12-09 16:00:10 +02:00
Michael S. Tsirkin	3bf9be40ff	vhost: correctly set bits of dirty pages Fix two bugs in dirty page logging: When counting pages we should increase address by 1 instead of VHOST_PAGE_SIZE. Make log_write() correctly process requests that cross pages with write_address not starting at page boundary. Reported-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2010-12-09 15:39:17 +02:00
Jason Wang	a290aec88a	vhost: fix typos in comment Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2010-12-09 15:39:15 +02:00
Michael S. Tsirkin	bf5e0bd27f	vhost: remove unused include vhost.c does not need to know about sockets, don't include sock.h Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2010-12-09 15:39:13 +02:00
Michael S. Tsirkin	11cd1a8b8c	vhost/net: fix rcu check usage Incorrect rcu check was used as rcu isn't done under mutex here. Force check to 1 for now, to stop it from complaining. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2010-11-25 11:29:16 +02:00
Michael S. Tsirkin	8b7347aab6	vhost: get/put_user -> __get/__put_user We do access_ok checks on all ring values on an ioctl, so we don't need to redo them on each access. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2010-11-04 13:22:12 +02:00
Michael S. Tsirkin	dfe5ac5b18	vhost: copy_to_user -> __copy_to_user We do access_ok checks at setup time, so we don't need to redo them on each access. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2010-11-04 13:22:11 +02:00
Michael S. Tsirkin	64e1c80748	vhost-net: batch use/unuse mm Move use/unuse mm to vhost.c which makes it possible to batch these operations. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2010-11-04 13:22:11 +02:00
Michael S. Tsirkin	533a19b4b8	vhost: put mm after thread stop makes it possible to batch use/unuse mm Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2010-11-04 13:22:10 +02:00
Julia Lawall	3fcedec752	drivers/vhost/vhost.c: delete double assignment Delete successive assignments to the same location. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression i; @@ *i = ...; i = ...; // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2010-10-26 20:39:30 +02:00
Linus Torvalds	5f05647dd8	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1699 commits) bnx2/bnx2x: Unsupported Ethtool operations should return -EINVAL. vlan: Calling vlan_hwaccel_do_receive() is always valid. tproxy: use the interface primary IP address as a default value for --on-ip tproxy: added IPv6 support to the socket match cxgb3: function namespace cleanup tproxy: added IPv6 support to the TPROXY target tproxy: added IPv6 socket lookup function to nf_tproxy_core be2net: Changes to use only priority codes allowed by f/w tproxy: allow non-local binds of IPv6 sockets if IP_TRANSPARENT is enabled tproxy: added tproxy sockopt interface in the IPV6 layer tproxy: added udp6_lib_lookup function tproxy: added const specifiers to udp lookup functions tproxy: split off ipv6 defragmentation to a separate module l2tp: small cleanup nf_nat: restrict ICMP translation for embedded header can: mcp251x: fix generation of error frames can: mcp251x: fix endless loop in interrupt handler if CANINTF_MERRF is set can-raw: add msg_flags to distinguish local traffic 9p: client code cleanup rds: make local functions/variables static ... Fix up conflicts in net/core/dev.c, drivers/net/pcmcia/smc91c92_cs.c and drivers/net/wireless/ath/ath9k/debug.c as per David	2010-10-23 11:47:02 -07:00
Linus Torvalds	092e0e7e52	Merge branch 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl * 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl: vfs: make no_llseek the default vfs: don't use BKL in default_llseek llseek: automatically add .llseek fop libfs: use generic_file_llseek for simple_attr mac80211: disallow seeks in minstrel debug code lirc: make chardev nonseekable viotape: use noop_llseek raw: use explicit llseek file operations ibmasmfs: use generic_file_llseek spufs: use llseek in all file operations arm/omap: use generic_file_llseek in iommu_debug lkdtm: use generic_file_llseek in debugfs net/wireless: use generic_file_llseek in debugfs drm: use noop_llseek	2010-10-22 10:52:56 -07:00
David S. Miller	2198a10b50	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/core/dev.c	2010-10-21 08:43:05 -07:00
Arnd Bergmann	6038f373a3	llseek: automatically add .llseek fop All file_operations should get a .llseek operation so we can make nonseekable_open the default for future file operations without a .llseek pointer. The three cases that we can automatically detect are no_llseek, seq_lseek and default_llseek. For cases where we can we can automatically prove that the file offset is always ignored, we use noop_llseek, which maintains the current behavior of not returning an error from a seek. New drivers should normally not use noop_llseek but instead use no_llseek and call nonseekable_open at open time. Existing drivers can be converted to do the same when the maintainer knows for certain that no user code relies on calling seek on the device file. The generated code is often incorrectly indented and right now contains comments that clarify for each added line why a specific variant was chosen. In the version that gets submitted upstream, the comments will be gone and I will manually fix the indentation, because there does not seem to be a way to do that using coccinelle. Some amount of new code is currently sitting in linux-next that should get the same modifications, which I will do at the end of the merge window. Many thanks to Julia Lawall for helping me learn to write a semantic patch that does all this. ===== begin semantic patch ===== // This adds an llseek= method to all file operations, // as a preparation for making no_llseek the default. // // The rules are // - use no_llseek explicitly if we do nonseekable_open // - use seq_lseek for sequential files // - use default_llseek if we know we access f_pos // - use noop_llseek if we know we don't access f_pos, // but we still want to allow users to call lseek // @ open1 exists @ identifier nested_open; @@ nested_open(...) { <+... nonseekable_open(...) ...+> } @ open exists@ identifier open_f; identifier i, f; identifier open1.nested_open; @@ int open_f(struct inode i, struct file f) { <+... ( nonseekable_open(...) \| nested_open(...) ) ...+> } @ read disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t read_f(struct file f, char p, size_t s, loff_t off) { <+... ( off = E \| off += E \| func(..., off, ...) \| E = off ) ...+> } @ read_no_fpos disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t read_f(struct file f, char p, size_t s, loff_t off) { ... when != off } @ write @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t write_f(struct file f, const char p, size_t s, loff_t off) { <+... ( off = E \| off += E \| func(..., off, ...) \| E = off ) ...+> } @ write_no_fpos @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t write_f(struct file f, const char p, size_t s, loff_t off) { ... when != off } @ fops0 @ identifier fops; @@ struct file_operations fops = { ... }; @ has_llseek depends on fops0 @ identifier fops0.fops; identifier llseek_f; @@ struct file_operations fops = { ... .llseek = llseek_f, ... }; @ has_read depends on fops0 @ identifier fops0.fops; identifier read_f; @@ struct file_operations fops = { ... .read = read_f, ... }; @ has_write depends on fops0 @ identifier fops0.fops; identifier write_f; @@ struct file_operations fops = { ... .write = write_f, ... }; @ has_open depends on fops0 @ identifier fops0.fops; identifier open_f; @@ struct file_operations fops = { ... .open = open_f, ... }; // use no_llseek if we call nonseekable_open //////////////////////////////////////////// @ nonseekable1 depends on !has_llseek && has_open @ identifier fops0.fops; identifier nso ~= "nonseekable_open"; @@ struct file_operations fops = { ... .open = nso, ... +.llseek = no_llseek, /* nonseekable / }; @ nonseekable2 depends on !has_llseek @ identifier fops0.fops; identifier open.open_f; @@ struct file_operations fops = { ... .open = open_f, ... +.llseek = no_llseek, / open uses nonseekable / }; // use seq_lseek for sequential files ///////////////////////////////////// @ seq depends on !has_llseek @ identifier fops0.fops; identifier sr ~= "seq_read"; @@ struct file_operations fops = { ... .read = sr, ... +.llseek = seq_lseek, / we have seq_read / }; // use default_llseek if there is a readdir /////////////////////////////////////////// @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier readdir_e; @@ // any other fop is used that changes pos struct file_operations fops = { ... .readdir = readdir_e, ... +.llseek = default_llseek, / readdir is present / }; // use default_llseek if at least one of read/write touches f_pos ///////////////////////////////////////////////////////////////// @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read.read_f; @@ // read fops use offset struct file_operations fops = { ... .read = read_f, ... +.llseek = default_llseek, / read accesses f_pos / }; @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, ... + .llseek = default_llseek, / write accesses f_pos / }; // Use noop_llseek if neither read nor write accesses f_pos /////////////////////////////////////////////////////////// @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; identifier write_no_fpos.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, .read = read_f, ... +.llseek = noop_llseek, / read and write both use no f_pos / }; @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write_no_fpos.write_f; @@ struct file_operations fops = { ... .write = write_f, ... +.llseek = noop_llseek, / write uses no f_pos / }; @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; @@ struct file_operations fops = { ... .read = read_f, ... +.llseek = noop_llseek, / read uses no f_pos / }; @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; @@ struct file_operations fops = { ... +.llseek = noop_llseek, / no read or write fn */ }; ===== End semantic patch ===== Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Julia Lawall <julia@diku.dk> Cc: Christoph Hellwig <hch@infradead.org>	2010-10-15 15:53:27 +02:00
Dan Carpenter	6d97e55f71	vhost: fix return code for log_access_ok() access_ok() returns 1 if it's OK otherwise it should return 0. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2010-10-12 14:32:28 +02:00
Ingo Molnar	556ef63255	Merge commit 'v2.6.36-rc7' into core/rcu Merge reason: Update from -rc3 to -rc7. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-07 09:43:45 +02:00
Ingo Molnar	d4f8f217b8	Merge branch 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu into core/rcu	2010-10-07 09:43:11 +02:00
Jason Wang	e0e9b40647	vhost: max s/g to match qemu Qemu supports up to UIO_MAXIOV s/g so we have to match that because guest drivers may rely on this. Allocate indirect and log arrays dynamically to avoid using too much contigious memory and make the length of hdr array to match the header length since each iovec entry has a least one byte. Test with copying large files w/ and w/o migration in both linux and windows guests. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2010-10-05 13:33:43 +02:00
Michael S. Tsirkin	5786aee8bf	vhost: fix log ctx signalling The log eventfd signalling got put in dead code. We didn't notice because qemu currently does polling instead of eventfd select. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2010-09-22 16:21:33 +02:00
Michael S. Tsirkin	ee05d6939e	vhost-net: fix range checking in mrg bufs case In mergeable buffer case, we use headcount, log_num and seg as indexes in same-size arrays, and we know that headcount <= seg and log_num equals either 0 or seg. Therefore, the right thing to do is range-check seg, not headcount as we do now: these will be different if guest chains s/g descriptors (this does not happen now, but we can not trust the guest). Long term, we should add BUG_ON checks to verify two other indexes are what we think they should be. Reported-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2010-09-14 15:22:41 +02:00
Michael S. Tsirkin	615cc2211c	vhost: error handling fix vhost should set worker to NULL on cgroups attach failure, so that we won't try to destroy the worker again on close. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2010-09-06 09:49:39 +03:00
Michael S. Tsirkin	87d6a412bd	vhost: fix attach to cgroups regression Since 2.6.36-rc1, non-root users of vhost-net fail to attach if they are in any cgroups. The reason is that when qemu uses vhost, vhost wants to attach its thread to all cgroups that qemu has. But we got the API backwards, so a non-priveledged process (Qemu) tried to control the priveledged one (vhost), which fails. Fix this by switching to the new cgroup_attach_task_all, and running it from the vhost thread. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2010-09-06 09:49:31 +03:00
Eric Dumazet	78b620ce9e	vhost: stop worker only if created Its currently illegal to call kthread_stop(NULL) Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-01 14:26:13 -07:00
Arnd Bergmann	28457ee69c	vhost: add __rcu annotations Also add rcu_dereference_protected() for code paths where locks are held. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: "Michael S. Tsirkin" <mst@redhat.com>	2010-08-21 16:27:36 -07:00
Linus Torvalds	6ba74014c1	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1443 commits) phy/marvell: add 88ec048 support igb: Program MDICNFG register prior to PHY init e1000e: correct MAC-PHY interconnect register offset for 82579 hso: Add new product ID can: Add driver for esd CAN-USB/2 device l2tp: fix export of header file for userspace can-raw: Fix skb_orphan_try handling Revert "net: remove zap_completion_queue" net: cleanup inclusion phy/marvell: add 88e1121 interface mode support u32: negative offset fix net: Fix a typo from "dev" to "ndev" igb: Use irq_synchronize per vector when using MSI-X ixgbevf: fix null pointer dereference due to filter being set for VLAN 0 e1000e: Fix irq_synchronize in MSI-X case e1000e: register pm_qos request on hardware activation ip_fragment: fix subtracting PPPOE_SES_HLEN from mtu twice net: Add getsockopt support for TCP thin-streams cxgb4: update driver version cxgb4: add new PCI IDs ... Manually fix up conflicts in: - drivers/net/e1000e/netdev.c: due to pm_qos registration infrastructure changes - drivers/net/phy/marvell.c: conflict between adding 88ec048 support and cleaning up the IDs - drivers/net/wireless/ipw2x00/ipw2100.c: trivial ipw2100_pm_qos_req conflict (registration change vs marking it static)	2010-08-04 11:47:58 -07:00
David Stevens	8dd014adfe	vhost-net: mergeable buffers support This adds support for mergeable buffers in vhost-net: this is needed for older guests without indirect buffer support, as well as for zero copy with some devices. Includes changes by Michael S. Tsirkin to make the patch as low risk as possible (i.e., close to no changes when feature is disabled). Signed-off-by: David Stevens <dlstevens@us.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2010-07-28 15:45:36 +03:00
Michael S. Tsirkin	9e3d195720	vhost: apply cgroup to vhost workers Apply the cgroup of the owner task to the created vhost worker. Based on patches from Sridhar Samudrala's and Tejun Heo. Later we'll need to also apply cpumask and probably priority of the owner process. Discussion on the best way to do this is still ongoing. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Sridhar Samudrala <samudrala.sridhar@gmail.com> Cc: Li Zefan <lizf@cn.fujitsu.com>	2010-07-28 15:45:18 +03:00
Tejun Heo	c23f3445e6	vhost: replace vhost_workqueue with per-vhost kthread Replace vhost_workqueue with per-vhost kthread. Other than callback argument change from struct work_struct * to struct vhost_work , there's no visible change to vhost_poll_() interface. This conversion is to make each vhost use a dedicated kthread so that resource control via cgroup can be applied. Partially based on Sridhar Samudrala's patch. * Updated to use sub structure vhost_work instead of directly using vhost_poll at Michael's suggestion. * Added flusher wake_up() optimization at Michael's suggestion. Changes by MST: * Converted atomics/barrier use to a spinlock. * Create thread on SET_OWNER * Fix flushing Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Cc: Sridhar Samudrala <samudrala.sridhar@gmail.com>	2010-07-28 15:44:53 +03:00
David S. Miller	4c3e5edf2f	vhost net: Fix warning. Reported by Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>	2010-07-21 21:09:23 -07:00
David S. Miller	11fe883936	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/vhost/net.c net/bridge/br_device.c Fix merge conflict in drivers/vhost/net.c with guidance from Stephen Rothwell. Revert the effects of net-2.6 commit `573201f36f` since net-next-2.6 has fixes that make bridge netpoll work properly thus we don't need it disabled. Signed-off-by: David S. Miller <davem@davemloft.net>	2010-07-20 18:25:24 -07:00
Linus Torvalds	516bd66415	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (24 commits) bridge: Partially disable netpoll support tcp: fix crash in tcp_xmit_retransmit_queue IPv6: fix CoA check in RH2 input handler (mip6_rthdr_input()) ibmveth: lost IRQ while closing/opening device leads to service loss rt2x00: Fix lockdep warning in rt2x00lib_probe_dev() vhost: avoid pr_err on condition guest can trigger ipmr: Don't leak memory if fib lookup fails. vhost-net: avoid flush under lock net: fix problem in reading sock TX queue net/core: neighbour update Oops net: skb_tx_hash() fix relative to skb_orphan_try() rfs: call sock_rps_record_flow() in tcp_splice_read() xfrm: do not assume that template resolving always returns xfrms hostap_pci: set dev->base_addr during probe axnet_cs: use spin_lock_irqsave in ax_interrupt dsa: Fix Kconfig dependencies. act_nat: not all of the ICMP packets need an IP header payload r8169: incorrect identifier for a 8168dp Phonet: fix skb leak in pipe endpoint accept() Bluetooth: Update sec_level/auth_type for already existing connections ...	2010-07-20 16:26:42 -07:00
Michael S. Tsirkin	95c0ec6a97	vhost: avoid pr_err on condition guest can trigger Guest can trigger packet truncation by posting a very short buffer and disabling buffer merging. Convert pr_err to pr_debug to avoid log from filling up when this happens. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2010-07-16 12:18:17 +03:00
Michael S. Tsirkin	1680e9063e	vhost-net: avoid flush under lock We flush under vq mutex when changing backends. This creates a deadlock as workqueue being flushed needs this lock as well. https://bugzilla.redhat.com/show_bug.cgi?id=612421 Drop the vq mutex before flush: we have the device mutex which is sufficient to prevent another ioctl from touching the vq. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2010-07-15 15:26:12 +03:00
Linus Torvalds	2aa72f6121	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (35 commits) NET: SB1250: Initialize .owner vxge: show startup message with KERN_INFO ll_temac: Fix missing iounmaps bridge: Clear IPCB before possible entry into IP stack bridge br_multicast: BUG: unable to handle kernel NULL pointer dereference net: Fix definition of netif_vdbg() when VERBOSE_DEBUG is defined net/ne: fix memory leak in ne_drv_probe() xfrm: fix xfrm by MARK logic virtio_net: fix oom handling on tx virtio_net: do not reschedule rx refill forever s2io: resolve statistics issues linux/net.h: fix kernel-doc warnings net: decreasing real_num_tx_queues needs to flush qdisc sched: qdisc_reset_all_tx is calling qdisc_reset without qdisc_lock qlge: fix a eeh handler to not add a pending timer qlge: Replacing add_timer() to mod_timer() usbnet: Set parent device early for netdev_printk() net: Revert "rndis_host: Poll status channel before control channel" netfilter: ip6t_REJECT: fix a dst leak in ipv6 REJECT drivers: bluetooth: bluecard_cs.c: Fixed include error, changed to linux/io.h ...	2010-07-07 19:56:00 -07:00
David S. Miller	597e608a84	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	2010-07-07 15:59:38 -07:00
Michael S. Tsirkin	7b3384fc30	vhost: add unlikely annotations to error path patch 'break out of polling loop on error' caused a minor performance regression on my machine: recover that performance by adding a bunch of unlikely annotations in the error handling. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2010-07-01 19:25:40 +03:00
Michael S. Tsirkin	d5675bd204	vhost: break out of polling loop on error When ring parsing fails, we currently handle this as ring empty condition. This means that we enable kicks and recheck ring empty: if this not empty, we re-start polling which of course will fail again. Instead, let's return a negative error code and stop polling. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2010-06-27 11:52:25 +03:00
Alan Cox	79907d89c3	misc: Fix allocation 'borrowed' by vhost_net 10, 233 is allocated officially to /dev/kmview which is shipping in Ubuntu and Debian distributions. vhost_net seem to have borrowed it without making a proper request and this causes regressions in the other distributions. vhost_net can use a dynamic minor so use that instead. Also update the file with a comment to try and avoid future misunderstandings. cc: stable@kernel.org Signed-off-by: Alan Cox <device@lanana.org> [ We should have caught this before 2.6.34 got released. - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-06-09 08:50:31 -07:00
David S. Miller	0dea7c12fc	Merge branch 'vhost-net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost	2010-06-02 08:26:36 -07:00
Linus Torvalds	72da3bc0cb	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (22 commits) netlink: bug fix: wrong size was calculated for vfinfo list blob netlink: bug fix: don't overrun skbs on vf_port dump xt_tee: use skb_dst_drop() netdev/fec: fix ifconfig eth0 down hang issue cnic: Fix context memory init. on 5709. drivers/net: Eliminate a NULL pointer dereference drivers/net/hamradio: Eliminate a NULL pointer dereference be2net: Patch removes redundant while statement in loop. ipv6: Add GSO support on forwarding path net: fix __neigh_event_send() vhost: fix the memory leak which will happen when memory_access_ok fails vhost-net: fix to check the return value of copy_to/from_user() correctly vhost: fix to check the return value of copy_to/from_user() correctly vhost: Fix host panic if ioctl called with wrong index net: fix lock_sock_bh/unlock_sock_bh net/iucv: Add missing spin_unlock net: ll_temac: fix checksum offload logic net: ll_temac: fix interrupt bug when interrupt 0 is used sctp: dubious bitfields in sctp_transport ipmr: off by one in __ipmr_fill_mroute() ...	2010-05-28 10:18:40 -07:00
Takuya Yoshikawa	a02c37891a	vhost: fix the memory leak which will happen when memory_access_ok fails We need to free newmem when vhost_set_memory() fails to complete. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2010-05-27 13:55:17 +03:00
Takuya Yoshikawa	d3553a5249	vhost-net: fix to check the return value of copy_to/from_user() correctly copy_to/from_user() returns the number of bytes that could not be copied. So we need to check if it is not zero, and in that case, we should return the error number -EFAULT rather than directly return the return value from copy_to/from_user(). Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2010-05-27 13:55:13 +03:00
Takuya Yoshikawa	7ad9c9d270	vhost: fix to check the return value of copy_to/from_user() correctly copy_to/from_user() returns the number of bytes that could not be copied. So we need to check if it is not zero, and in that case, we should return the error number -EFAULT rather than directly return the return value from copy_to/from_user(). Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2010-05-27 13:54:59 +03:00
Michael S. Tsirkin	f8322fbe04	vhost: whitespace fix Fix up whitespace in vq_memory_access_ok. Reported-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2010-05-27 12:28:03 +03:00
Jeff Dike	dd1f4078f0	vhost-net: minor cleanup Delete a label and goto from vhost_net_set_backend Inverting a test allows a label and goto to be eliminated. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2010-05-27 12:25:37 +03:00
Tobias Klauser	373a83a699	vhost: Storage class should be before const qualifier The C99 specification states in section 6.11.5: The placement of a storage-class specifier other than at the beginning of the declaration specifiers in a declaration is an obsolescent feature. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2010-05-27 12:23:49 +03:00
Krishna Kumar	0f3d9a1746	vhost: Fix host panic if ioctl called with wrong index Missed a boundary value check in vhost_set_vring. The host panics if idx == nvqs is used in ioctl commands in vhost_virtqueue_init. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2010-05-27 12:19:02 +03:00
Alexey Dobriyan	4be929be34	kernel-wide: replace USHORT_MAX, SHORT_MAX and SHORT_MIN with USHRT_MAX, SHRT_MAX and SHRT_MIN - C99 knows about USHRT_MAX/SHRT_MAX/SHRT_MIN, not USHORT_MAX/SHORT_MAX/SHORT_MIN. - Make SHRT_MIN of type s16, not int, for consistency. [akpm@linux-foundation.org: fix drivers/dma/timb_dma.c] [akpm@linux-foundation.org: fix security/keys/keyring.c] Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-25 08:07:02 -07:00
David S. Miller	6811d58fc1	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: include/linux/if_link.h	2010-05-16 22:26:58 -07:00
Michael S. Tsirkin	0d4993563b	vhost: fix barrier pairing According to memory-barriers.txt, an smp memory barrier in guest should always be paired with an smp memory barrier in host, and I quote "a lack of appropriate pairing is almost certainly an error". In case of vhost, failure to flush out used index update before looking at the interrupt disable flag could result in missed interrupts, resulting in networking hang under stress. This might happen when flags read bypasses used index write. So we see interrupts disabled and do not interrupt, at the same time guest writes flags value to enable interrupt, reads an old used index value, thinks that used ring is empty and waits for interrupt. Note: the barrier we pair with here is in drivers/virtio/virtio_ring.c, function vring_enable_cb. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Juan Quintela <quintela@redhat.com>	2010-05-12 18:04:04 +03:00
David S. Miller	fea0691526	Merge branch 'vhost' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost	2010-04-14 22:52:46 -07:00
Christoph Hellwig	a8d3782f9e	vhost: fix sparse warnings Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2010-04-14 11:52:08 +03:00
David S. Miller	4a1032faac	Merge branch 'master' of /home/davem/src/GIT/linux-2.6/	2010-04-11 02:44:30 -07:00
Jeff Dike	179b284e2f	vhost-net: fix vq_memory_access_ok error checking vq_memory_access_ok needs to check whether mem == NULL Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2010-04-07 17:04:45 +03:00
Tejun Heo	5a0e3ad6af	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>	2010-03-30 22:02:32 +09:00
Michael S. Tsirkin	535297a6ae	vhost: fix error handling in vring ioctls Stanse found a locking problem in vhost_set_vring: several returns from VHOST_SET_VRING_KICK, VHOST_SET_VRING_CALL, VHOST_SET_VRING_ERR with the vq->mutex held. Fix these up. Reported-by: Jiri Slaby <jirislaby@gmail.com> Acked-by: Laurent Chavey <chavey@google.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2010-03-17 23:07:35 +02:00
Michael S. Tsirkin	0e25557212	vhost: fix interrupt mitigation with raw sockets A thinko in code means we never trigger interrupt mitigation. Fix this. Reported-by: Juan Quintela <quintela@redhat.com> Reported-by: Unai Uribarri <unai.uribarri@optenet.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2010-03-17 16:44:20 +02:00
Jeff Dike	1dace8c801	vhost: fix error path in vhost_net_set_backend An error could cause vhost_net_set_backend to exit without unlocking vq->mutex. Fix this. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2010-03-07 13:28:53 +02:00
Sridhar Samudrala	39286fa41a	vhost-net: restart tx poll on sk_sndbuf full guest to remote communication with vhost net sometimes stops until guest driver is restarted. This happens when we get guest kick precisely when the backend send queue is full, as a result handle_tx() returns without polling backend. This patch fixes this by restarting tx poll on this condition. Signed-off-by: Sridhar Samudrala <samudrala@us.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Tom Lendacky <toml@us.ibm.com>	2010-02-28 19:50:33 +02:00
Michael S. Tsirkin	d6db3f5c11	vhost: fix get_user_pages_fast error handling get_user_pages_fast returns number of pages on success, negative value on failure, but never 0. Fix vhost code to match this logic. Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2010-02-28 19:42:36 +02:00
Michael S. Tsirkin	73a99f0830	vhost: initialize log eventfd context pointer vq log eventfd context pointer needs to be initialized, otherwise operation may fail or oops if log is enabled but log eventfd not set by userspace. When log_ctx for device is created, it is copied to the vq. This reset was missing. Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2010-02-28 19:42:35 +02:00
Michael S. Tsirkin	86e9424d72	vhost: logging thinko fix vhost was dong some complex math to get offset to log at, and got it wrong by a couple of bytes, while in fact it's simple: get address where we write, subtract start of buffer, add log base. Do it this way. Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2010-02-28 19:42:35 +02:00
Arnd Bergmann	501c774cb1	net/macvtap: add vhost support This adds support for passing a macvtap file descriptor into vhost-net, much like we already do for tun/tap. Most of the new code is taken from the respective patch in the tun driver and may get consolidated in the future. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-02-18 14:08:38 -08:00
Michael S. Tsirkin	5659338c88	vhost-net: switch to smp barriers vhost-net only uses memory barriers to control SMP effects (communication with userspace potentially running on a different CPU), so it should use SMP barriers and not mandatory barriers for memory access ordering, as suggested by Documentation/memory-barriers.txt Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-02-14 22:42:53 -08:00
Michael S. Tsirkin	17660f8124	vhost: fix TUN=m VHOST_NET=y drivers/built-in.o: In function `get_tun_socket': net.c:(.text+0x15436e): undefined reference to `tun_get_socket' If tun is a module, vhost must be a module, too. If tun is built-in or disabled, vhost can be built-in. Note: TUN \|\| !TUN might look a bit strange until you realize that boolean logic rules do not apply for tristate variables. Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-01-21 01:28:45 -08:00
Michael S. Tsirkin	3a4d5c94e9	vhost_net: a kernel-level virtio server What it is: vhost net is a character device that can be used to reduce the number of system calls involved in virtio networking. Existing virtio net code is used in the guest without modification. There's similarity with vringfd, with some differences and reduced scope - uses eventfd for signalling - structures can be moved around in memory at any time (good for migration, bug work-arounds in userspace) - write logging is supported (good for migration) - support memory table and not just an offset (needed for kvm) common virtio related code has been put in a separate file vhost.c and can be made into a separate module if/when more backends appear. I used Rusty's lguest.c as the source for developing this part : this supplied me with witty comments I wouldn't be able to write myself. What it is not: vhost net is not a bus, and not a generic new system call. No assumptions are made on how guest performs hypercalls. Userspace hypervisors are supported as well as kvm. How it works: Basically, we connect virtio frontend (configured by userspace) to a backend. The backend could be a network device, or a tap device. Backend is also configured by userspace, including vlan/mac etc. Status: This works for me, and I haven't see any crashes. Compared to userspace, people reported improved latency (as I save up to 4 system calls per packet), as well as better bandwidth and CPU utilization. Features that I plan to look at in the future: - mergeable buffers - zero copy - scalability tuning: figure out the best threading model to use Note on RCU usage (this is also documented in vhost.h, near private_pointer which is the value protected by this variant of RCU): what is happening is that the rcu_dereference() is being used in a workqueue item. The role of rcu_read_lock() is taken on by the start of execution of the workqueue item, of rcu_read_unlock() by the end of execution of the workqueue item, and of synchronize_rcu() by flush_workqueue()/flush_work(). In the future we might need to apply some gcc attribute or sparse annotation to the function passed to INIT_WORK(). Paul's ack below is for this RCU usage. (Includes fixes by Alan Cox <alan@linux.intel.com>, David L Stevens <dlstevens@us.ibm.com>, Chris Wright <chrisw@redhat.com>) Acked-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-01-15 01:43:29 -08:00

... 4 5 6 7 8 ...

419 Commits