OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Jason Wang	3afc9621f1	macvtap: zerocopy: fix offset calculation when building skb This patch fixes the offset calculation when building skb: - offset1 were used as skb data offset not vector offset - reset offset to zero only when we advance to next vector Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2012-05-02 18:22:17 +03:00
Al Viro	4040153087	security: trim security.h Trim security.h Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: James Morris <jmorris@namei.org>	2012-02-14 10:45:42 +11:00
Krishna Kumar	ef0002b577	macvtap: Fix macvtap_get_queue to use rxhash first It was reported that the macvtap device selects a Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-20 13:45:55 -05:00
Eric Dumazet	2cfa5a0471	net: treewide use of RCU_INIT_POINTER rcu_assign_pointer(ptr, NULL) can be safely replaced by RCU_INIT_POINTER(ptr, NULL) (old rcu_assign_pointer() macro was testing the NULL value and could omit the smp_wmb(), but this had to be removed because of compiler warnings) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-23 18:48:19 -05:00
Eric W. Biederman	e09eff7fc1	macvtap: Fix the minor device number allocation On systems that create and delete lots of dynamic devices the 31bit linux ifindex fails to fit in the 16bit macvtap minor, resulting in unusable macvtap devices. I have systems running automated tests that that hit this condition in just a few days. Use a linux idr allocator to track which mavtap minor numbers are available and and to track the association between macvtap minor numbers and macvtap network devices. Remove the unnecessary unneccessary check to see if the network device we have found is indeed a macvtap device. With macvtap specific data structures it is impossible to find any other kind of networking device. Increase the macvtap minor range from 65536 to the full 20 bits that is supported by linux device numbers. It doesn't solve the original problem but there is no penalty for a larger minor device range. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-10-21 02:53:07 -04:00
Eric W. Biederman	9bf1907f42	macvtap: Rewrite macvtap_newlink so the error handling works. Place macvlan_common_newlink at the end of macvtap_newlink because failing in newlink after registering your network device is not supported. Move device_create into a netdevice creation notifier. The network device notifier is the only hook that is called after the network device has been registered with the device layer and before register_network_device returns success. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-10-21 02:53:07 -04:00
Eric W. Biederman	2259fef0bb	macvtap: Don't leak unreceived packets when we delete a macvtap device. To avoid leaking packets in the receive queue. Add a socket destructor that will run whenever destroy a macvtap socket. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-10-21 02:53:07 -04:00
Eric W. Biederman	047af9cfed	macvtap: Fix macvtap_open races in the zero copy enable code. To see if it is appropriate to enable the macvtap zero copy feature don't test the lowerdev network device flags. Instead test the macvtap network device flags which are a direct copy of the lowerdev flags. This is important because nothing holds a reference to lowerdev and on a very bad day we lowerdev could be a pointer to stale memory. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-10-21 02:53:07 -04:00
Eric W. Biederman	99f34b38cd	macvtap: Close a race between macvtap_open and macvtap_dellink. There is a small window in macvtap_open between looking up a networking device and calling macvtap_set_queue in which macvtap_del_queues called from macvtap_dellink. After calling macvtap_del_queues it is totally incorrect to allow macvtap_set_queue to proceed so prevent success by reporting that all of the available queues are in use. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-10-21 02:53:06 -04:00
Jason Wang	653fc91557	macvtap: fix the uninitialized var using in macvtap_alloc_skb() Commit `d1b08284` use new frag API but would leave f to be used uninitialized, this patch fix it. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-09-20 14:37:22 -04:00
Ian Campbell	d1b08284ad	macvtap: convert to SKB paged frag API. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2011-09-15 15:34:59 -04:00
Shirley Ma	97bc3633be	macvtap: macvtapTX zero-copy support Only 128 bytes is copied, the rest of data is DMA mapped directly from userspace. Signed-off-by: Shirley Ma <xma@...ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-07-07 04:41:24 -07:00
Jason Wang	10a8d94a95	virtio_net: introduce VIRTIO_NET_HDR_F_DATA_VALID There's no need for the guest to validate the checksum if it have been validated by host nics. So this patch introduces a new flag - VIRTIO_NET_HDR_F_DATA_VALID which is used to bypass the checksum examing in guest. The backend (tap/macvtap) may set this flag when met skbs with CHECKSUM_UNNECESSARY to save cpu utilization. No feature negotiation is needed as old driver just ignore this flag. Iperf shows 12%-30% performance improvement for UDP traffic. For TCP, when gro is on no difference as it produces skb with partial checksum. But when gro is disabled, 20% or even higher improvement could be measured by netperf. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-06-11 15:57:47 -07:00
David S. Miller	33175d84ee	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/bnx2x/bnx2x_cmn.c	2011-03-10 14:26:00 -08:00
Nicolas Kaiser	ce3c869283	drivers/net/macvtap: fix error check 'len' is unsigned of type size_t and can't be negative. Signed-off-by: Nicolas Kaiser <nikai@nikai.net> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-07 15:57:58 -08:00
Eric Dumazet	13707f9e5e	drivers/net: remove some rcu sparse warnings Add missing __rcu annotations and helpers. minor : Fix some rcu_dereference() calls in macvtap Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-27 15:02:57 -08:00
Eric Dumazet	1ac9ad1394	net: remove dev_txq_stats_fold() After recent changes, (percpu stats on vlan/tunnels...), we dont need anymore per struct netdev_queue tx_bytes/tx_packets/tx_dropped counters. Only remaining users are ixgbe, sch_teql, gianfar & macvlan : 1) ixgbe can be converted to use existing tx_ring counters. 2) macvlan incremented txq->tx_dropped, it can use the dev->stats.tx_dropped counter. 3) sch_teql : almost revert `ab35cd4b8f` (Use net_device internal stats) Now we have ndo_get_stats64(), use it, even for "unsigned long" fields (No need to bring back a struct net_device_stats) 4) gianfar adds a stats structure per tx queue to hold tx_bytes/tx_packets This removes a lockdep warning (and possible lockup) in rndis gadget, calling dev_get_stats() from hard IRQ context. Ref: http://www.spinics.net/lists/netdev/msg149202.html Reported-by: Neil Jones <neiljay@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Jarek Poplawski <jarkao2@gmail.com> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Sandeep Gopalpet <sandeep.kumar@freescale.com> CC: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-13 21:44:34 -08:00
Michał Mirosław	55508d601d	net: Use skb_checksum_start_offset() Replace skb->csum_start - skb_headroom(skb) with skb_checksum_start_offset(). Note for usb/smsc95xx: skb->data - skb->head == skb_headroom(skb). Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-16 14:43:14 -08:00
Krishna Kumar	1565c7c1c4	macvtap: Implement multiqueue for macvtap driver Implement multiqueue facility for macvtap driver. The idea is that a macvtap device can be opened multiple times and the fd's can be used to register eg, as backend for vhost. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-08-16 21:06:25 -07:00
David S. Miller	bb7e95c8fd	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/bnx2x_main.c Merge bnx2x bug fixes in by hand... :-/ Signed-off-by: David S. Miller <davem@davemloft.net>	2010-07-27 21:01:35 -07:00
Herbert Xu	8a35747a5d	macvtap: Limit packet queue length Mark Wagner reported OOM symptoms when sending UDP traffic over a macvtap link to a kvm receiver. This appears to be caused by the fact that macvtap packet queues are unlimited in length. This means that if the receiver can't keep up with the rate of flow, then we will hit OOM. Of course it gets worse if the OOM killer then decides to kill the receiver. This patch imposes a cap on the packet queue length, in the same way as the tuntap driver, using the device TX queue length. Please note that macvtap currently has no way of giving congestion notification, that means the software device TX queue cannot be used and packets will always be dropped once the macvtap driver queue fills up. This shouldn't be a great problem for the scenario where macvtap is used to feed a kvm receiver, as the traffic is most likely external in origin so congestion notification can't be applied anyway. Of course, if anybody decides to complain about guest-to-guest UDP packet loss down the track, then we may have to revisit this. Incidentally, this patch also fixes a real memory leak when macvtap_get_queue fails. Chris Wright noticed that for this patch to work, we need a non-zero TX queue length. This patch includes his work to change the default macvtap TX queue length to 500. Reported-by: Mark Wagner <mwagner@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Chris Wright <chrisw@sous-sol.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-07-22 13:08:56 -07:00
David S. Miller	1ebed71ae2	macvtap: Use dev_t for macvtap_major. Reported-by: "Robert P. J. Day" <rpjday@crashcourse.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-07-10 19:25:50 -07:00
Michael S. Tsirkin	55afbd0810	macvtap: add ioctl to modify vnet header size This adds TUNSETVNETHDRSZ/TUNGETVNETHDRSZ support to macvtap. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: David S. Miller <davem@davemloft.net>	2010-05-04 01:35:47 +03:00
Eric Dumazet	4381548237	net: sock_def_readable() and friends RCU conversion sk_callback_lock rwlock actually protects sk->sk_sleep pointer, so we need two atomic operations (and associated dirtying) per incoming packet. RCU conversion is pretty much needed : 1) Add a new structure, called "struct socket_wq" to hold all fields that will need rcu_read_lock() protection (currently: a wait_queue_head_t and a struct fasync_struct pointer). [Future patch will add a list anchor for wakeup coalescing] 2) Attach one of such structure to each "struct socket" created in sock_alloc_inode(). 3) Respect RCU grace period when freeing a "struct socket_wq" 4) Change sk_sleep pointer in "struct sock" by sk_wq, pointer to "struct socket_wq" 5) Change sk_sleep() function to use new sk->sk_wq instead of sk->sk_sleep 6) Change sk_has_sleeper() to wq_has_sleeper() that must be used inside a rcu_read_lock() section. 7) Change all sk_has_sleeper() callers to : - Use rcu_read_lock() instead of read_lock(&sk->sk_callback_lock) - Use wq_has_sleeper() to eventually wakeup tasks. - Use rcu_read_unlock() instead of read_unlock(&sk->sk_callback_lock) 8) sock_wake_async() is modified to use rcu protection as well. 9) Exceptions : macvtap, drivers/net/tun.c, af_unix use integrated "struct socket_wq" instead of dynamically allocated ones. They dont need rcu freeing. Some cleanups or followups are probably needed, (possible sk_callback_lock conversion to a spinlock for example...). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-01 15:00:15 -07:00
Eric Dumazet	4a4771a58e	net: use sk_sleep() Commit `aa395145` (net: sk_sleep() helper) missed three files in the conversion. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-26 11:18:45 -07:00
Eric Dumazet	aa39514516	net: sk_sleep() helper Define a new function to return the waitqueue of a "struct sock". static inline wait_queue_head_t sk_sleep(struct sock sk) { return sk->sk_sleep; } Change all read occurrences of sk_sleep by a call to this function. Needed for a future RCU conversion. sk_sleep wont be a field directly available. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-20 16:37:13 -07:00
Tejun Heo	5a0e3ad6af	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>	2010-03-30 22:02:32 +09:00
Arnd Bergmann	b9fb9ee07e	macvtap: add GSO/csum offload support Added flags field to macvtap_queue to enable/disable processing of virtio_net_hdr via IFF_VNET_HDR. This flag is checked to prepend virtio_net_hdr in the receive path and process/skip virtio_net_hdr in the send path. Original patch by Sridhar, further changes by Arnd. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-02-18 14:08:38 -08:00
Arnd Bergmann	501c774cb1	net/macvtap: add vhost support This adds support for passing a macvtap file descriptor into vhost-net, much like we already do for tun/tap. Most of the new code is taken from the respective patch in the tun driver and may get consolidated in the future. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-02-18 14:08:38 -08:00
Arnd Bergmann	02df55d28c	macvtap: rework object lifetime rules This reworks the change done by the previous patch in a more complete way. The original macvtap code has a number of problems resulting from the use of RCU for protecting the access to struct macvtap_queue from open files. This includes - need for GFP_ATOMIC allocations for skbs - potential deadlocks when copy_*_user sleeps - inability to work with vhost-net Changing the lifetime of macvtap_queue to always depend on the open file solves all these. The RCU reference simply moves one step down to the reference on the macvlan_dev, which we only need for nonblocking operations. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-02-18 14:08:37 -08:00
Arnd Bergmann	564517e804	net/macvtap: fix reference counting The RCU usage in the original code was broken because there are cases where we possibly sleep with rcu_read_lock held. As a fix, change the macvtap_file_get_queue to get a reference on the socket and the netdev instead of taking the full rcu_read_lock. Also, change macvtap_file_get_queue failure case to not require a subsequent macvtap_file_put_queue, as pointed out by Ed Swierk. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Ed Swierk <eswierk@aristanetworks.com> Cc: Sridhar Samudrala <sri@us.ibm.com> Acked-by: Sridhar Samudrala <sri@us.ibm.com> Acked-by: Ed Swierk <eswierk@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-02-15 21:49:49 -08:00
Arnd Bergmann	20d29d7a91	net: macvtap driver In order to use macvlan with qemu and other tools that require a tap file descriptor, the macvtap driver adds a small backend with a character device with the same interface as the tun driver, with a minimum set of features. Macvtap interfaces are created in the same way as macvlan interfaces using ip link, but the netif is just used as a handle for configuration and accounting, while the data goes through the chardev. Each macvtap interface has its own character device, simplifying permission management significantly over the generic tun/tap driver. Cc: Patrick McHardy <kaber@trash.net> Cc: Stephen Hemminger <shemminger@linux-foundation.org> Cc: David S. Miller" <davem@davemloft.net> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Or Gerlitz <ogerlitz@voltaire.com> Cc: netdev@vger.kernel.org Cc: bridge@lists.linux-foundation.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-02-03 20:20:33 -08:00

32 Commits