OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Jiri Pirko	f7b12606b5	rtnl: make ifla_policy static The only place this is used outside rtnetlink.c is veth. So provide wrapper function for this usage. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-18 18:15:42 -05:00
WANG Cong	1c213bd24a	net: introduce netdev_alloc_pcpu_stats() for drivers There are many drivers calling alloc_percpu() to allocate pcpu stats and then initializing ->syncp. So just introduce a helper function for them. Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-14 15:49:55 -05:00
Linus Torvalds	5e30025a31	Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core locking changes from Ingo Molnar: "The biggest changes: - add lockdep support for seqcount/seqlocks structures, this unearthed both bugs and required extra annotation. - move the various kernel locking primitives to the new kernel/locking/ directory" * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) block: Use u64_stats_init() to initialize seqcounts locking/lockdep: Mark __lockdep_count_forward_deps() as static lockdep/proc: Fix lock-time avg computation locking/doc: Update references to kernel/mutex.c ipv6: Fix possible ipv6 seqlock deadlock cpuset: Fix potential deadlock w/ set_mems_allowed seqcount: Add lockdep functionality to seqcount/seqlock structures net: Explicitly initialize u64_stats_sync structures for lockdep locking: Move the percpu-rwsem code to kernel/locking/ locking: Move the lglocks code to kernel/locking/ locking: Move the rwsem code to kernel/locking/ locking: Move the rtmutex code to kernel/locking/ locking: Move the semaphore core to kernel/locking/ locking: Move the spinlock code to kernel/locking/ locking: Move the lockdep code to kernel/locking/ locking: Move the mutex code to kernel/locking/ hung_task debugging: Add tracepoint to report the hang x86/locking/kconfig: Update paravirt spinlock Kconfig description lockstat: Report avg wait and hold times lockdep, x86/alternatives: Drop ancient lockdep fixup message ...	2013-11-14 16:30:30 +09:00
John Stultz	827da44c61	net: Explicitly initialize u64_stats_sync structures for lockdep In order to enable lockdep on seqcount/seqlock structures, we must explicitly initialize any locks. The u64_stats_sync structure, uses a seqcount, and thus we need to introduce a u64_stats_init() function and use it to initialize the structure. This unfortunately adds a lot of fairly trivial initialization code to a number of drivers. But the benefit of ensuring correctness makes this worth while. Because these changes are required for lockdep to be enabled, and the changes are quite trivial, I've not yet split this patch out into 30-some separate patches, as I figured it would be better to get the various maintainers thoughts on how to best merge this change along with the seqcount lockdep enablement. Feedback would be appreciated! Signed-off-by: John Stultz <john.stultz@linaro.org> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: James Morris <jmorris@namei.org> Cc: Jesse Gross <jesse@nicira.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Mirko Lindner <mlindner@marvell.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Roger Luethi <rl@hellgate.ch> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Simon Horman <horms@verge.net.au> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: Wensong Zhang <wensong@linux-vs.org> Cc: netdev@vger.kernel.org Link: http://lkml.kernel.org/r/1381186321-4906-2-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-11-06 12:40:25 +01:00
Eric Dumazet	82d8189826	veth: extend features to support tunneling While investigating on a recent vxlan regression, I found veth was using a zero features set for vxlan tunnels. We have to segment GSO frames, copy the payload, and do the checksum. This patch brings a ~200% performance increase We probably have to add hw_enc_features support on other virtual devices. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-28 00:57:29 -04:00
Gao feng	5c70ef85a2	veth: allow to setup multicast address for veth device We can only setup multicast address for network device when net_device_ops->ndo_set_rx_mode is not null. Some configurations need to add multicast address for net device, such as netfilter cluster match module. Add a fake ndo_set_rx_mode function to allow this operation. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-10 00:08:07 -04:00
David S. Miller	b343ca84b4	Revert "veth: Showing peer of veth type dev in ip link (kernel side)" This reverts commit `612c337306`. As per Stephen Hemminger, the layout of the netlink attribute is not implemented correctly so revert this for now. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-08 21:52:03 -04:00
Masatake YAMATO	612c337306	veth: Showing peer of veth type dev in ip link (kernel side) ip link has ability to show extra information of net work device if kernel provides sunh information. With this patch veth driver can provide its peer ifindex information to ip command via netlink interface. Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-08 15:23:25 -04:00
Flavio Leitner	b69bbddfa1	veth: add vlan features The veth device doesn't provide the vlan features, so TSO for example is disabled and that causes performance issues when using tagged traffic. The test topology looks like this: br0 br1 / \ / \ vnet veth0.10 ----- veth1.10 vnet VM VM The netperf results with current veth driver: MIGRATED TCP STREAM TEST from 192.168.1.1 () port 0 AF_INET to 192.168.1.2 () port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.01 2210.22 Now after applying the proposed patch: MIGRATED TCP STREAM TEST from 192.168.1.1 () port 0 AF_INET to 192.168.1.2 () port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.00 13067.47 Signed-off-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-07-19 17:36:03 -07:00
Hong zhi guo	3f8b96379a	veth: remove redundant call of dev_alloc_name it's called in the following register_netdevice. No need to call it here. Tested with "ip link add type veth" and "ip link add xxx%d type veth". Signed-off-by: Hong Zhiguo <honkiko@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-06-12 01:21:20 -07:00
Patrick McHardy	28d2b136ca	net: vlan: announce STAG offload capability in some drivers - macvlan: propagate STAG filtering capabilities from underlying device - ifb: announce STAG tagging support in addition to CTAG tagging support - veth: announce STAG tagging/stripping support in addition to CTAG support Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 14:46:06 -04:00
Patrick McHardy	f646968f8f	net: vlan: rename NETIF_F_HW_VLAN_* feature flags to NETIF_F_HW_VLAN_CTAG_* Rename the hardware VLAN acceleration features to include "CTAG" to indicate that they only support CTAGs. Follow up patches will introduce 802.1ad server provider tagging (STAGs) and require the distinction for hardware not supporting acclerating both. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 14:45:26 -04:00
Eric Dumazet	f45a5c267d	veth: fix NULL dereference in veth_dellink() commit `d0e2c55e7c` (veth: avoid a NULL deref in veth_stats_one) added another NULL deref in veth_dellink(). # ip link add name veth1 type veth peer name veth0 # rmmod veth We crash because veth_dellink() is called twice, so we must take care of NULL peer. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-02-10 20:41:43 -05:00
Eric Dumazet	2efd32ee1b	veth: fix a NULL deref in netif_carrier_off In commit `d0e2c55e7c` (veth: avoid a NULL deref in veth_stats_one) we now clear the peer pointers in veth_dellink() veth_close() must therefore make sure the peer pointer is set. Reported-by: Tom Parkin <tom.parkin@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-10 14:11:46 -08:00
Eric Dumazet	d0e2c55e7c	veth: avoid a NULL deref in veth_stats_one commit `2681128f0c` (veth: extend device features) added a NULL deref in veth_stats_one(), as veth_get_stats64() was not testing if the peer device was setup or not. At init time, we call dev_get_stats() before veth pair is fully setup. [ 178.854758] [<ffffffffa00f5677>] veth_get_stats64+0x47/0x70 [veth] [ 178.861013] [<ffffffff814f0a2d>] dev_get_stats+0x6d/0x130 [ 178.866486] [<ffffffff81504efc>] rtnl_fill_ifinfo+0x47c/0x930 [ 178.872299] [<ffffffff81505b93>] rtmsg_ifinfo+0x83/0x100 [ 178.877678] [<ffffffff81505cc6>] rtnl_configure_link+0x76/0xa0 [ 178.883580] [<ffffffffa00f52fa>] veth_newlink+0x16a/0x350 [veth] [ 178.889654] [<ffffffff815061cc>] rtnl_newlink+0x4dc/0x5e0 [ 178.895128] [<ffffffff81505e1e>] ? rtnl_newlink+0x12e/0x5e0 [ 178.900769] [<ffffffff8150587d>] rtnetlink_rcv_msg+0x11d/0x310 [ 178.906669] [<ffffffff81505760>] ? __rtnl_unlock+0x20/0x20 [ 178.912225] [<ffffffff81521f89>] netlink_rcv_skb+0xa9/0xd0 [ 178.917779] [<ffffffff81502d55>] rtnetlink_rcv+0x25/0x40 [ 178.923159] [<ffffffff815218d1>] netlink_unicast+0x1b1/0x230 [ 178.928887] [<ffffffff81521c4e>] netlink_sendmsg+0x2fe/0x3b0 [ 178.934615] [<ffffffff814dbe22>] sock_sendmsg+0xd2/0xf0 So we must check if peer was setup in veth_get_stats64() As pointed out by Ben Hutchings, priv->peer is missing proper synchronization. Adding RCU protection is a safe and well documented way to make sure we don't access about to be freed or already freed data. Reported-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: Eric Dumazet <edumazet@google.com> CC: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-07 19:42:50 -08:00
Eric Dumazet	8093315a91	veth: extend device features veth is lacking most modern facilities, like SG, checksums, TSO. It makes sense to extend dev->features to get them, or GRO aggregation is defeated by a forced segmentation. Reported-by: Andrew Vagin <avagin@parallels.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-30 02:31:59 -08:00
Eric Dumazet	2681128f0c	veth: reduce stat overhead veth stats are a bit bloated. There is no need to account transmit and receive stats, since they are absolutely symmetric. Also use a per device atomic64_t for the dropped counter, as it should never be used in fast path. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-30 02:31:58 -08:00
Rami Rosen	c07135633b	rtnelink: remove unused parameter from rtnl_create_link(). This patch removes an unused parameter (src_net) from rtnl_create_link() method and from the method single invocation, in veth. This parameter was used in the past when calling ops->get_tx_queues(src_net, tb) in rtnl_create_link(). The get_tx_queues() member of rtnl_link_ops was replaced by two methods, get_num_tx_queues() and get_num_rx_queues(), which do not get any parameter. This was done in commit `d40156aa5e` by Jiri Pirko ("rtnl: allow to specify different num for rx and tx queue count"). Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-30 12:24:40 -05:00
Hannes Frederic Sowa	23ea5a9637	veth: allow changing the mac address while interface is up Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-01 12:26:10 -04:00
Pavel Emelyanov	e6f8f1a739	veth: Allow to create peer link with given ifindex The ifinfomsg is in there (thanks kaber@ for foreseeing this long time ago), so take the given ifidex and register netdev with it. Ben noticed, that this code path previously ignored ifmp->ifi_index and userland could be passing in garbage. Thus it may now fail occasionally because the value clashes with an existing interface. To address this it's assumed that if the caller specifies the ifindex for the veth master device, then it's aware of this possibility and should explicitly specify (or set to 0 for auto-assignment) the peer's ifindex as well. With this the compatibility with old tools not setting ifindex is preserved. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-09 16:18:07 -07:00
David S. Miller	32efe08d77	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c Small minor conflict in bnx2x, wherein one commit changed how statistics were stored in software, and another commit fixed endianness bugs wrt. reading the values provided by the chip in memory. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-02-19 16:03:15 -05:00
Danny Kukawka	f2cedb63df	net: replace random_ether_addr() with eth_hw_addr_random() Replace usage of random_ether_addr() with eth_hw_addr_random() to set addr_assign_type correctly to NET_ADDR_RANDOM. Change the trivial cases. v2: adapt to renamed eth_hw_addr_random() Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-02-15 15:34:16 -05:00
Thomas Graf	237114384a	veth: Enforce minimum size of VETH_INFO_PEER VETH_INFO_PEER carries struct ifinfomsg plus optional IFLA attributes. A minimal size of sizeof(struct ifinfomsg) must be enforced or we may risk accessing that struct beyond the limits of the netlink message. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-02-15 14:59:20 -05:00
Rick Jones	84b4050111	Sweep away N/A fw_version dustbunnies from the .get_drvinfo routine of a number of drivers Per discussion with Ben Hutchings and David Miller, go through and remove assignments of "N/A" to fw_version in various drivers' .get_drvinfo routines. While there clean-up some use of bare constants and such. Signed-off-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-22 16:43:32 -05:00
Michał Mirosław	34324dc2bf	net: remove NETIF_F_NO_CSUM feature bit Only distinct use is checking if NETIF_F_NOCACHE_COPY should be enabled by default. The check heuristics is altered a bit here, so it hits other people than before. The default shouldn't be trusted for performance-critical cases anyway. For all other uses NETIF_F_NO_CSUM is equivalent to NETIF_F_HW_CSUM. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-16 17:43:12 -05:00
Rick Jones	33a5ba144e	net: sweep-up some straglers in strlcpy conversion of .get_drvinfo routines Convert some remaining straglers' .get_drvinfo routines to use strlcpy rather than strcpy/strncpy. Signed-off-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-16 17:38:55 -05:00
Eric Dumazet	8ce120f118	net: better pcpu data alignment Tunnels can force an alignment of their percpu data to reduce number of cache lines used in fast path, or read in .ndo_get_stats() percpu_alloc() is a very fine grained allocator, so any small hole will be used anyway. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-08 15:10:59 -05:00
Paul Gortmaker	9d9779e723	drivers/net: Add module.h to drivers who were implicitly using it The device.h header was including module.h, making it present for most of these drivers. But we want to clean that up. Call out the include of module.h in the modular network drivers. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-10-31 19:31:07 -04:00
Neil Horman	550fd08c2c	net: Audit drivers to identify those needing IFF_TX_SKB_SHARING cleared After the last patch, We are left in a state in which only drivers calling ether_setup have IFF_TX_SKB_SHARING set (we assume that drivers touching real hardware call ether_setup for their net_devices and don't hold any state in their skbs. There are a handful of drivers that violate this assumption of course, and need to be fixed up. This patch identifies those drivers, and marks them as not being able to support the safe transmission of skbs by clearning the IFF_TX_SKB_SHARING flag in priv_flags Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Karsten Keil <isdn@linux-pingi.de> CC: "David S. Miller" <davem@davemloft.net> CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: Patrick McHardy <kaber@trash.net> CC: Krzysztof Halasa <khc@pm.waw.pl> CC: "John W. Linville" <linville@tuxdriver.com> CC: Greg Kroah-Hartman <gregkh@suse.de> CC: Marcel Holtmann <marcel@holtmann.org> CC: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-07-27 22:39:30 -07:00
Eric Dumazet	81b16ba2f1	veth: Kill unused tx_dropped Followup to commit `f82528bc13` (Exclude duplicated checking for iface-up) : We no longer need percpu tx_dropped field. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-07-06 01:51:04 -07:00
David S. Miller	3600cdadb7	veth: Kill unused code label and code block. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-07-05 23:49:03 -07:00
Alexander Smirnov	f82528bc13	Exclude duplicated checking for iface-up. This flags is checked in 'is_skb_forwardable' function, which is subroutine of 'dev_forward_skb'. Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com> Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-06-30 22:13:37 -07:00
Eric Dumazet	cf05c700cf	veth: fix 64bit stats on 32bit arches Using 64bit stats on 32bit arches must use a synchronization or readers can get transient values. Fixes bug introduced in commit `6311cc44a2` (veth: convert to 64 bit statistics) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-06-19 22:48:34 -07:00
stephen hemminger	6311cc44a2	veth: convert to 64 bit statistics Not much change, device was already keeping per cpu statistics. Use recent 64 statistics interface. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-06-08 23:26:32 -07:00
Shan Wei	534ea99b06	net: drivers: kill two unused macro definitions Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Reviewed-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-15 18:01:15 -04:00
David S. Miller	7143b7d412	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/tg3.c	2011-05-05 14:59:02 -07:00
Jiri Pirko	6c8c44462a	Revert: veth: remove unneeded ifname code from veth_newlink() `84c49d8c3e` ("veth: remove unneeded ifname code from veth_newlink()") caused regression on veth creation. This patch reverts the original one. Reported-by: Michał Mirosław <mirqus@gmail.com> Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-02 15:54:31 -07:00
David Decotigny	7073949720	ethtool: cosmetic: Use ethtool ethtool_cmd_speed API This updates the network drivers so that they don't access the ethtool_cmd::speed field directly, but use ethtool_cmd_speed() instead. For most of the drivers, these changes are purely cosmetic and don't fix any problem, such as for those 1GbE/10GbE drivers that indirectly call their own ethtool get_settings()/mii_ethtool_gset(). The changes are meant to enforce code consistency and provide robustness with future larger throughputs, at the expense of a few CPU cycles for each ethtool operation. All drivers compiled with make allyesconfig ion x86_64 have been updated. Tested: make allyesconfig on x86_64 + e1000e/bnx2x work Signed-off-by: David Decotigny <decot@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-29 14:03:01 -07:00
Michał Mirosław	a2c725fa39	veth: convert to hw_features This should probably get TSO available as it's basically a loopback device. Offloads are left disabled by default - as before. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-01 20:56:30 -07:00
Eric W. Biederman	675071a2ef	veth: Fix the byte counters Commit `44540960` "veth: move loopback logic to common location" introduced a bug in the packet counters. I don't understand why that happened as it is not explained in the comments and the mut check in dev_forward_skb retains the assumption that skb->len is the total length of the packet. I just measured this emperically by setting up a veth pair between two noop network namespaces setting and attempting a telnet connection between the two. I saw three packets in each direction and the byte counters were exactly 14*3 = 42 bytes high in each direction. I got the actual packet lengths with tcpdump. So remove the extra ETH_HLEN from the veth byte count totals. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-21 18:24:53 -07:00
Jiri Pirko	84c49d8c3e	veth: remove unneeded ifname code from veth_newlink() The code is not needed because tb[IFLA_IFNAME] is already processed in rtnl_newlink(). Remove this redundancy. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-24 23:18:18 -08:00
Michał Mirosław	0b7967503d	net/veth: Fix packet checksumming We can't change ip_summed from CHECKSUM_PARTIAL to CHECKSUM_NONE or CHECKSUM_UNNECESSARY because checksum in packet's headers is not valid and will cause invalid checksum when frame is forwarded. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-16 14:41:35 -08:00
Eric Dumazet	807540baae	drivers/net: return operator cleanup Change "return (EXPR);" to "return EXPR;" return is not a function, parentheses are not required. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-26 18:34:29 -07:00
Eric Dumazet	6ec82562ff	veth: Dont kfree_skb() after dev_forward_skb() In case of congestion, netif_rx() frees the skb, so we must assume dev_forward_skb() also consume skb. Bug introduced by commit `445409602c` (veth: move loopback logic to common location) We must change dev_forward_skb() to always consume skb, and veth to not double free it. Bug report : http://marc.info/?l=linux-netdev&m=127310770900442&w=3 Reported-by: Martín Ferrari <martin.ferrari@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-06 00:53:53 -07:00
Tejun Heo	5a0e3ad6af	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>	2010-03-30 22:02:32 +09:00
Patrick McHardy	3729d50212	rtnetlink: support specifying device flags on device creation commit e8469ed959c373c2ff9e6f488aa5a14971aebe1f Author: Patrick McHardy <kaber@trash.net> Date: Tue Feb 23 20:41:30 2010 +0100 Support specifying the initial device flags when creating a device though rtnl_link. Devices allocated by rtnl_create_link() are marked as INITIALIZING in order to surpress netlink registration notifications. To complete setup, rtnl_configure_link() must be called, which performs the device flag changes and invokes the deferred notifiers if everything went well. Two examples: # add macvlan to eth0 # $ ip link add link eth0 up allmulticast on type macvlan [LINK]11: macvlan0@eth0: <BROADCAST,MULTICAST,ALLMULTI,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN link/ether 26:f8:84:02:f9:2a brd ff:ff:ff:ff:ff:ff [ROUTE]ff00::/8 dev macvlan0 table local metric 256 mtu 1500 advmss 1440 hoplimit 0 [ROUTE]fe80::/64 dev macvlan0 proto kernel metric 256 mtu 1500 advmss 1440 hoplimit 0 [LINK]11: macvlan0@eth0: <BROADCAST,MULTICAST,ALLMULTI,UP,LOWER_UP> mtu 1500 link/ether 26:f8:84:02:f9:2a [ADDR]11: macvlan0 inet6 fe80::24f8:84ff:fe02:f92a/64 scope link valid_lft forever preferred_lft forever [ROUTE]local fe80::24f8:84ff:fe02:f92a via :: dev lo table local proto none metric 0 mtu 16436 advmss 16376 hoplimit 0 [ROUTE]default via fe80::215:e9ff:fef0:10f8 dev macvlan0 proto kernel metric 1024 mtu 1500 advmss 1440 hoplimit 0 [NEIGH]fe80::215:e9ff:fef0:10f8 dev macvlan0 lladdr 00:15:e9:f0:10:f8 router STALE [ROUTE]2001:6f8:974::/64 dev macvlan0 proto kernel metric 256 expires 0sec mtu 1500 advmss 1440 hoplimit 0 [PREFIX]prefix 2001:6f8:974::/64 dev macvlan0 onlink autoconf valid 14400 preferred 131084 [ADDR]11: macvlan0 inet6 2001:6f8:974:0:24f8:84ff:fe02:f92a/64 scope global dynamic valid_lft 86399sec preferred_lft 14399sec # add VLAN to eth1, eth1 is down # $ ip link add link eth1 up type vlan id 1000 RTNETLINK answers: Network is down <no events> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-02-27 02:43:40 -08:00
Tejun Heo	47d742752d	percpu: add __percpu sparse annotations to net drivers Add __percpu sparse annotations to net drivers. These annotations are to make sparse consider percpu variables to be in a different address space and warn if accessed without going through percpu accessors. This patch doesn't affect normal builds. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-02-16 23:05:38 -08:00
Linus Torvalds	d0316554d3	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits) m68k: rename global variable vmalloc_end to m68k_vmalloc_end percpu: add missing per_cpu_ptr_to_phys() definition for UP percpu: Fix kdump failure if booted with percpu_alloc=page percpu: make misc percpu symbols unique percpu: make percpu symbols in ia64 unique percpu: make percpu symbols in powerpc unique percpu: make percpu symbols in x86 unique percpu: make percpu symbols in xen unique percpu: make percpu symbols in cpufreq unique percpu: make percpu symbols in oprofile unique percpu: make percpu symbols in tracer unique percpu: make percpu symbols under kernel/ and mm/ unique percpu: remove some sparse warnings percpu: make alloc_percpu() handle array types vmalloc: fix use of non-existent percpu variable in put_cpu_var() this_cpu: Use this_cpu_xx in trace_functions_graph.c this_cpu: Use this_cpu_xx for ftrace this_cpu: Use this_cpu_xx in nmi handling this_cpu: Use this_cpu operations in RCU this_cpu: Use this_cpu ops for VM statistics ... Fix up trivial (famous last words) global per-cpu naming conflicts in arch/x86/kvm/svm.c mm/slab.c	2009-12-14 09:58:24 -08:00
David S. Miller	9b963e5d0e	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/ieee802154/fakehard.c drivers/net/e1000e/ich8lan.c drivers/net/e1000e/phy.c drivers/net/netxen/netxen_nic_init.c drivers/net/wireless/ath/ath9k/main.c	2009-11-29 00:57:15 -08:00
Arnd Bergmann	445409602c	veth: move loopback logic to common location The veth driver contains code to forward an skb from the start_xmit function of one network device into the receive path of another device. Moving that code into a common location lets us reuse the code for direct forwarding of data between macvlan ports, and possibly in other drivers. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-11-26 15:52:58 -08:00
Eric Dumazet	2b1c8b0f92	veth: Fix veth_get_stats() veth_get_stats() can be called in parallel on several cpus. It's better to not reset dev->stats as it could give wrong result on one cpu. Use temporary variables, then store the final results. Also, we should loop on every possible cpus, not only online cpus, or cpu hotplug can suddenly give wrong veth stats. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-11-19 13:16:22 -08:00
Eric W. Biederman	81adee47df	net: Support specifying the network namespace upon device creation. There is no good reason to not support userspace specifying the network namespace during device creation, and it makes it easier to create a network device and pass it to a child network namespace with a well known name. We have to be careful to ensure that the target network namespace for the new device exists through the life of the call. To keep that logic clear I have factored out the network namespace grabbing logic into rtnl_link_get_net. In addtion we need to continue to pass the source network namespace to the rtnl_link_ops.newlink method so that we can find the base device source network namespace. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>	2009-11-08 00:53:51 -08:00
Eric Dumazet	24540535d3	veth: Fix veth_dellink method In commit `23289a37e2` (net: add a list_head parameter to dellink() method), I forgot to actually use this parameter in veth_dellink. I remember feeling a bit uncomfortable about veth_close(), because it does : netif_carrier_off(dev); netif_carrier_off(priv->peer); Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-10-30 01:00:27 -07:00
Eric Dumazet	23289a37e2	net: add a list_head parameter to dellink() method Adding a list_head parameter to rtnl_link_ops->dellink() methods allow us to queue devices on a list, in order to dismantle them all at once. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-10-28 02:22:07 -07:00
Christoph Lameter	e7dcaa4755	this_cpu: Eliminate get/put_cpu There are cases where we can use this_cpu_ptr and as the result of using this_cpu_ptr() we no longer need to determine the currently executing cpu. In those places no get/put_cpu combination is needed anymore. The local cpu variable can be eliminated. Preemption still needs to be disabled and enabled since the modifications of the per cpu variables is not atomic. There may be multiple per cpu variables modified and those must all be from the same processor. Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Tejun Heo <tj@kernel.org> cc: Eric Biederman <ebiederm@aristanetworks.com> cc: Stephen Hemminger <shemminger@vyatta.com> cc: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2009-10-03 19:48:23 +09:00
Stephen Hemminger	0fc0b732ea	netdev: drivers should make ethtool_ops const No need to put ethtool_ops in data, they should be const. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-09-02 01:03:33 -07:00
Stephen Hemminger	424efe9caf	netdev: convert pseudo drivers to netdev_tx_t These are all drivers that don't touch real hardware. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-09-01 01:13:40 -07:00
Ben Greear	27a242e92f	veth: Zero timestamp in xmit path. This patch zero's the timestamp before handing the packet to the peer interface. This lets the peer recalculate the rx timestamp if it cares about timestamps. Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-07-23 18:01:02 -07:00
Patrick McHardy	6ed106549d	net: use NETDEV_TX_OK instead of 0 in ndo_start_xmit() functions This patch is the result of an automatic spatch transformation to convert all ndo_start_xmit() return values of 0 to NETDEV_TX_OK. Some occurences are missed by the automatic conversion, those will be handled in a seperate patch. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-07-05 19:16:04 -07:00
David S. Miller	11687a1099	Revert "veth: prevent oops caused by netdev destructor" This reverts commit `ae0e8e8220`. This change had two problems: 1) Since it frees the stats in the drivers' close method, we can OOPS in the transmit routine. 2) stats are no longer remembered across ifdown/ifup which disagrees with how every other device operates. Thanks to analysis and test patch from Serge E. Hallyn and initial OOPS report by Sachin Sant. Signed-off-by: David S. Miller <davem@davemloft.net>	2009-06-25 02:45:42 -07:00
Eric Dumazet	60df914e29	veth: dont release skb->dst in veth_xmit() No need to release skb->dst, its now done by core network. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-05-29 15:15:30 -07:00
Stephen Hemminger	ae0e8e8220	veth: prevent oops caused by netdev destructor From: Stephen Hemminger <shemminger@vyatta.com> The veth driver will oops if sysfs hooks are open while module is removed. The net device destructor can not point to code in a module; basically there are only two possible safe values: NULL - no destructor, or free_netdev - free on last use Signed-off-by: David S. Miller <davem@davemloft.net>	2009-04-27 03:04:58 -07:00
Eric Biederman	38d408152a	veth: Allow setting the L3 MTU The limitation to only 1500 byte mtu's limits the utility of the veth device for testing routing. So implement implement a configurable MTU. For consistency I drop packets on the receive side when they are larger than the MTU. I count those drops. And I allow a little padding for vlan headers. I also test the mtu when a new device is created with netlink because that path currently bypasses the current mtu setting code. Signed-off-by: Eric Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-03-03 23:36:04 -08:00
Eric W. Biederman	2cf48a10aa	veth: Fix carrier detect The current implementation of carrier detect in veth is broken. It reports the link is down until both sides of the veth pair are administatively up and then forever after it reports link up. So fix veth so that it only reports link up when both interfaces of the pair are administratively up. Signed-off-by: Eric Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-03-01 20:44:21 -08:00
Daniel Lezcano	ee92362317	veth : add the set_mac_address capability Fix lost set_mac_address capability. Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-22 00:04:45 -08:00
Stephen Hemminger	008298231a	netdev: add more functions to netdevice ops This patch moves neigh_setup and hard_start_xmit into the network device ops structure. For bisection, fix all the previously converted drivers as well. Bonding driver took the biggest hit on this. Added a prefetch of the hard_start_xmit in the fast path to try and reduce any impact this would have. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-20 20:14:53 -08:00
Stephen Hemminger	4456e7bdf7	veth: convert to net_device_ops Convert to net_device_ops function table. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-19 22:42:39 -08:00
Daniel Lezcano	3717746ef8	veth: remove unused list The veth network device is stored in a list in the netdev private. AFAICS, this list is never used so I removed this list from the code. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-28 23:02:33 -07:00
Daniel Lezcano	bb7bba3d56	veth: Remove useless veth field The veth private structure contains a netdev pointer refering to its peer. This field is never used and it is pointless because if we can access, the veth_priv, that means we already have the netdev which is stored in veth_priv->dev. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-28 23:02:33 -07:00
YOSHIFUJI Hideaki	c346dca108	[NET] NETNS: Omit net_device->nd_net without CONFIG_NET_NS. Introduce per-net_device inlines: dev_net(), dev_net_set(). Without CONFIG_NET_NS, no namespace other than &init_net exists. Let's explicitly define them to help compiler optimizations. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-03-26 04:39:53 +09:00
Daniel Lezcano	c15853f2c1	veth: fix dev refcount race When deleting the veth driver, veth_close calls netif_carrier_off for the two extremities of the network device. netif_carrier_off on the peer device will fire an event and hold a reference on the peer device. Just after, the peer is unregistered taking the rtnl_lock while the linkwatch_event is scheduled. If __linkwatch_run_queue does not occurs before the unregistering, unregister_netdevice will wait for the dev refcount to reach zero holding the rtnl_lock and linkwatch_event will wait for the rtnl_lock and hold the dev refcount. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-20 00:21:47 -08:00
Patrick McHardy	68365458a4	[NET]: rtnl_link: fix use-after-free When unregistering the rtnl_link_ops, all existing devices using the ops are destroyed. With nested devices this may lead to a use-after-free despite the use of for_each_netdev_safe() in case the upper device is next in the device list and is destroyed by the NETDEV_UNREGISTER notifier. The easy fix is to restart scanning the device list after removing a device. Alternatively we could add new devices to the front of the list to avoid having dependant devices follow the device they depend on. A third option would be to only restart scanning if dev->iflink of the next device matches dev->ifindex of the current one. For now this seems like the safest solution. With this patch, the veth rtnl_link_ops unregistration can use rtnl_link_unregister() directly since it now also handles destruction of multiple devices at once. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-20 20:31:45 -08:00
Stephen Hemminger	ecef969e5b	[VETH]: move veth.h to include/linux Move veth.h from net/ to linux/ since it is a user api, and add it to user header processing Kbuild. [ Use header-y as suggested by Sam Ravnborg. -DaveM ] Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-12-26 19:36:35 -08:00
Jeff Garzik	b9f2c0440d	[netdrvr] Stop using legacy hooks ->self_test_count, ->get_stats_count These have been superceded by the new ->get_sset_count() hook. Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:51:45 -07:00
Eric W. Biederman	881d966b48	[NET]: Make the device list and device lookups per namespace. This patch makes most of the generic device layer network namespace safe. This patch makes dev_base_head a network namespace variable, and then it picks up a few associated variables. The functions: dev_getbyhwaddr dev_getfirsthwbytype dev_get_by_flags dev_get_by_name __dev_get_by_name dev_get_by_index __dev_get_by_index dev_ioctl dev_ethtool dev_load wireless_process_ioctl were modified to take a network namespace argument, and deal with it. vlan_ioctl_set and brioctl_set were modified so their hooks will receive a network namespace argument. So basically anthing in the core of the network stack that was affected to by the change of dev_base was modified to handle multiple network namespaces. The rest of the network stack was simply modified to explicitly use &init_net the initial network namespace. This can be fixed when those components of the network stack are modified to handle multiple network namespaces. For now the ifindex generator is left global. Fundametally ifindex numbers are per namespace, or else we will have corner case problems with migration when we get that far. At the same time there are assumptions in the network stack that the ifindex of a network device won't change. Making the ifindex number global seems a good compromise until the network stack can cope with ifindex changes when you change namespaces, and the like. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:10 -07:00
Pavel Emelyanov	e314dbdc1c	[NET]: Virtual ethernet device driver. Veth stands for Virtual ETHernet. It is a simple tunnel driver that works at the link layer and looks like a pair of ethernet devices interconnected with each other. Mainly it allows to communicate between network namespaces but it can be used as is as well. The newlink callback is organized that way to make it easy to create the peer device in the separate namespace when we have them in kernel. This implementation uses another interface - the RTM_NRELINK message introduced by Patric. Bug fixes from Daniel Lezcano. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:46 -07:00

1 2 3

126 Commits