OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Arnaldo Carvalho de Melo	ebb53d7565	[NET] proto: Use pcounters for the inuse field Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:40 -08:00
Pavel Emelyanov	9859a79023	[NET]: Compact sk_stream_mem_schedule() code This function references sk->sk_prot->xxx for many times. It turned out, that there's so many code in it, that gcc cannot always optimize access to sk->sk_prot's fields. After saving the sk->sk_prot on the stack and comparing disassembled code, it turned out that the function became ~10 bytes shorter and made less dereferences (on i386 and x86_64). Stack consumption didn't grow. Besides, this patch drives most of this function into the 80 columns limit. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:36 -08:00
Benjamin Thery	3ef1355dcb	[NET]: Make netns cleanup to run in a separate queue This patch adds a separate workqueue for cleaning up a network namespace. If we use the keventd workqueue to execute cleanup_net(), there is a problem to unregister devices in IPv6. Indeed the code that cleans up also schedule work in keventd: as long as cleanup_net() hasn't return, dst_gc_task() cannot run and as long as dst_gc_task() has not run, there are still some references pending on the net devices and cleanup_net() can not unregister and exit the keventd workqueue. Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Acked-by: Denis V. Lunev <den@openvz.org> Acked-By: Kirill Korotaev <dev@sw.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:35 -08:00
Adrian Bunk	02d45827fa	[NET] net/core/request_sock.c: Remove unused exports. This patch removes the following unused EXPORT_SYMBOL's: - reqsk_queue_alloc - __reqsk_queue_destroy - reqsk_queue_destroy Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:33 -08:00
Denis V. Lunev	e372c41401	[NET]: Consolidate net namespace related proc files creation. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:28 -08:00
Eric W. Biederman	4b3da706bb	[NET]: Make the netlink methods in rtnetlink handle multiple network namespaces After the previous prep work this just consists of removing checks limiting the code to work in the initial network namespace, and updating rtmsg_ifinfo so we can generate events for devices in something other then the initial network namespace. Referring to network other network devices like the IFLA_LINK and IFLA_MASTER attributes do, gets interesting if those network devices happen to be in other network namespaces. Currently ifindex numbers are allocated globally so I have taken the path of least resistance and not still report the information even though the devices they are talking about are invisible. If applications start getting confused or when ifindex numbers become local to the network namespace we may need to do something different in the future. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Denis V. Lunev <den@openz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:26 -08:00
Denis V. Lunev	97c53cacf0	[NET]: Make rtnetlink infrastructure network namespace aware (v3) After this patch none of the netlink callback support anything except the initial network namespace but the rtnetlink infrastructure now handles multiple network namespaces. Changes from v2: - IPv6 addrlabel processing Changes from v1: - no need for special rtnl_unlock handling - fixed IPv6 ndisc Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:25 -08:00
Denis V. Lunev	b854272b3c	[NET]: Modify all rtnetlink methods to only work in the initial namespace (v2) Before I can enable rtnetlink to work in all network namespaces I need to be certain that something won't break. So this patch deliberately disables all of the rtnletlink methods in everything except the initial network namespace. After the methods have been audited this extra check can be disabled. Changes from v1: - added IPv6 addrlabel protection Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2008-01-28 14:54:24 -08:00
Stephen Hemminger	c7b6ea24b4	[NETPOLL]: Don't need rx_flags. The rx_flags variable is redundant. Turning rx on/off is done via setting the rx_np pointer. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:18 -08:00
Stephen Hemminger	33f807ba0d	[NETPOLL]: Kill NETPOLL_RX_DROP, set but never tested. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:18 -08:00
Stephen Hemminger	0953864160	[NETPOLL]: no need to store local_mac The local_mac is managed by the network device, no need to keep a spare copy and all the management problems that could cause. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:17 -08:00
Stephen Hemminger	5106930bd6	[NETPOLL]: netpoll_poll() cleanup Restructure code slightly to improve readability: * dereference device once * change obvious while() loop * let poll_napi() handle null list itself Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:16 -08:00
Stephen Hemminger	0adc9add77	[NETPOLL]: Use skb_queue_purge(). Use standard routine for flushing queue. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:16 -08:00
Oliver Hartkopp	cd05acfe65	[CAN]: Allocate protocol numbers for PF_CAN This patch adds a protocol/address family number, ARP hardware type, ethernet packet type, and a line discipline number for the SocketCAN implementation. Signed-off-by: Oliver Hartkopp <oliver.hartkopp@volkswagen.de> Signed-off-by: Urs Thuermann <urs.thuermann@volkswagen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:09 -08:00
Pavel Emelyanov	c0ef877b2c	[NET]: Move sock_valbool_flag to socket.c The sock_valbool_flag() helper is used in setsockopt to set or reset some flag on the sock. This helper is required in the net/socket.c only, so move it there. Besides, patch two places in sys_setsockopt() that repeat this helper functionality manually. Since this is not a bugfix, but a trivial cleanup, I prepared this patch against net-2.6.25, but it also applies (with a single offset) to the latest net-2.6. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:00 -08:00
Herbert Xu	352e512c32	[NET]: Eliminate duplicate copies of dst_discard We have a number of copies of dst_discard scattered around the place which all do the same thing, namely free a packet on the input or output paths. This patch deletes all of them except dst_discard and points all the users to it. The only non-trivial bit is decnet where it returns an error. However, conceptually this is identical to the blackhole functions used in IPv4 and IPv6 which do not return errors. So they should either all return errors or all return zero. For now I've stuck with the majority and picked zero as the return value. It doesn't really matter in practice since few if any driver would react differently depending on a zero return value or NET_RX_DROP. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:37 -08:00
Pavel Emelyanov	b24b8a247f	[NET]: Convert init_timer into setup_timer Many-many code in the kernel initialized the timer->function and timer->data together with calling init_timer(timer). There is already a helper for this. Use it for networking code. The patch is HUGE, but makes the code 130 lines shorter (98 insertions(+), 228 deletions(-)). Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:35 -08:00
Wang Chen	33c732c361	[IPV4]: Add raw drops counter. Add raw drops counter for IPv4 in /proc/net/raw . Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:33 -08:00
Jens Axboe	9c55e01c0c	[TCP]: Splice receive support. Support for network splice receive. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:31 -08:00
Gautham R Shenoy	86ef5c9a8e	cpu-hotplug: replace lock_cpu_hotplug() with get_online_cpus() Replace all lock_cpu_hotplug/unlock_cpu_hotplug from the kernel and use get_online_cpus and put_online_cpus instead as it highlights the refcount semantics in these operations. The new API guarantees protection against the cpu-hotplug operation, but it doesn't guarantee serialized access to any of the local data structures. Hence the changes needs to be reviewed. In case of pseries_add_processor/pseries_remove_processor, use cpu_maps_update_begin()/cpu_maps_update_done() as we're modifying the cpu_present_map there. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-01-25 21:08:02 +01:00
Denis V. Lunev	ff4b950277	[NETNS]: Re-export init_net via EXPORT_SYMBOL. init_net is used added as a parameter to a lot of old API calls, f.e. ip_dev_find. These calls were exported as EXPORT_SYMBOL. So, export init_net as EXPORT_SYMBOL to keep networking API consistent. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-23 03:11:42 -08:00
Patrick McHardy	68365458a4	[NET]: rtnl_link: fix use-after-free When unregistering the rtnl_link_ops, all existing devices using the ops are destroyed. With nested devices this may lead to a use-after-free despite the use of for_each_netdev_safe() in case the upper device is next in the device list and is destroyed by the NETDEV_UNREGISTER notifier. The easy fix is to restart scanning the device list after removing a device. Alternatively we could add new devices to the front of the list to avoid having dependant devices follow the device they depend on. A third option would be to only restart scanning if dev->iflink of the next device matches dev->ifindex of the current one. For now this seems like the safest solution. With this patch, the veth rtnl_link_ops unregistration can use rtnl_link_unregister() directly since it now also handles destruction of multiple devices at once. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-20 20:31:45 -08:00
David S. Miller	cecbb63967	[NEIGH]: Revert 'Fix race between neigh_parms_release and neightbl_fill_parms' Commit `9cd4002942` (Fix race between neigh_parms_release and neightbl_fill_parms) introduced device reference counting regressions for several people, see: http://bugzilla.kernel.org/show_bug.cgi?id=9778 for example. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-20 20:31:42 -08:00
Pavel Emelyanov	9cd4002942	[NEIGH]: Fix race between neigh_parms_release and neightbl_fill_parms The neightbl_fill_parms() is called under the write-locked tbl->lock and accesses the parms->dev. The negh_parm_release() calls the dev_put(parms->dev) without this lock. This creates a tiny race window on which the parms contains potentially stale dev pointer. To fix this race it's enough to move the dev_put() upper under the tbl->lock, but note, that the parms are held by neighbors and thus can live after the neigh_parms_release() is called, so we still can have a parm with bad dev pointer. I didn't find where the neigh->parms->dev is accessed, but still think that putting the dev is to be done in a place, where the parms are really freed. Am I right with that? Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-10 03:48:38 -08:00
Paul Moore	02f1c89d6e	[NET]: Clone the sk_buff 'iif' field in __skb_clone() Both NetLabel and SELinux (other LSMs may grow to use it as well) rely on the 'iif' field to determine the receiving network interface of inbound packets. Unfortunately, at present this field is not preserved across a skb clone operation which can lead to garbage values if the cloned skb is sent back through the network stack. This patch corrects this problem by properly copying the 'iif' field in __skb_clone() and removing the 'iif' field assignment from skb_act_clone() since it is no longer needed. Also, while we are here, put the assignments in the same order as the offsets to reduce cacheline bounces. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-08 23:30:17 -08:00
David S. Miller	fed17f3094	[NET]: Stop polling when napi_disable() is pending. This finally adds the code in net_rx_action() to break out of the ->poll()'ing loop when a napi_disable() is found to be pending. Now, even if a device is being flooded with packets it can be cleanly brought down. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-08 23:30:13 -08:00
Wei Yongjun	1ac70e7ad2	[NET]: Fix function put_cmsg() which may cause usr application memory overflow When used function put_cmsg() to copy kernel information to user application memory, if the memory length given by user application is not enough, by the bad length calculate of msg.msg_controllen, put_cmsg() function may cause the msg.msg_controllen to be a large value, such as 0xFFFFFFF0, so the following put_cmsg() can also write data to usr application memory even usr has no valid memory to store this. This may cause usr application memory overflow. int put_cmsg(struct msghdr * msg, int level, int type, int len, void data) { struct cmsghdr __user cm = (__force struct cmsghdr __user )msg->msg_control; struct cmsghdr cmhdr; int cmlen = CMSG_LEN(len); ~~~~~~~~~~~~~~~~~~~~~ int err; if (MSG_CMSG_COMPAT & msg->msg_flags) return put_cmsg_compat(msg, level, type, len, data); if (cm==NULL \|\| msg->msg_controllen < sizeof(cm)) { msg->msg_flags \|= MSG_CTRUNC; return 0; /* XXX: return error? check spec. */ } if (msg->msg_controllen < cmlen) { ~~~~~~~~~~~~~~~~~~~~~~~~ msg->msg_flags \|= MSG_CTRUNC; cmlen = msg->msg_controllen; } cmhdr.cmsg_level = level; cmhdr.cmsg_type = type; cmhdr.cmsg_len = cmlen; err = -EFAULT; if (copy_to_user(cm, &cmhdr, sizeof cmhdr)) goto out; if (copy_to_user(CMSG_DATA(cm), data, cmlen - sizeof(struct cmsghdr))) goto out; cmlen = CMSG_SPACE(len); ~~~~~~~~~~~~~~~~~~~~~~~~~~~ If MSG_CTRUNC flags is set, msg->msg_controllen is less than CMSG_SPACE(len), "msg->msg_controllen -= cmlen" will cause unsinged int type msg->msg_controllen to be a large value. ~~~~~~~~~~~~~~~~~~~~~~~~~~~ msg->msg_control += cmlen; msg->msg_controllen -= cmlen; ~~~~~~~~~~~~~~~~~~~~~ err = 0; out: return err; } The same promble exists in put_cmsg_compat(). This patch can fix this problem. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-12-20 14:36:44 -08:00
Joe Perches	53ccaae1ef	[NET] net/core/: Spelling fixes Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-12-20 14:02:06 -08:00
Wang Chen	d59b54b150	[NET]: Fix wrong comments for unregister_net* There are some return value comments for void functions. Fixed it. Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-12-11 02:45:32 -08:00
Herbert Xu	2d4baff8da	[SKBUFF]: Free old skb properly in skb_morph The skb_morph function only freed the data part of the dst skb, but leaked the auxiliary data such as the netfilter fields. This patch fixes this by moving the relevant parts from __kfree_skb to skb_release_all and calling it in skb_morph. It also makes kfree_skbmem static since it's no longer called anywhere else and it now no longer does skb_release_data. Thanks to Yasuyuki KOZAKAI for finding this problem and posting a patch for it. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2007-11-26 23:11:19 +08:00
Pavel Emelyanov	1f8170b0ec	[PKTGEN]: Fix double unlock of xfrm_state->lock The pktgen_output_ipsec() function can unlock this lock twice due to merged error and plain paths. Remove one of the calls to spin_unlock. Other possible solution would be to place "return 0" right after the first unlock, but at this place the err is known to be 0, so these solutions are the same except for this one makes the code shorter. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-19 22:51:24 -08:00
Pavel Emelyanov	dab6ba3688	[INET]: Fix potential kfree on vmalloc-ed area of request_sock_queue The request_sock_queue's listen_opt is either vmalloc-ed or kmalloc-ed depending on the number of table entries. Thus it is expected to be handled properly on free, which is done in the reqsk_queue_destroy(). However the error path in inet_csk_listen_start() calls the lite version of reqsk_queue_destroy, called __reqsk_queue_destroy, which calls the kfree unconditionally. Fix this and move the __reqsk_queue_destroy into a .c file as it looks too big to be inline. As David also noticed, this is an error recovery path only, so no locking is required and the lopt is known to be not NULL. reqsk_queue_yank_listen_sk is also now only used in net/core/request_sock.c so we should move it there too. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-15 02:57:06 -08:00
Pavel Emelyanov	c67625a1ec	[NET]: Remove notifier block from chain when register_netdevice_notifier fails Commit fcc5a03ac42564e9e255c1134dda47442289e466: [NET]: Allow netdev REGISTER/CHANGENAME events to fail makes the register_netdevice_notifier() handle the error from the NETDEV_REGISTER event, sent to the registering block. The bad news is that in this case the notifier block is not removed from the list, but the error is returned to the caller. In case the caller is in module init function and handles this error this can abort the module loading. The notifier block will be then removed from the kernel, but will be left in the list. Oops :( I think that the notifier block should be removed from the chain in case of error, regardless whether this error is handled by the caller or not. In the worst case (the error is _not_ handled) module will not receive the events any longer. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-14 15:53:16 -08:00
Denis V. Lunev	022cbae611	[NET]: Move unneeded data to initdata section. This patch reverts Eric's commit `2b008b0a8e` It diets .text & .data section of the kernel if CONFIG_NET_NS is not set. This is safe after list operations cleanup. Signed-of-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-13 03:23:50 -08:00
Denis V. Lunev	ed160e839d	[NET]: Cleanup pernet operation without CONFIG_NET_NS If CONFIG_NET_NS is not set, the only namespace is possible. This patch removes list of pernet_operations and cleanups code a bit. This list is not needed if there are no namespaces. We should just call ->init method. Additionally, the ->exit will be called on module unloading only. This case is safe - the code is not discarded. For the in/kernel code, ->exit should never be called. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-13 03:23:21 -08:00
Adrian Bunk	6aed42159d	[NET]: Unexport sysctl_{r,w}mem_max. sysctl_{r,w}mem_max can now be unexported. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-12 21:24:14 -08:00
Denis V. Lunev	2994c63863	[INET]: Small possible memory leak in FIB rules This patch fixes a small memory leak. Default fib rules can be deleted by the user if the rule does not carry FIB_RULE_PERMANENT flag, f.e. by ip rule flush Such a rule will not be freed as the ref-counter has 2 on start and becomes clearly unreachable after removal. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-10 22:12:03 -08:00
Alexey Dobriyan	33d36bb83c	[NETNS]: init dev_base_lock only once * it already statically initialized * reinitializing live global spinlock every time netns is setup is also wrong Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-10 22:09:25 -08:00
Joe Perches	e9671fcb3b	[NET]: Fix infinite loop in dev_mc_unsync(). From: Joe Perches <joe@perches.com> Based upon an initial patch and report by Luis R. Rodriguez. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-10 21:36:04 -08:00
Pavel Emelyanov	b733c007ed	[NET]: Clean proto_(un)register from in-code ifdefs The struct proto has the per-cpu "inuse" counter, which is handled with a special care. All the handling code hides under the ifdef CONFIG_SMP and it introduces some code duplication and makes it look worse than it could. Clean this. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:15:04 -08:00
Johann Felix Soden	45a19b0a72	[NETNS]: Fix compiler error in net_namespace.c Because net_free is called by copy_net_ns before its declaration, the compiler gives an error. This patch puts net_free before copy_net_ns to fix this. The compiler error: net/core/net_namespace.c: In function 'copy_net_ns': net/core/net_namespace.c:97: error: implicit declaration of function 'net_free' net/core/net_namespace.c: At top level: net/core/net_namespace.c:104: warning: conflicting types for 'net_free' net/core/net_namespace.c:104: error: static declaration of 'net_free' follows non-static declaration net/core/net_namespace.c:97: error: previous implicit declaration of 'net_free' was here The error was introduced by the '[NET]: Hide the dead code in the net_namespace.c' patch (`6a1a3b9f68`). Signed-off-by: Johann Felix Soden <johfel@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:15:02 -08:00
Jiri Olsa	40208d71e0	[NET]: Removing duplicit #includes Removing duplicit #includes for net/ Signed-off-by: Jiri Olsa <olsajiri@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:11:44 -08:00
Eric Dumazet	286ab3d460	[NET]: Define infrastructure to keep 'inuse' changes in an efficent SMP/NUMA way. "struct proto" currently uses an array stats[NR_CPUS] to track change on 'inuse' sockets per protocol. If NR_CPUS is big, this means we use a big memory area for this. Moreover, all this memory area is located on a single node on NUMA machines, increasing memory pressure on the boot node. In this patch, I tried to : - Keep a fast !CONFIG_SMP implementation - Keep a fast CONFIG_SMP implementation for often used protocols (tcp,udp,raw,...) - Introduce a NUMA efficient implementation Some helper macros are defined in include/net/sock.h These macros take into account CONFIG_SMP If a "struct proto" is declared without using DEFINE_PROTO_INUSE / REF_PROTO_INUSE macros, it will automatically use a default implementation, using a dynamically allocated percpu zone. This default implementation will be NUMA efficient, but might use 32/64 bytes per possible cpu because of current alloc_percpu() implementation. However it still should be better than previous implementation based on stats[NR_CPUS] field. When a "struct proto" is changed to use the new macros, we use a single static "int" percpu variable, lowering the memory and cpu costs, still preserving NUMA efficiency. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:08:57 -08:00
Alexey Dobriyan	3f192b5c58	[NET]: Remove /proc/net/stat/_arp_cache upon module removal neigh_table_init_no_netlink() creates them, but they aren't removed anywhere. Steps to reproduce: modprobe clip rmmod clip cat /proc/net/stat/clip_arp_cache BUG: unable to handle kernel paging request at virtual address f89d7758 printing eip: c05a99da pdpt = 0000000000004001 pde = 0000000004408067 pte = 0000000000000000 Oops: 0000 [#1] PREEMPT SMP Modules linked in: atm af_packet ipv6 binfmt_misc sbs sbshc fan dock battery backlight ac power_supply parport loop rtc_cmos rtc_core rtc_lib serio_raw button k8temp hwmon amd_rng sr_mod cdrom shpchp pci_hotplug ehci_hcd ohci_hcd uhci_hcd usbcore Pid: 2082, comm: cat Not tainted (2.6.24-rc1-b1d08ac064268d0ae2281e98bf5e82627e0f0c56-bloat #4) EIP: 0060:[<c05a99da>] EFLAGS: 00210256 CPU: 0 EIP is at neigh_stat_seq_next+0x26/0x3f EAX: 00000001 EBX: f89d7600 ECX: c587bf40 EDX: 00000000 ESI: 00000000 EDI: 00000001 EBP: 00000400 ESP: c587bf1c DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Process cat (pid: 2082, ti=c587b000 task=c5984e10 task.ti=c587b000) Stack: c06228cc c5313790 c049e5c0 0804f000 c45a7b00 c53137b0 00000000 00000000 00000082 00000001 00000000 00000000 00000000 fffffffb c58d6780 c049e437 c45a7b00 c04b1f93 c587bfa0 00000400 0804f000 00000400 0804f000 c04b1f2f Call Trace: [<c049e5c0>] seq_read+0x189/0x281 [<c049e437>] seq_read+0x0/0x281 [<c04b1f93>] proc_reg_read+0x64/0x77 [<c04b1f2f>] proc_reg_read+0x0/0x77 [<c048907e>] vfs_read+0x80/0xd1 [<c0489491>] sys_read+0x41/0x67 [<c04080fa>] sysenter_past_esp+0x6b/0xc1 ======================= Code: e9 ec 8d 05 00 56 8b 11 53 8b 40 70 8b 58 3c eb 29 0f a3 15 80 91 7b c0 19 c0 85 c0 8d 42 01 74 17 89 c6 c1 fe 1f 89 01 89 71 04 <8b> 83 58 01 00 00 f7 d0 8b 04 90 eb 09 89 c2 83 fa 01 7e d2 31 EIP: [<c05a99da>] neigh_stat_seq_next+0x26/0x3f SS:ESP 0068:c587bf1c Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:08:53 -08:00
Jens Axboe	c46f2334c8	[SG] Get rid of __sg_mark_end() sg_mark_end() overwrites the page_link information, but all users want __sg_mark_end() behaviour where we just set the end bit. That is the most natural way to use the sg list, since you'll fill it in and then mark the end point. So change sg_mark_end() to only set the termination bit. Add a sg_magic debug check as well, and clear a chain pointer if it is set. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-11-02 08:47:06 +01:00
Stephen Hemminger	3b582cc14c	[NET]: docbook fixes for netif_ functions Documentation updates for network interfaces. 1. Add doc for netif_napi_add 2. Remove doc for unused returns from netif_rx 3. Add doc for netif_receive_skb [ Incorporated minor mods from Randy Dunlap -DaveM ] Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-01 02:21:47 -07:00
Pavel Emelyanov	d57a9212e0	[NET]: Hide the net_ns kmem cache This cache is only required to create new namespaces, but we won't have them in CONFIG_NET_NS=n case. Hide it under the appropriate ifdef. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-01 00:46:50 -07:00
Pavel Emelyanov	1a2ee93d28	[NET]: Mark the setup_net as __net_init The setup_net is called for the init net namespace only (int the CONFIG_NET_NS=n of course) from the __init function, so mark it as __net_init to disappear with the caller after the boot. Yet again, in the perfect world this has to be under #ifdef CONFIG_NET_NS, but it isn't guaranteed that every subsystem is registered after the init_net_ns is set up. After we are sure, that we don't start registering them before the init net setup, we'll be able to move this code under the ifdef. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-01 00:45:59 -07:00
Pavel Emelyanov	6a1a3b9f68	[NET]: Hide the dead code in the net_namespace.c The namespace creation/destruction code is never called if the CONFIG_NET_NS is n, so it's OK to move it under appropriate ifdef. The copy_net_ns() in the "n" case checks for flags and returns -EINVAL when new net ns is requested. In a perfect world this stub must be in net_namespace.h, but this function need to know the CLONE_NEWNET value and thus requires sched.h. On the other hand this header is to be injected into almost every .c file in the networking code, and making all this code depend on the sched.h is a suicidal attempt. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-01 00:44:50 -07:00
Pavel Emelyanov	1dba323b3f	[NETNS]: Make the init/exit hooks checks outside the loop When the new pernet something (subsys, device or operations) is being registered, the init callback is to be called for each namespace, that currently exitst in the system. During the unregister, the same is to be done with the exit callback. However, not every pernet something has both calls, but the check for the appropriate pointer to be not NULL is performed inside the for_each_net() loop. This is (at least) strange, so tune this. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-01 00:42:43 -07:00

1 2 3 4 5 ...

799 Commits