OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Eric Dumazet	d826eb14ec	ipv4: PKTINFO doesnt need dst reference Le lundi 07 novembre 2011 à 15:33 +0100, Eric Dumazet a écrit : > At least, in recent kernels we dont change dst->refcnt in forwarding > patch (usinf NOREF skb->dst) > > One particular point is the atomic_inc(dst->refcnt) we have to perform > when queuing an UDP packet if socket asked PKTINFO stuff (for example a > typical DNS server has to setup this option) > > I have one patch somewhere that stores the information in skb->cb[] and > avoid the atomic_{inc\|dec}(dst->refcnt). > OK I found it, I did some extra tests and believe its ready. [PATCH net-next] ipv4: IP_PKTINFO doesnt need dst reference When a socket uses IP_PKTINFO notifications, we currently force a dst reference for each received skb. Reader has to access dst to get needed information (rt_iif & rt_spec_dst) and must release dst reference. We also forced a dst reference if skb was put in socket backlog, even without IP_PKTINFO handling. This happens under stress/load. We can instead store the needed information in skb->cb[], so that only softirq handler really access dst, improving cache hit ratios. This removes two atomic operations per packet, and false sharing as well. On a benchmark using a mono threaded receiver (doing only recvmsg() calls), I can reach 720.000 pps instead of 570.000 pps. IP_PKTINFO is typically used by DNS servers, and any multihomed aware UDP application. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-09 16:36:27 -05:00
Paul Gortmaker	bc3b2d7fb9	net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules These files are non modular, but need to export symbols using the macros now living in export.h -- call out the include so that things won't break when we remove the implicit presence of module.h from everywhere. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-10-31 19:30:30 -04:00
Steffen Klassert	299b076764	ipv6: Fix IPsec slowpath fragmentation problem ip6_append_data() builds packets based on the mtu from dst_mtu(rt->dst.path). On IPsec the effective mtu is lower because we need to add the protocol headers and trailers later when we do the IPsec transformations. So after the IPsec transformations the packet might be too big, which leads to a slowpath fragmentation then. This patch fixes this by building the packets based on the lower IPsec mtu from dst_mtu(&rt->dst) and adapts the exthdr handling to this. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-10-18 23:53:10 -04:00
David S. Miller	8decf86879	Merge branch 'master' of github.com:davem330/net Conflicts: MAINTAINERS drivers/net/Kconfig drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c drivers/net/ethernet/broadcom/tg3.c drivers/net/wireless/iwlwifi/iwl-pci.c drivers/net/wireless/iwlwifi/iwl-trans-tx-pcie.c drivers/net/wireless/rt2x00/rt2800usb.c drivers/net/wireless/wl12xx/main.c	2011-09-22 03:23:13 -04:00
Maciej Żenczykowski	ec0506dbe4	net: relax PKTINFO non local ipv6 udp xmit check Allow transparent sockets to be less restrictive about the source ip of ipv6 udp packets being sent. Google-Bug-Id: 5018138 Signed-off-by: Maciej Żenczykowski <maze@google.com> CC: "Erik Kline" <ek@google.com> CC: "Lorenzo Colitti" <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-30 17:39:01 -04:00
Eric Dumazet	33d480ce6d	net: cleanup some rcu_dereference_raw RCU api had been completed and rcu_access_pointer() or rcu_dereference_protected() are better than generic rcu_dereference_raw() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-12 02:55:28 -07:00
Stephen Hemminger	a9b3cd7f32	rcu: convert uses of rcu_assign_pointer(x, NULL) to RCU_INIT_POINTER When assigning a NULL value to an RCU protected pointer, no barrier is needed. The rcu_assign_pointer, used to handle that but will soon change to not handle the special case. Convert all rcu_assign_pointer of NULL value. //smpl @@ expression P; @@ - rcu_assign_pointer(P, NULL) + RCU_INIT_POINTER(P, NULL) // </smpl> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-02 04:29:23 -07:00
Joe Perches	207ec0abbe	ipv6: Reduce switch/case indent Make the case labels the same indent as the switch. git diff -w shows 80 column reflowing, removal of a useless break after return, and moving open brace after case instead of separate line. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-07-01 16:11:16 -07:00
Dan Rosenberg	71338aa7d0	net: convert %p usage to %pK The %pK format specifier is designed to hide exposed kernel pointers, specifically via /proc interfaces. Exposing these pointers provides an easy target for kernel write vulnerabilities, since they reveal the locations of writable structures containing easily triggerable function pointers. The behavior of %pK depends on the kptr_restrict sysctl. If kptr_restrict is set to 0, no deviation from the standard %p behavior occurs. If kptr_restrict is set to 1, the default, if the current user (intended to be a reader via seq_printf(), etc.) does not have CAP_SYSLOG (currently in the LSM tree), kernel pointers using %pK are printed as 0's. If kptr_restrict is set to 2, kernel pointers using %pK are printed as 0's regardless of privileges. Replacing with 0's was chosen over the default "(null)", which cannot be parsed by userland %p, which expects "(nil)". The supporting code for kptr_restrict and %pK are currently in the -mm tree. This patch converts users of %p in net/ to %pK. Cases of printing pointers to the syslog are not covered, since this would eliminate useful information for postmortem debugging and the reading of the syslog is already optionally protected by the dmesg_restrict sysctl. Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Cc: James Morris <jmorris@namei.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Thomas Graf <tgraf@infradead.org> Cc: Eugene Teo <eugeneteo@kernel.org> Cc: Kees Cook <kees.cook@canonical.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David S. Miller <davem@davemloft.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Eric Paris <eparis@parisplace.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-24 01:13:12 -04:00
David S. Miller	bdc712b4c2	inet: Decrease overhead of on-stack inet_cork. When we fast path datagram sends to avoid locking by putting the inet_cork on the stack we use up lots of space that isn't necessary. This is because inet_cork contains a "struct flowi" which isn't used in these code paths. Split inet_cork to two parts, "inet_cork" and "inet_cork_full". Only the latter of which has the "struct flowi" and is what is stored in inet_sock. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>	2011-05-06 15:37:57 -07:00
Eric Dumazet	b71d1d426d	inet: constify ip headers and in6_addr Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers where possible, to make code intention more obvious. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-22 11:04:14 -07:00
David S. Miller	1958b856c1	net: Put fl6_* macros to struct flowi6 and use them again. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-12 15:08:55 -08:00
David S. Miller	4c9483b2fb	ipv6: Convert to use flowi6 where applicable. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-12 15:08:54 -08:00
David S. Miller	6281dcc94a	net: Make flowi ports AF dependent. Create two sets of port member accessors, one set prefixed by fl4_* and the other prefixed by fl6_* This will let us to create AF optimal flow instances. It will work because every context in which we access the ports, we have to be fully aware of which AF the flowi is anyways. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-12 15:08:46 -08:00
David S. Miller	1d28f42c1b	net: Put flowi_* prefix on AF independent members of struct flowi I intend to turn struct flowi into a union of AF specific flowi structs. There will be a common structure that each variant includes first, much like struct sock_common. This is the first step to move in that direction. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-12 15:08:44 -08:00
David S. Miller	68d0c6d34d	ipv6: Consolidate route lookup sequences. Route lookups follow a general pattern in the ipv6 code wherein we first find the non-IPSEC route, potentially override the flow destination address due to ipv6 options settings, and then finally make an IPSEC search using either xfrm_lookup() or __xfrm_lookup(). __xfrm_lookup() is used when we want to generate a blackhole route if the key manager needs to resolve the IPSEC rules (in this case -EREMOTE is returned and the original 'dst' is left unchanged). Otherwise plain xfrm_lookup() is used and when asynchronous IPSEC resolution is necessary, we simply fail the lookup completely. All of these cases are encapsulated into two routines, ip6_dst_lookup_flow and ip6_sk_dst_lookup_flow. The latter of which handles unconnected UDP datagram sockets. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-01 13:19:07 -08:00
David S. Miller	bd4a6974cc	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	2011-02-04 14:28:58 -08:00
David S. Miller	e2d57766e6	net: Provide compat support for SIOCGETMIFCNT_IN6 and SIOCGETSGCNT_IN6. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-02-03 18:05:29 -08:00
Eric Dumazet	f2eda47df4	ipv6: raw: rcu annotations Remove sparse warnings, using a function typedef to be able to use __rcu annotation on mh_filter pointer. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-20 16:59:34 -08:00
Eric Dumazet	0d7da9ddd9	net: add __rcu annotation to sk_filter Add __rcu annotation to : (struct sock)->sk_filter And use appropriate rcu primitives to reduce sparse warnings if CONFIG_SPARSE_RCU_POINTER=y Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-25 14:18:28 -07:00
Eric Dumazet	a02cec2155	net: return operator cleanup Change "return (EXPR);" to "return EXPR;" return is not a function, parentheses are not required. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-23 14:33:39 -07:00
Changli Gao	d8d1f30b95	net-next: remove useless union keyword remove useless union keyword in rtable, rt6_info and dn_route. Since there is only one member in a union, the union keyword isn't useful. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-10 23:31:35 -07:00
Eric Dumazet	1789a640f5	raw: avoid two atomics in xmit Avoid two atomic ops per raw_send_hdrinc() call Avoid two atomic ops per raw6_send_hdrinc() call Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-07 01:08:10 -07:00
Arnaud Ebalard	20c59de2e6	ipv6: Refactor update of IPv6 flowi destination address for srcrt (RH) option There are more than a dozen occurrences of following code in the IPv6 stack: if (opt && opt->srcrt) { struct rt0_hdr rt0 = (struct rt0_hdr ) opt->srcrt; ipv6_addr_copy(&final, &fl.fl6_dst); ipv6_addr_copy(&fl.fl6_dst, rt0->addr); final_p = &final; } Replace those with a helper. Note that the helper overrides final_p in all cases. This is ok as final_p was previously initialized to NULL when declared. Signed-off-by: Arnaud Ebalard <arno@natisbad.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-02 07:08:31 -07:00
Patrick McHardy	1e4b105712	Merge branch 'master' of /repos/git/net-next-2.6 Conflicts: net/bridge/br_device.c net/bridge/br_forward.c Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-05-10 18:39:28 +02:00
Eric Dumazet	f84af32cbc	net: ip_queue_rcv_skb() helper When queueing a skb to socket, we can immediately release its dst if target socket do not use IP_CMSG_PKTINFO. tcp_data_queue() can drop dst too. This to benefit from a hot cache line and avoid the receiver, possibly on another cpu, to dirty this cache line himself. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-28 15:31:51 -07:00
Brian Haley	4b340ae20d	IPv6: Complete IPV6_DONTFRAG support Finally add support to detect a local IPV6_DONTFRAG event and return the relevant data to the user if they've enabled IPV6_RECVPATHMTU on the socket. The next recvmsg() will return no data, but have an IPV6_PATHMTU as ancillary data. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-23 23:35:29 -07:00
Brian Haley	13b52cd446	IPv6: Add dontfrag argument to relevant functions Add dontfrag argument to relevant functions for IPV6_DONTFRAG support, as well as allowing the value to be passed-in via ancillary cmsg data. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-23 23:35:28 -07:00
Patrick McHardy	6291055465	Merge branch 'master' of /repos/git/net-next-2.6 Conflicts: Documentation/feature-removal-schedule.txt net/ipv6/netfilter/ip6t_REJECT.c net/netfilter/xt_limit.c Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-04-20 16:02:01 +02:00
Tejun Heo	5a0e3ad6af	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>	2010-03-30 22:02:32 +09:00
Jan Engelhardt	b2e0b385d7	netfilter: ipv6: use NFPROTO values for NF_HOOK invocation The semantic patch that was used: // <smpl> @@ @@ (NF_HOOK \|NF_HOOK_THRESH \|nf_hook )( -PF_INET6, +NFPROTO_IPV6, ...) // </smpl> Signed-off-by: Jan Engelhardt <jengelh@medozas.de>	2010-03-25 16:00:49 +01:00
Alexey Dobriyan	2c8c1e7297	net: spread __net_init, __net_exit __net_init/__net_exit are apparently not going away, so use them to full extent. In some cases __net_init was removed, because it was called from __net_exit code. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-01-17 19:16:02 -08:00
Eric Dumazet	fd5c002761	ipv6: avoid dev_hold()/dev_put() in rawv6_bind() Using RCU helps not touching device refcount in rawv6_bind() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-11-08 00:43:18 -08:00
Eric Paris	13f18aa05f	net: drop capability from protocol definitions struct can_proto had a capability field which wasn't ever used. It is dropped entirely. struct inet_protosw had a capability field which can be more clearly expressed in the code by just checking if sock->type = SOCK_RAW. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-11-05 21:40:17 -08:00
Eric Dumazet	8edf19c2fe	net: sk_drops consolidation part 2 - skb_kill_datagram() can increment sk->sk_drops itself, not callers. - UDP on IPV4 & IPV6 dropped frames (because of bad checksum or policy checks) increment sk_drops Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-10-18 18:52:54 -07:00
Eric Dumazet	c720c7e838	inet: rename some inet_sock fields In order to have better cache layouts of struct sock (separate zones for rx/tx paths), we need this preliminary patch. Goal is to transfert fields used at lookup time in the first read-mostly cache line (inside struct sock_common) and move sk_refcnt to a separate cache line (only written by rx path) This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr, sport and id fields. This allows a future patch to define these fields as macros, like sk_refcnt, without name clashes. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-10-18 18:52:53 -07:00
Eric Dumazet	766e9037cc	net: sk_drops consolidation sock_queue_rcv_skb() can update sk_drops itself, removing need for callers to take care of it. This is more consistent since sock_queue_rcv_skb() also reads sk_drops when queueing a skb. This adds sk_drops managment to many protocols that not cared yet. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-10-14 20:40:11 -07:00
Neil Horman	3b885787ea	net: Generalize socket rx gap / receive queue overflow cmsg Create a new socket level option to report number of queue overflows Recently I augmented the AF_PACKET protocol to report the number of frames lost on the socket receive queue between any two enqueued frames. This value was exported via a SOL_PACKET level cmsg. AFter I completed that work it was requested that this feature be generalized so that any datagram oriented socket could make use of this option. As such I've created this patch, It creates a new SOL_SOCKET level option called SO_RXQ_OVFL, which when enabled exports a SOL_SOCKET level cmsg that reports the nubmer of times the sk_receive_queue overflowed between any two given frames. It also augments the AF_PACKET protocol to take advantage of this new feature (as it previously did not touch sk->sk_drops, which this patch uses to record the overflow count). Tested successfully by me. Notes: 1) Unlike my previous patch, this patch simply records the sk_drops value, which is not a number of drops between packets, but rather a total number of drops. Deltas must be computed in user space. 2) While this patch currently works with datagram oriented protocols, it will also be accepted by non-datagram oriented protocols. I'm not sure if thats agreeable to everyone, but my argument in favor of doing so is that, for those protocols which aren't applicable to this option, sk_drops will always be zero, and reporting no drops on a receive queue that isn't used for those non-participating protocols seems reasonable to me. This also saves us having to code in a per-protocol opt in mechanism. 3) This applies cleanly to net-next assuming that commit `977750076d` (my af packet cmsg patch) is reverted Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-10-12 13:26:31 -07:00
David S. Miller	b7058842c9	net: Make setsockopt() optlen be unsigned. This provides safety against negative optlen at the type level instead of depending upon (sometimes non-trivial) checks against this sprinkled all over the the place, in each and every implementation. Based upon work done by Arjan van de Ven and feedback from Linus Torvalds. Signed-off-by: David S. Miller <davem@davemloft.net>	2009-09-30 16:12:20 -07:00
Eric Dumazet	6ce9e7b5fe	ip: Report qdisc packet drops Christoph Lameter pointed out that packet drops at qdisc level where not accounted in SNMP counters. Only if application sets IP_RECVERR, drops are reported to user (-ENOBUFS errors) and SNMP counters updated. IP_RECVERR is used to enable extended reliable error message passing, but these are not needed to update system wide SNMP stats. This patch changes things a bit to allow SNMP counters to be updated, regardless of IP_RECVERR being set or not on the socket. Example after an UDP tx flood # netstat -s ... IP: 1487048 outgoing packets dropped ... Udp: ... SndbufErrors: 1487048 send() syscalls, do however still return an OK status, to not break applications. Note : send() manual page explicitly says for -ENOBUFS error : "The output queue for a network interface was full. This generally indicates that the interface has stopped sending, but may be caused by transient congestion. (Normally, this does not occur in Linux. Packets are just silently dropped when a device queue overflows.) " This is not true for IP_RECVERR enabled sockets : a send() syscall that hit a qdisc drop returns an ENOBUFS error. Many thanks to Christoph, David, and last but not least, Alexey ! Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-09-02 18:05:33 -07:00
Gerrit Renker	e651f03afe	inet6: Conversion from u8 to int This replaces assignments of the type "int on LHS" = "u8 on RHS" with simpler code. The LHS can express all of the unsigned right hand side values, hence the assigned value can not be negative. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-08-13 16:43:31 -07:00
Brian Haley	d5fdd6babc	ipv6: Use correct data types for ICMPv6 type and code Change all the code that deals directly with ICMPv6 type and code values to use u8 instead of a signed int as that's the actual data type. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-06-23 04:31:07 -07:00
Eric Dumazet	31e6d363ab	net: correct off-by-one write allocations reports commit `2b85a34e91` (net: No more expensive sock_hold()/sock_put() on each tx) changed initial sk_wmem_alloc value. We need to take into account this offset when reporting sk_wmem_alloc to user, in PROC_FS files or various ioctls (SIOCOUTQ/TIOCOUTQ) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-06-18 00:29:12 -07:00
Eric Dumazet	adf30907d6	net: skb->dst accessors Define three accessors to get/set dst attached to a skb struct dst_entry skb_dst(const struct sk_buff skb) void skb_dst_set(struct sk_buff skb, struct dst_entry dst) void skb_dst_drop(struct sk_buff *skb) This one should replace occurrences of : dst_release(skb->dst) skb->dst = NULL; Delete skb->dst field Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-06-03 02:51:04 -07:00
Neil Horman	edf391ff17	snmp: add missing counters for RFC 4293 The IP MIB (RFC 4293) defines stats for InOctets, OutOctets, InMcastOctets and OutMcastOctets: http://tools.ietf.org/html/rfc4293 But it seems we don't track those in any way that easy to separate from other protocols. This patch adds those missing counters to the stats file. Tested successfully by me With help from Eric Dumazet. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-04-27 02:45:02 -07:00
Alexey Dobriyan	52479b623d	netns xfrm: lookup in netns Pass netns to xfrm_lookup()/__xfrm_lookup(). For that pass netns to flow_cache_lookup() and resolver callback. Take it from socket or netdevice. Stub DECnet to init_net. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-25 17:35:18 -08:00
Denis V. Lunev	3bd653c845	netns: add net parameter to IP6_INC_STATS Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-08 10:54:51 -07:00
Yang Hongyang	3cc76caa98	ipv6: When we droped a packet, we should return NET_RX_DROP instead of 0 Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-29 14:27:51 -07:00
Denis V. Lunev	725a8ff04a	ipv6: remove unused parameter from ip6_ra_control Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-19 00:28:58 -07:00
Eric Dumazet	cb61cb9b8b	udp: sk_drops handling In commits `33c732c361` ([IPV4]: Add raw drops counter) and `a92aa318b4` ([IPV6]: Add raw drops counter), Wang Chen added raw drops counter for /proc/net/raw & /proc/net/raw6 This patch adds this capability to UDP sockets too (/proc/net/udp & /proc/net/udp6). This means that 'RcvbufErrors' errors found in /proc/net/snmp can be also be examined for each udp socket. # grep Udp: /proc/net/snmp Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors Udp: 23971006 75 899420 16390693 146348 0 # cat /proc/net/udp sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt --- uid timeout inode ref pointer drops 75: 00000000:02CB 00000000:0000 07 00000000:00000000 00:00000000 00000000 --- 0 0 2358 2 ffff81082a538c80 0 111: 00000000:006F 00000000:0000 07 00000000:00000000 00:00000000 00000000 --- 0 0 2286 2 ffff81042dd35c80 146348 In this example, only port 111 (0x006F) was flooded by messages that user program could not read fast enough. 146348 messages were lost. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-17 21:04:56 -07:00
Brian Haley	7d06b2e053	net: change proto destroy method to return void Change struct proto destroy function pointer to return void. Noticed by Al Viro. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-14 17:04:49 -07:00
David S. Miller	4ae127d1b6	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/smc911x.c	2008-06-13 20:52:39 -07:00
David S. Miller	f23d60de71	ipv6: Fix duplicate initialization of rawv6_prot.destroy In changeset `22dd485022` ("raw: Raw socket leak.") code was added so that we flush pending frames on raw sockets to avoid leaks. The ipv4 part was fine, but the ipv6 part was not done correctly. Unlike the ipv4 side, the ipv6 code already has a .destroy method for rawv6_prot. So now there were two assignments to this member, and what the compiler does is use the last one, effectively making the ipv6 parts of that changeset a NOP. Fix this by removing the: .destroy = inet6_destroy_sock, line, and adding an inet6_destroy_sock() call to the end of raw6_destroy(). Noticed by Al Viro. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-06-12 16:34:34 -07:00
Adrian Bunk	0b04082995	net: remove CVS keywords This patch removes CVS keywords that weren't updated for a long time from comments. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-11 21:00:38 -07:00
Denis V. Lunev	22dd485022	raw: Raw socket leak. The program below just leaks the raw kernel socket int main() { int fd = socket(PF_INET, SOCK_RAW, IPPROTO_UDP); struct sockaddr_in addr; memset(&addr, 0, sizeof(addr)); inet_aton("127.0.0.1", &addr.sin_addr); addr.sin_family = AF_INET; addr.sin_port = htons(2048); sendto(fd, "a", 1, MSG_MORE, &addr, sizeof(addr)); return 0; } Corked packet is allocated via sock_wmalloc which holds the owner socket, so one should uncork it and flush all pending data on close. Do this in the same way as in UDP. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-04 15:16:12 -07:00
YOSHIFUJI Hideaki	91e1908f56	[IPV6] NETNS: Handle ancillary data in appropriate namespace. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-06-05 04:02:36 +09:00
Johannes Berg	f5184d267c	net: Allow netdevices to specify needed head/tailroom This patch adds needed_headroom/needed_tailroom members to struct net_device and updates many places that allocate sbks to use them. Not all of them can be converted though, and I'm sure I missed some (I mostly grepped for LL_RESERVED_SPACE) Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-12 20:48:31 -07:00
YOSHIFUJI Hideaki	1a98d05f59	ipv6 RAW: Disallow IPPROTO_IPV6-level IPV6_CHECKSUM socket option on ICMPv6 sockets. RFC3542 tells that IPV6_CHECKSUM socket option in the IPPROTO_IPV6 level is not allowed on ICMPv6 sockets. IPPROTO_RAW level IPV6_CHECKSUM socket option (a Linux extension) is still allowed. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-24 21:30:38 -07:00
YOSHIFUJI Hideaki	05f175cdcf	[IPV6]: Fix IPV6_RECVERR for connected raw sockets. Based on patch from Dmitry Butskoy <buc@odusz.so-cdu.ru>. Closes: 10437 Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-04-12 13:43:28 +09:00
Brian Haley	876c7f4196	[IPv6]: Change IPv6 unspecified destination address to ::1 for raw and un-connected sockets This patch fixes a difference between IPv4 and IPv6 when sending packets to the unspecified address (either 0.0.0.0 or ::) when using raw or un-connected UDP sockets. There are two cases where IPv6 either fails to send anything, or sends with the destination address set to ::. For example: --> ping -c1 0.0.0.0 PING 0.0.0.0 (127.0.0.1) 56(84) bytes of data. 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.032 ms --> ping6 -c1 :: PING ::(::) 56 data bytes ping: sendmsg: Invalid argument Doing a sendto("0.0.0.0") reveals: 10:55:01.495090 IP localhost.32780 > localhost.7639: UDP, length 100 Doing a sendto("::") reveals: 10:56:13.262478 IP6 fe80::217:8ff:fe7d:4718.32779 > ::.7639: UDP, length 100 If you issue a connect() first in the UDP case, it will be sent to ::1, similar to what happens with TCP. This restores the BSD-ism. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-04-12 13:43:27 +09:00
YOSHIFUJI Hideaki	7bc570c8b4	[IPV6] MROUTE: Support multicast forwarding. Based on ancient patch by Mickael Hoerdt <hoerdt@clarinet.u-strasbg.fr>, which is available at <http://www-r2.u-strasbg.fr/~hoerdt/dev/linux_ipv6_mforwarding/patch-linux-ipv6-mforwarding-0.1a>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-04-05 22:33:38 +09:00
YOSHIFUJI Hideaki	f0bdb7ba5a	[IPV6] RAW: Remove ancient comment. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-01 23:57:36 -07:00
Pavel Emelyanov	bdcde3d71a	[SOCK]: Drop inuse pcounter from struct proto (v2). An uppercut - do not use the pcounter on struct proto. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-28 16:39:33 -07:00
YOSHIFUJI Hideaki	878628fbf2	[NET] NETNS: Omit namespace comparision without CONFIG_NET_NS. Introduce an inline net_eq() to compare two namespaces. Without CONFIG_NET_NS, since no namespace other than &init_net exists, it is always 1. We do not need to convert 1) inline vs inline and 2) inline vs &init_net comparisons. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-03-26 04:40:00 +09:00
YOSHIFUJI Hideaki	3b1e0a655f	[NET] NETNS: Omit sock->sk_net without CONFIG_NET_NS. Introduce per-sock inlines: sock_net(), sock_net_set() and per-inet_timewait_sock inlines: twsk_net(), twsk_net_set(). Without CONFIG_NET_NS, no namespace other than &init_net exists. Let's explicitly define them to help compiler optimizations. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-03-26 04:39:55 +09:00
YOSHIFUJI Hideaki	c346dca108	[NET] NETNS: Omit net_device->nd_net without CONFIG_NET_NS. Introduce per-net_device inlines: dev_net(), dev_net_set(). Without CONFIG_NET_NS, no namespace other than &init_net exists. Let's explicitly define them to help compiler optimizations. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-03-26 04:39:53 +09:00
YOSHIFUJI Hideaki	6b75d09081	[IPV6]: Optimize hop-limit determination. Last part of hop-limit determination is always: hoplimit = dst_metric(dst, RTAX_HOPLIMIT); if (hoplimit < 0) hoplimit = ipv6_get_hoplimit(dst->dev). Let's consolidate it as ip6_dst_hoplimit(dst). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-03-25 10:24:00 +09:00
Pavel Emelyanov	fc8717baa8	[RAW]: Add raw_hashinfo member on struct proto. Sorry for the patch sequence confusion :\| but I found that the similar thing can be done for raw sockets easily too late. Expand the proto.h union with the raw_hashinfo member and use it in raw_prot and rawv6_prot. This allows to drop the protocol specific versions of hash and unhash callbacks. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-22 16:56:51 -07:00
Robert P. J. Day	938b93adb2	[NET]: Add debugging names to __RW_LOCK_UNLOCKED macros. Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-18 00:59:23 -07:00
Denis V. Lunev	3046d76746	[RAW]: Wrong content of the /proc/net/raw6. The address of IPv6 raw sockets was shown in the wrong format, from IPv4 ones. The problem has been introduced by the commit `42a73808ed` ("[RAW]: Consolidate proc interface.") Thanks to Adrian Bunk who originally noticed the problem. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:26 -08:00
Denis V. Lunev	377cf82d66	[RAW]: Family check in the /proc/net/raw[6] is extra. Different hashtables are used for IPv6 and IPv4 raw sockets, so no need to check the socket family in the iterator over hashtables. Clean this out. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:24 -08:00
Laszlo Attila Toth	4a19ec5800	[NET]: Introducing socket mark socket option. A userspace program may wish to set the mark for each packets its send without using the netfilter MARK target. Changing the mark can be used for mark based routing without netfilter or for packet filtering. It requires CAP_NET_ADMIN capability. Signed-off-by: Laszlo Attila Toth <panther@balabit.hu> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:19 -08:00
Pavel Emelyanov	a308da1627	[NETNS][RAW]: Create the /proc/net/raw(6) in each namespace. To do so, just register the proper subsystem and create files in ->init callbacks. No other special per-namespace handling for raw sockets is required. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:07 -08:00
Pavel Emelyanov	e5ba31f11f	[NETNS][RAW]: Eliminate explicit init_net references. Happily, in all the rest places (->bind callbacks only), that require the struct net, we have a socket, so get the net from it. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:06 -08:00
Pavel Emelyanov	f51d599fbe	[NETNS][RAW]: Make /proc/net/raw(6) show per-namespace socket list. Pull the struct net pointer up to the showing functions to filter the sockets depending on their namespaces. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:06 -08:00
Pavel Emelyanov	be185884b3	[NETNS][RAW]: Make ipv[46] raw sockets lookup namespaces aware. This requires just to pass the appropriate struct net pointer into __raw_v[46]_lookup and skip sockets that do not belong to a needed namespace. The proper net is get from skb->dev in all the cases. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:05 -08:00
Daniel Lezcano	bfeade0870	[NETNS][IPV6]: inet6_addr - check ipv6 address per namespace When a new address is added, we must check if the new address does not already exists. This patch makes this check to be aware of a network namespace, so the check will look if the address already exists for the specified network namespace. While the addresses are browsed, the addresses which do not belong to the namespace are discarded. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:44 -08:00
Daniel Lezcano	09f7709f49	[IPV6]: fix section mismatch warnings Removed useless and buggy __exit section in the different ipv6 subsystems. Otherwise they will be called inside an init section during rollbacking in case of an error in the protocol initialization. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:46 -08:00
Herbert Xu	bb72845e69	[IPSEC]: Make callers of xfrm_lookup to use XFRM_LOOKUP_WAIT This patch converts all callers of xfrm_lookup that used an explicit value of 1 to indiciate blocking to use the new flag XFRM_LOOKUP_WAIT. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:42 -08:00
Daniel Lezcano	7f4e4868f3	[IPV6]: make the protocol initialization to return an error code This patchset makes the different protocols to return an error code, so the af_inet6 module can check the initialization was correct or not. The raw6 was taken into account to be consistent with the rest of the protocols, but the registration is at the same place. Because the raw6 has its own init function, the proto and the ops structure can be moved inside the raw6.c file. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:13 -08:00
Pavel Emelyanov	42a73808ed	[RAW]: Consolidate proc interface. Both ipv6/raw.c and ipv4/raw.c use the seq files to walk through the raw sockets hash and show them. The "walking" code is rather huge, but is identical in both cases. The difference is the hash table to walk over and the protocol family to check (this was not in the first virsion of the patch, which was noticed by YOSHIFUJI) Make the ->open store the needed hash table and the family on the allocated raw_iter_state and make the start/next/stop callbacks work with it. This removes most of the code. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:32 -08:00
Pavel Emelyanov	ab70768ec7	[RAW]: Consolidate proto->unhash callback Same as the ->hash one, this is easily consolidated. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:31 -08:00
Pavel Emelyanov	65b4c50b47	[RAW]: Consolidate proto->hash callback Having the raw_hashinfo it's easy to consolidate the raw[46]_hash functions. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:31 -08:00
Pavel Emelyanov	b673e4dfc8	[RAW]: Introduce raw_hashinfo structure The ipv4/raw.c and ipv6/raw.c contain many common code (most of which is proc interface) which can be consolidated. Most of the places to consolidate deal with the raw sockets hashtable, so introduce a struct raw_hashinfo which describes the raw sockets hash. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:30 -08:00
Pavel Emelyanov	69d6da0b0f	[IPv6] RAW: Compact the API for the kernel Same as in the previous patch for ipv4, compact the API and hide hash table and rwlock inside the raw.c file. Plus fix some "bad" places from checkpatch.pl point of view (assignments inside if()). Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:29 -08:00
Patrick McHardy	6e23ae2a48	[NETFILTER]: Introduce NF_INET_ hook values The IPv4 and IPv6 hook values are identical, yet some code tries to figure out the "correct" value by looking at the address family. Introduce NF_INET_* values for both IPv4 and IPv6. The old values are kept in a #ifndef __KERNEL__ section for userspace compatibility. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:55 -08:00
Wang Chen	a92aa318b4	[IPV6]: Add raw6 drops counter. Add raw drops counter for IPv6 in /proc/net/raw6 . Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:34 -08:00
Eric Dumazet	c5a432f1a1	[IPV6]: Use the {DEFINE\|REF}_PROTO_INUSE infrastructure Trivial patch to make "tcpv6,udpv6,udplitev6,rawv6" protocols uses the fast "inuse sockets" infrastructure Each protocol use then a static percpu var, instead of a dynamic one. This saves some ram and some cpu cycles Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:08:59 -08:00
Pavel Emelyanov	cf7732e4cc	[NET]: Make core networking code use seq_open_private This concerns the ipv4 and ipv6 code mostly, but also the netlink and unix sockets. The netlink code is an example of how to use the __seq_open_private() call - it saves the net namespace on this private. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:55:33 -07:00
Eric W. Biederman	881d966b48	[NET]: Make the device list and device lookups per namespace. This patch makes most of the generic device layer network namespace safe. This patch makes dev_base_head a network namespace variable, and then it picks up a few associated variables. The functions: dev_getbyhwaddr dev_getfirsthwbytype dev_get_by_flags dev_get_by_name __dev_get_by_name dev_get_by_index __dev_get_by_index dev_ioctl dev_ethtool dev_load wireless_process_ioctl were modified to take a network namespace argument, and deal with it. vlan_ioctl_set and brioctl_set were modified so their hooks will receive a network namespace argument. So basically anthing in the core of the network stack that was affected to by the change of dev_base was modified to handle multiple network namespaces. The rest of the network stack was simply modified to explicitly use &init_net the initial network namespace. This can be fixed when those components of the network stack are modified to handle multiple network namespaces. For now the ifindex generator is left global. Fundametally ifindex numbers are per namespace, or else we will have corner case problems with migration when we get that far. At the same time there are assumptions in the network stack that the ifindex of a network device won't change. Making the ifindex number global seems a good compromise until the network stack can cope with ifindex changes when you change namespaces, and the like. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:10 -07:00
Eric W. Biederman	457c4cbc5a	[NET]: Make /proc/net per network namespace This patch makes /proc/net per network namespace. It modifies the global variables proc_net and proc_net_stat to be per network namespace. The proc_net file helpers are modified to take a network namespace argument, and all of their callers are fixed to pass &init_net for that argument. This ensures that all of the /proc/net files are only visible and usable in the initial network namespace until the code behind them has been updated to be handle multiple network namespaces. Making /proc/net per namespace is necessary as at least some files in /proc/net depend upon the set of network devices which is per network namespace, and even more files in /proc/net have contents that are relevant to a single network namespace. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:06 -07:00
YOSHIFUJI Hideaki	3ef9d943d2	[IPV6]: Fix unbalanced socket reference with MSG_CONFIRM. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-09-14 16:45:40 -07:00
Philippe De Muyter	56b3d975bb	[NET]: Make all initialized struct seq_operations const. Make all initialized struct seq_operations in net/ const Signed-off-by: Philippe De Muyter <phdm@macqel.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 23:07:31 -07:00
Masahide NAKAMURA	59fbb3a61e	[IPV6] MIP6: Loadable module support for MIPv6. This patch makes MIPv6 loadable module named "mip6". Here is a modprobe.conf(5) example to load it automatically when user application uses XFRM state for MIPv6: alias xfrm-type-10-43 mip6 alias xfrm-type-10-60 mip6 Some MIPv6 feature is not included by this modular, however, it should not be affected to other features like either IPsec or IPv6 with and without the patch. We may discuss XFRM, MH (RAW socket) and ancillary data/sockopt separately for future work. Loadable features: * MH receiving check (to send ICMP error back) * RO header parsing and building (i.e. RH2 and HAO in DSTOPTS) * XFRM policy/state database handling for RO These are NOT covered as loadable: * Home Address flags and its rule on source address selection * XFRM sub policy (depends on its own kernel option) * XFRM functions to receive RO as IPv6 extension header * MH sending/receiving through raw socket if user application opens it (since raw socket allows to do so) * RH2 sending as ancillary data * RH2 operation with setsockopt(2) Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:15:42 -07:00
Masahide NAKAMURA	136ebf08b4	[IPV6] MIP6: Kill unnecessary ifdefs. Kill unnecessary CONFIG_IPV6_MIP6. o It is redundant for RAW socket to keep MH out with the config then it can handle any protocol. o Clean-up at AH. Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:15:41 -07:00
David S. Miller	14e50e57ae	[XFRM]: Allow packet drops during larval state resolution. The current IPSEC rule resolution behavior we have does not work for a lot of people, even though technically it's an improvement from the -EAGAIN buisness we had before. Right now we'll block until the key manager resolves the route. That works for simple cases, but many folks would rather packets get silently dropped until the key manager resolves the IPSEC rules. We can't tell these folks to "set the socket non-blocking" because they don't have control over the non-block setting of things like the sockets used to resolve DNS deep inside of the resolver libraries in libc. With that in mind I coded up the patch below with some help from Herbert Xu which provides packet-drop behavior during larval state resolution, controllable via sysctl and off by default. This lays the framework to either: 1) Make this default at some point or... 2) Move this logic into xfrm{4,6}_policy.c and implement the ARP-like resolution queue we've all been dreaming of. The idea would be to queue packets to the policy, then once the larval state is resolved by the key manager we re-resolve the route and push the packets out. The packets would timeout if the rule didn't get resolved in a certain amount of time. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-24 18:17:54 -07:00
Stephen Hemminger	3ff50b7997	[NET]: cleanup extra semicolons Spring cleaning time... There seems to be a lot of places in the network code that have extra bogus semicolons after conditionals. Most commonly is a bogus semicolon after: switch() { } Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:24 -07:00
Herbert Xu	604763722c	[NET]: Treat CHECKSUM_PARTIAL as CHECKSUM_UNNECESSARY When a transmitted packet is looped back directly, CHECKSUM_PARTIAL maps to the semantics of CHECKSUM_UNNECESSARY. Therefore we should treat it as such in the stack. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:43 -07:00
Arnaldo Carvalho de Melo	27a884dc3c	[SK_BUFF]: Convert skb->tail to sk_buff_data_t So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes on 64bit architectures, allowing us to combine the 4 bytes hole left by the layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4 64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN... :-) Many calculations that previously required that skb->{transport,network, mac}_header be first converted to a pointer now can be done directly, being meaningful as offsets or pointers. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:28 -07:00
Arnaldo Carvalho de Melo	b0e380b1d8	[SK_BUFF]: unions of just one member don't get anything done, kill them Renaming skb->h to skb->transport_header, skb->nh to skb->network_header and skb->mac to skb->mac_header, to match the names of the associated helpers (skb[_[re]set]_{transport,network,mac}_header). Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:20 -07:00

1 2 3 4 5

202 Commits