OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Chris Leech	9593782585	[I/OAT]: Add a sysctl for tuning the I/OAT offloaded I/O threshold Any socket recv of less than this ammount will not be offloaded Signed-off-by: Chris Leech <christopher.leech@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-17 21:25:54 -07:00
Chris Leech	624d116473	[I/OAT]: Make sk_eat_skb I/OAT aware. Add an extra argument to sk_eat_skb, and make it move early copied packets to the async_wait_queue instead of freeing them. Signed-off-by: Chris Leech <christopher.leech@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-17 21:25:52 -07:00
Chris Leech	0e4b4992b8	[I/OAT]: Rename cleanup_rbuf to tcp_cleanup_rbuf and make non-static Needed to be able to call tcp_cleanup_rbuf in tcp_input.c for I/OAT Signed-off-by: Chris Leech <christopher.leech@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-17 21:25:50 -07:00
Sean Hefty	a1e8733e55	[NET]: Export ip_dev_find() Export ip_dev_find() to allow locating a net_device given an IP address. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-06-17 20:37:28 -07:00
Weidong	42d1d52e69	[IPV4]: Increment ipInHdrErrors when TTL expires. Signed-off-by: Weidong <weid@nanjing-fnst.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-12 13:09:59 -07:00
Aki M Nyrhinen	79320d7e14	[TCP]: continued: reno sacked_out count fix From: Aki M Nyrhinen <anyrhine@cs.helsinki.fi> IMHO the current fix to the problem (in_flight underflow in reno) is incorrect. it treats the symptons but ignores the problem. the problem is timing out packets other than the head packet when we don't have sack. i try to explain (sorry if explaining the obvious). with sack, scanning the retransmit queue for timed out packets is fine because we know which packets in our retransmit queue have been acked by the receiver. without sack, we know only how many packets in our retransmit queue the receiver has acknowledged, but no idea which packets. think of a "typical" slow-start overshoot case, where for example every third packet in a window get lost because a router buffer gets full. with sack, we check for timeouts on those every third packet (as the rest have been sacked). the packet counting works out and if there is no reordering, we'll retransmit exactly the packets that were lost. without sack, however, we check for timeout on every packet and end up retransmitting consecutive packets in the retransmit queue. in our slow-start example, 2/3 of those retransmissions are unnecessary. these unnecessary retransmissions eat the congestion window and evetually prevent fast recovery from continuing, if enough packets were lost. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-11 21:18:56 -07:00
Herbert Xu ~{PmVHI~}	f291196979	[TCP]: Avoid skb_pull if possible when trimming head Trimming the head of an skb by calling skb_pull can cause the packet to become unaligned if the length pulled is odd. Since the length is entirely arbitrary for a FIN packet carrying data, this is actually quite common. Unaligned data is not the end of the world, but we should avoid it if it's easily done. In this case it is trivial. Since we're discarding all of the head data it doesn't matter whether we move skb->data forward or back. However, it is still possible to have unaligned skb->data in general. So network drivers should be prepared to handle it instead of crashing. This patch also adds an unlikely marking on len < headlen since partial ACKs on head data are extremely rare in the wild. As the return value of __pskb_trim_head is no longer ever NULL that has been removed. Signed-off-by: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-05 15:03:37 -07:00
Stephen Hemminger	fb80a6e1a5	[TCP] tcp_highspeed: Fix problem observed by Xiaoliang (David) Wei When snd_cwnd is smaller than 38 and the connection is in congestion avoidance phase (snd_cwnd > snd_ssthresh), the snd_cwnd seems to stop growing. The additive increase was confused because C array's are 0 based. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-02 17:51:08 -07:00
Alexey Dobriyan	7114b0bb6d	[NETFILTER]: PPTP helper: fix sstate/cstate typo Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-05-28 22:51:05 -07:00
Patrick McHardy	ca3ba88d0c	[NETFILTER]: mark H.323 helper experimental Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-05-28 22:50:40 -07:00
Marcel Holtmann	6c813c3fe9	[NETFILTER]: Fix small information leak in SO_ORIGINAL_DST (CVE-2006-1343) It appears that sockaddr_in.sin_zero is not zeroed during getsockopt(...SO_ORIGINAL_DST...) operation. This can lead to an information leak (CVE-2006-1343). Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-05-28 22:50:18 -07:00
Chris Wright	4a06373913	[NETFILTER]: SNMP NAT: fix memleak in snmp_object_decode If kmalloc fails, error path leaks data allocated from asn1_oid_decode(). Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-05-23 15:15:13 -07:00
Patrick McHardy	4d942d8b39	[NETFILTER]: H.323 helper: fix sequence extension parsing When parsing unknown sequence extensions the "son"-pointer points behind the last known extension for this type, don't try to interpret it. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-05-23 15:15:10 -07:00
Patrick McHardy	7185989db4	[NETFILTER]: H.323 helper: fix parser error propagation The condition "> H323_ERROR_STOP" can never be true since H323_ERROR_STOP is positive and is the highest possible return code, while real errors are negative, fix the checks. Also only abort on real errors in some spots that were just interpreting any return value != 0 as error. Fixes crashes caused by use of stale data after a parsing error occured: BUG: unable to handle kernel paging request at virtual address bfffffff printing eip: c01aa0f8 pde = 1a801067 pte = 00000000 Oops: 0000 [#1] PREEMPT Modules linked in: ip_nat_h323 ip_conntrack_h323 nfsd exportfs sch_sfq sch_red cls_fw sch_hfsc xt_length ipt_owner xt_MARK iptable_mangle nfs lockd sunrpc pppoe pppoxx CPU: 0 EIP: 0060:[<c01aa0f8>] Not tainted VLI EFLAGS: 00210646 (2.6.17-rc4 #8) EIP is at memmove+0x19/0x22 eax: d77264e9 ebx: d77264e9 ecx: e88d9b17 edx: d77264e9 esi: bfffffff edi: bfffffff ebp: de6a7680 esp: c0349db8 ds: 007b es: 007b ss: 0068 Process asterisk (pid: 3765, threadinfo=c0349000 task=da068540) Stack: <0>00000006 c0349e5e d77264e3 e09a2b4e e09a38a0 d7726052 d7726124 00000491 00000006 00000006 00000006 00000491 de6a7680 d772601e d7726032 c0349f74 e09a2dc2 00000006 c0349e5e 00000006 00000000 d76dda28 00000491 c0349f74 Call Trace: [<e09a2b4e>] mangle_contents+0x62/0xfe [ip_nat] [<e09a2dc2>] ip_nat_mangle_tcp_packet+0xa1/0x191 [ip_nat] [<e0a2712d>] set_addr+0x74/0x14c [ip_nat_h323] [<e0ad531e>] process_setup+0x11b/0x29e [ip_conntrack_h323] [<e0ad534f>] process_setup+0x14c/0x29e [ip_conntrack_h323] [<e0ad57bd>] process_q931+0x3c/0x142 [ip_conntrack_h323] [<e0ad5dff>] q931_help+0xe0/0x144 [ip_conntrack_h323] ... Found by the PROTOS c07-h2250v4 testsuite. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-05-23 15:15:08 -07:00
Patrick McHardy	f41d5bb1d9	[NETFILTER]: SNMP NAT: fix memory corruption Fix memory corruption caused by snmp_trap_decode: - When snmp_trap_decode fails before the id and address are allocated, the pointers contain random memory, but are freed by the caller (snmp_parse_mangle). - When snmp_trap_decode fails after allocating just the ID, it tries to free both address and ID, but the address pointer still contains random memory. The caller frees both ID and random memory again. - When snmp_trap_decode fails after allocating both, it frees both, and the callers frees both again. The corruption can be triggered remotely when the ip_nat_snmp_basic module is loaded and traffic on port 161 or 162 is NATed. Found by multiple testcases of the trap-app and trap-enc groups of the PROTOS c06-snmpv1 testsuite. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-05-22 16:55:14 -07:00
Alexey Dobriyan	4195f81453	[NET]: Fix "ntohl(ntohs" bugs Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-05-22 16:53:22 -07:00
Solar Designer	2c8ac66bb2	[NETFILTER]: Fix do_add_counters race, possible oops or info leak (CVE-2006-0039) Solar Designer found a race condition in do_add_counters(). The beginning of paddc is supposed to be the same as tmp which was sanity-checked above, but it might not be the same in reality. In case the integer overflow and/or the race condition are triggered, paddc->num_counters might not match the allocation size for paddc. If the check below (t->private->number != paddc->num_counters) nevertheless passes (perhaps this requires the race condition to be triggered), IPT_ENTRY_ITERATE() would read kernel memory beyond the allocation size, potentially causing an oops or leaking sensitive data (e.g., passwords from host system or from another VPS) via counter increments. This requires CAP_NET_ADMIN. Signed-off-by: Solar Designer <solar@openwall.com> Signed-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-05-19 02:16:52 -07:00
Alexey Dobriyan	a467704dcb	[NETFILTER]: GRE conntrack: fix htons/htonl confusion GRE keys are 16 bit. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-05-19 02:16:29 -07:00
Philip Craig	5c170a09d9	[NETFILTER]: fix format specifier for netfilter log targets The prefix argument for nf_log_packet is a format specifier, so don't pass the user defined string directly to it. Signed-off-by: Philip Craig <philipc@snapgear.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-05-19 02:15:47 -07:00
Jesper Juhl	493e2428aa	[NETFILTER]: Fix memory leak in ipt_recent The Coverity checker spotted that we may leak 'hold' in net/ipv4/netfilter/ipt_recent.c::checkentry() when the following is true: if (!curr_table->status_proc) { ... if(!curr_table) { ... return 0; <-- here we leak. Simply moving an existing vfree(hold); up a bit avoids the possible leak. Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-05-19 02:15:13 -07:00
Angelo P. Castellani	8872d8e1c4	[TCP]: reno sacked_out count fix From: "Angelo P. Castellani" <angelo.castellani+lkml@gmail.com> Using NewReno, if a sk_buff is timed out and is accounted as lost_out, it should also be removed from the sacked_out. This is necessary because recovery using NewReno fast retransmit could take up to a lot RTTs and the sk_buff RTO can expire without actually being really lost. left_out = sacked_out + lost_out in_flight = packets_out - left_out + retrans_out Using NewReno without this patch, on very large network losses, left_out becames bigger than packets_out + retrans_out (!!). For this reason unsigned integer in_flight overflows to 2^32 - something. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-05-16 21:42:11 -07:00
Wei Yongjun	63cbd2fda3	[IPV4]: ip_options_fragment() has no effect on fragmentation Fix error point to options in ip_options_fragment(). optptr get a error pointer to the ipv4 header, correct is pointer to ipv4 options. Signed-off-by: Wei Yongjun <weiyj@soft.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-05-09 15:18:50 -07:00
Hua Zhong	0182bd2b1e	[IPV4]: Remove likely in ip_rcv_finish() This is another result from my likely profiling tool (dwalker@mvista.com just sent the patch of the profiling tool to linux-kernel mailing list, which is similar to what I use). On my system (not very busy, normal development machine within a VMWare workstation), I see a 6/5 miss/hit ratio for this "likely". Signed-off-by: Hua Zhong <hzhong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-05-06 18:11:39 -07:00
John Heffner	5528e568a7	[TCP]: Fix snd_cwnd adjustments in tcp_highspeed.c Xiaoliang (David) Wei wrote: > Hi gurus, > > I am reading the code of tcp_highspeed.c in the kernel and have a > question on the hstcp_cong_avoid function, specifically the following > AI part (line 136~143 in net/ipv4/tcp_highspeed.c ): > > /* Do additive increase */ > if (tp->snd_cwnd < tp->snd_cwnd_clamp) { > tp->snd_cwnd_cnt += ca->ai; > if (tp->snd_cwnd_cnt >= tp->snd_cwnd) { > tp->snd_cwnd++; > tp->snd_cwnd_cnt -= tp->snd_cwnd; > } > } > > In this part, when (tp->snd_cwnd_cnt == tp->snd_cwnd), > snd_cwnd_cnt will be -1... snd_cwnd_cnt is defined as u16, will this > small chance of getting -1 becomes a problem? > Shall we change it by reversing the order of the cwnd++ and cwnd_cnt -= > cwnd? Absolutely correct. Thanks. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-05-05 17:41:44 -07:00
Herbert Xu	75c2d9077c	[TCP]: Fix sock_orphan dead lock Calling sock_orphan inside bh_lock_sock in tcp_close can lead to dead locks. For example, the inet_diag code holds sk_callback_lock without disabling BH. If an inbound packet arrives during that admittedly tiny window, it will cause a dead lock on bh_lock_sock. Another possible path would be through sock_wfree if the network device driver frees the tx skb in process context with BH enabled. We can fix this by moving sock_orphan out of bh_lock_sock. The tricky bit is to work out when we need to destroy the socket ourselves and when it has already been destroyed by someone else. By moving sock_orphan before the release_sock we can solve this problem. This is because as long as we own the socket lock its state cannot change. So we simply record the socket state before the release_sock and then check the state again after we regain the socket lock. If the socket state has transitioned to TCP_CLOSE in the time being, we know that the socket has been destroyed. Otherwise the socket is still ours to keep. Note that I've also moved the increment on the orphan count forward. This may look like a problem as we're increasing it even if the socket is just about to be destroyed where it'll be decreased again. However, this simply enlarges a window that already exists. This also changes the orphan count test by one. Considering what the orphan count is meant to do this is no big deal. This problem was discoverd by Ingo Molnar using his lock validator. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-05-03 23:31:35 -07:00
Patrick McHardy	7800007c1e	[NETFILTER]: x_tables: don't use __copy_{from,to}_user on unchecked memory in compat layer Noticed by Linus Torvalds <torvalds@osdl.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-05-03 23:20:27 -07:00
Jing Min Zhao	7582e9d17e	[NETFILTER]: H.323 helper: Change author's email address Signed-off-by: Jing Min Zhao <zhaojingmin@users.sourceforge.net> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-05-03 23:19:59 -07:00
Patrick McHardy	2354feaeb2	[NETFILTER]: NAT: silence unused variable warnings with CONFIG_XFRM=n net/ipv4/netfilter/ip_nat_standalone.c: In function 'ip_nat_out': net/ipv4/netfilter/ip_nat_standalone.c:223: warning: unused variable 'ctinfo' net/ipv4/netfilter/ip_nat_standalone.c:222: warning: unused variable 'ct' Surprisingly no complaints so far .. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-05-03 23:19:26 -07:00
Patrick McHardy	4228e2a989	[NETFILTER]: H.323 helper: fix use of uninitialized data When a Choice element contains an unsupported choice no error is returned and parsing continues normally, but the choice value is not set and contains data from the last parsed message. This may in turn lead to parsing of more stale data and following crashes. Fixes a crash triggered by testcase 0003243 from the PROTOS c07-h2250v4 testsuite following random other testcases: CPU: 0 EIP: 0060:[<c01a9554>] Not tainted VLI EFLAGS: 00210646 (2.6.17-rc2 #3) EIP is at memmove+0x19/0x22 eax: d7be0307 ebx: d7be0307 ecx: e841fcf9 edx: d7be0307 esi: bfffffff edi: bfffffff ebp: da5eb980 esp: c0347e2c ds: 007b es: 007b ss: 0068 Process events/0 (pid: 4, threadinfo=c0347000 task=dff86a90) Stack: <0>00000006 c0347ea6 d7be0301 e09a6b2c 00000006 da5eb980 d7be003e d7be0052 c0347f6c e09a6d9c 00000006 c0347ea6 00000006 00000000 d7b9a548 00000000 c0347f6c d7b9a548 00000004 e0a1a119 0000028f 00000006 c0347ea6 00000006 Call Trace: [<e09a6b2c>] mangle_contents+0x40/0xd8 [ip_nat] [<e09a6d9c>] ip_nat_mangle_tcp_packet+0xa1/0x191 [ip_nat] [<e0a1a119>] set_addr+0x60/0x14d [ip_nat_h323] [<e0ab6e66>] q931_help+0x2da/0x71a [ip_conntrack_h323] [<e0ab6e98>] q931_help+0x30c/0x71a [ip_conntrack_h323] [<e09af242>] ip_conntrack_help+0x22/0x2f [ip_conntrack] [<c022934a>] nf_iterate+0x2e/0x5f [<c025d357>] xfrm4_output_finish+0x0/0x39f [<c02294ce>] nf_hook_slow+0x42/0xb0 [<c025d357>] xfrm4_output_finish+0x0/0x39f [<c025d732>] xfrm4_output+0x3c/0x4e [<c025d357>] xfrm4_output_finish+0x0/0x39f [<c0230370>] ip_forward+0x1c2/0x1fa [<c022f417>] ip_rcv+0x388/0x3b5 [<c02188f9>] netif_receive_skb+0x2bc/0x2ec [<c0218994>] process_backlog+0x6b/0xd0 [<c021675a>] net_rx_action+0x4b/0xb7 [<c0115606>] __do_softirq+0x35/0x7d [<c0104294>] do_softirq+0x38/0x3f Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-05-03 23:17:11 -07:00
Patrick McHardy	6fd737031e	[NETFILTER]: H.323 helper: fix endless loop caused by invalid TPKT len When the TPKT len included in the packet is below the lowest valid value of 4 an underflow occurs which results in an endless loop. Found by testcase 0000058 from the PROTOS c07-h2250v4 testsuite. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-05-03 23:16:29 -07:00
Patrick McHardy	e17df688f7	[NETFILTER] SCTP conntrack: fix infinite loop fix infinite loop in the SCTP-netfilter code: check SCTP chunk size to guarantee progress of for_each_sctp_chunk(). (all other uses of for_each_sctp_chunk() are preceded by do_basic_checks(), so this fix should be complete.) Based on patch from Ingo Molnar <mingo@elte.hu> CVE-2006-1527 Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-05-02 17:26:39 -07:00
Patrick McHardy	46c5ea3c9a	[NETFILTER] x_tables: fix compat related crash on non-x86 When iptables userspace adds an ipt_standard_target, it calculates the size of the entire entry as: sizeof(struct ipt_entry) + XT_ALIGN(sizeof(struct ipt_standard_target)) ipt_standard_target looks like this: struct xt_standard_target { struct xt_entry_target target; int verdict; }; xt_entry_target contains a pointer, so when compiled for 64 bit the structure gets an extra 4 byte of padding at the end. On 32 bit architectures where iptables aligns to 8 byte it will also have 4 byte padding at the end because it is only 36 bytes large. The compat_ipt_standard_fn in the kernel adjusts the offsets by sizeof(struct ipt_standard_target) - sizeof(struct compat_ipt_standard_target), which will always result in 4, even if the structure from userspace was already padded to a multiple of 8. On x86 this works out by accident because userspace only aligns to 4, on all other architectures this is broken and causes incorrect adjustments to the size and following offsets. Thanks to Linus for lots of debugging help and testing. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-05-01 20:48:32 -07:00
Hua Zhong	83de47cd0c	[TCP]: Fix unlikely usage in tcp_transmit_skb() The following unlikely should be replaced by likely because the condition happens every time unless there is a hard error to transmit a packet. Signed-off-by: Hua Zhong <hzhong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-29 18:33:19 -07:00
Herbert Xu	a76e07acd0	[IPSEC]: Fix IP ID selection I was looking through the xfrm input/output code in order to abstract out the address family specific encapsulation/decapsulation code. During that process I found this bug in the IP ID selection code in xfrm4_output.c. At that point dst is still the xfrm_dst for the current SA which represents an internal flow as far as the IPsec tunnel is concerned. Since the IP ID is going to sit on the outside of the encapsulated packet, we obviously want the external flow which is just dst->child. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-29 18:33:16 -07:00
Heiko Carstens	a536e07787	[IPV4]: inet_init() -> fs_initcall Convert inet_init to an fs_initcall to make sure its called before any device driver's initcall. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-29 18:33:14 -07:00
Thomas Voegtle	44adf28f4a	[NETFILTER]: ULOG target is not obsolete The backend part is obsoleted, but the target itself is still needed. Signed-off-by: Thomas Voegtle <tv@lio96.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-24 17:27:29 -07:00
Herbert Xu	b60b49ea6a	[TCP]: Account skb overhead in tcp_fragment Make sure that we get the full sizeof(struct sk_buff) plus the data size accounted for in skb->truesize. This will create invariants that will allow adding assertion checks on skb->truesize. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-19 21:35:00 -07:00
Jesper Juhl	63903ca6af	[NET]: Remove redundant NULL checks before [kv]free Redundant NULL check before kfree removal from net/ Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-18 15:57:55 -07:00
Herbert Xu	ef5cb9738b	[TCP]: Fix truesize underflow There is a problem with the TSO packet trimming code. The cause of this lies in the tcp_fragment() function. When we allocate a fragment for a completely non-linear packet the truesize is calculated for a payload length of zero. This means that truesize could in fact be less than the real payload length. When that happens the TSO packet trimming can cause truesize to become negative. This in turn can cause sk_forward_alloc to be -n * PAGE_SIZE which would trigger the warning. I've copied the code DaveM used in tso_fragment which should work here. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-18 15:57:49 -07:00
Stephen Hemminger	d2c962b853	[IPV4]: ip_route_input panic fix This fixes http://bugzilla.kernel.org/show_bug.cgi?id=6388 The bug is caused by ip_route_input dereferencing skb->nh.protocol of the dummy skb passed dow from inet_rtm_getroute (Thanks Thomas for seeing it). It only happens if the route requested is for a multicast IP address. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-17 17:27:11 -07:00
Zach Brown	3d9dd7564d	[PATCH] ip_output: account for fraggap when checking to add trailer_len During other work I noticed that ip_append_data() seemed to be forgetting to include the frag gap in its calculation of a fragment that consumes the rest of the payload. Herbert confirmed that this was a bug that snuck in during a previous rework. Signed-off-by: Zach Brown <zach.brown@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-14 16:04:18 -07:00
Adrian Bunk	6c97e72a16	[IPV4]: Possible cleanups. This patch contains the following possible cleanups: - make the following needlessly global function static: - arp.c: arp_rcv() - remove the following unused EXPORT_SYMBOL's: - devinet.c: devinet_ioctl - fib_frontend.c: ip_rt_ioctl - inet_hashtables.c: inet_bind_bucket_create - inet_hashtables.c: inet_bind_hash - tcp_input.c: sysctl_tcp_abc - tcp_ipv4.c: sysctl_tcp_tw_reuse - tcp_output.c: sysctl_tcp_mtu_probing - tcp_output.c: sysctl_tcp_base_mss Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-14 15:00:20 -07:00
KAMEZAWA Hiroyuki	6f91204225	[PATCH] for_each_possible_cpu: network codes for_each_cpu() actually iterates across all possible CPUs. We've had mistakes in the past where people were using for_each_cpu() where they should have been iterating across only online or present CPUs. This is inefficient and possibly buggy. We're renaming for_each_cpu() to for_each_possible_cpu() to avoid this in the future. This patch replaces for_each_cpu with for_each_possible_cpu under /net Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-04-11 06:18:31 -07:00
David S. Miller	55c0022e53	[IPV4] ip_fragment: Always compute hash with ipfrag_lock held. Otherwise we could compute an inaccurate hash due to the random seed changing. Noticed by Zach Brown and patch is based upon some feedback from Herbert Xu. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-09 22:43:55 -07:00
Patrick McHardy	19910d1aec	[NETFILTER]: Fix DNAT in LOCAL_OUT Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-09 22:38:29 -07:00
Patrick McHardy	7a43c99551	[NETFILTER]: H.323 helper: remove changelog Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-09 22:25:43 -07:00
Patrick McHardy	96f6bf82ea	[NETFILTER]: Convert conntrack/ipt_REJECT to new checksumming functions Besides removing lots of duplicate code, all converted users benefit from improved HW checksum error handling. Tested with and without HW checksums in almost all combinations. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-09 22:25:42 -07:00
Patrick McHardy	422c346fad	[NETFILTER]: Add address family specific checksum helpers Add checksum operation which takes care of verifying the checksum and dealing with HW checksum errors and avoids multiple checksum operations by setting ip_summed to CHECKSUM_UNNECESSARY after successful verification. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-09 22:25:41 -07:00
Patrick McHardy	bce8032ef3	[NETFILTER]: Introduce infrastructure for address family specific operations Change the queue rerouter intrastructure to a generic usable infrastructure for address family specific operations as a base for some cleanups. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-09 22:25:40 -07:00
Patrick McHardy	a0aed49bdb	[NETFILTER]: Fix IP_NF_CONNTRACK_NETLINK dependency When NAT is built as a module, ip_conntrack_netlink can not be linked statically. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-09 22:25:39 -07:00
Jing Min Zhao	a0b7db5e86	[NETFILTER]: H.323 helper: add parameter 'default_rrq_ttl' default_rrq_ttl is used when no TTL is included in the RRQ. Signed-off-by: Jing Min Zhao <zhaojingmin@users.sourceforge.net> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-09 22:25:38 -07:00
Jing Min Zhao	51d42f5e4e	[NETFILTER]: H.323 helper: make get_h245_addr() static Signed-off-by: Jing Min Zhao <zhaojingmin@users.sourceforge.net> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-09 22:25:37 -07:00
Jing Min Zhao	0f249685fd	[NETFILTER]: H.323 helper: change EXPORT_SYMBOL to EXPORT_SYMBOL_GPL Signed-off-by: Jing Min Zhao <zhaojingmin@users.sourceforge.net> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-09 22:25:36 -07:00
Jing Min Zhao	48bfee5fad	[NETFILTER]: H.323 helper: move some function prototypes to ip_conntrack_h323.h Move prototypes of NAT callbacks to ip_conntrack_h323.h. Because the use of typedefs as arguments, some header files need to be moved as well. Signed-off-by: Jing Min Zhao <zhaojingmin@users.sourceforge.net> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-09 22:25:35 -07:00
Patrick McHardy	32292a7ff1	[NETFILTER]: Fix section mismatch warnings Fix section mismatch warnings caused by netfilter's init_or_cleanup functions used in many places by splitting the init from the cleanup parts. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-09 22:25:34 -07:00
Patrick McHardy	964ddaa10d	[NETFILTER]: Clean up hook registration Clean up hook registration by makeing use of the new mass registration and unregistration helpers. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-09 22:25:33 -07:00
Herbert Xu	45af08be6d	[INET]: Use port unreachable instead of proto for tunnels This patch changes GRE and SIT to generate port unreachable instead of protocol unreachable errors when we can't find a matching tunnel for a packet. This removes the ambiguity as to whether the error is caused by no tunnel being found or by the lack of support for the given tunnel type. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-09 22:25:29 -07:00
Herbert Xu	50fba2aa7c	[INET]: Move no-tunnel ICMP error to tunnel4/tunnel6 This patch moves the sending of ICMP messages when there are no IPv4/IPv6 tunnels present to tunnel4/tunnel6 respectively. Please note that for now if xfrm4_tunnel/xfrm6_tunnel is loaded then no ICMP messages will ever be sent. This is similar to how we handle AH/ESP/IPCOMP. This move fixes the bug where we always send an ICMP message when there is no ip6_tunnel device present for a given packet even if it is later handled by IPsec. It also causes ICMP messages to be sent when no IPIP tunnel is present. I've decided to use the "port unreachable" ICMP message over the current value of "address unreachable" (and "protocol unreachable" by GRE) because it is not ambiguous unlike the other ones which can be triggered by other conditions. There seems to be no standard specifying what value must be used so this change should be OK. In fact we should change GRE to use this value as well. Incidentally, this patch also fixes a fairly serious bug in xfrm6_tunnel where we don't check whether the embedded IPv6 header is present before dereferencing it for the inside source address. This patch is inspired by a previous patch by Hugo Santos <hsantos@av.it.pt>. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-09 22:25:25 -07:00
Patrick McHardy	2e2f7aefa8	[NETFILTER]: Fix fragmentation issues with bridge netfilter The conntrack code doesn't do re-fragmentation of defragmented packets anymore but relies on fragmentation in the IP layer. Purely bridged packets don't pass through the IP layer, so the bridge netfilter code needs to take care of fragmentation itself. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-09 22:25:23 -07:00
Robert Olsson	550e29bc96	[FIB_TRIE]: Fix leaf freeing. Seems like leaf (end-nodes) has been freed by __tnode_free_rcu and not by __leaf_free_rcu. This fixes the problem. Only tnode_free is now used which checks for appropriate node type. free_leaf can be removed. Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-09 22:25:23 -07:00
Herbert Xu	8bf4b8a108	[IPSEC]: Check x->encap before dereferencing it We need to dereference x->encap before dereferencing it for encap_type. If it's absent then the encap_type is zero. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-09 22:25:22 -07:00
Dmitry Mishin	2722971cbe	[NETFILTER]: iptables 32bit compat layer This patch extends current iptables compatibility layer in order to get 32bit iptables to work on 64bit kernel. Current layer is insufficient due to alignment checks both in kernel and user space tools. Patch is for current net-2.6.17 with addition of move of ipt_entry_{match\| target} definitions to xt_entry_{match\|target}. Signed-off-by: Dmitry Mishin <dim@openvz.org> Acked-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-01 02:25:19 -08:00
Martin Josefsson	e64a70be51	[NETFILTER]: {ip,nf}_conntrack_netlink: fix expectation notifier unregistration This patch fixes expectation notifier unregistration on module unload to use ip_conntrack_expect_unregister_notifier(). This bug causes a soft lockup at the first expectation created after a rmmod ; insmod of this module. Should go into -stable as well. Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-01 02:24:48 -08:00
Yasuyuki Kozakai	a89ecb6a2e	[NETFILTER]: x_tables: unify IPv4/IPv6 multiport match This unifies ipt_multiport and ip6t_multiport to xt_multiport. As a result, this addes support for inversion and port range match to IPv6 packets. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-01 02:22:54 -08:00
Yasuyuki Kozakai	dc5ab2faec	[NETFILTER]: x_tables: unify IPv4/IPv6 esp match This unifies ipt_esp and ip6t_esp to xt_esp. Please note that now a user program needs to specify IPPROTO_ESP as protocol to use esp match with IPv6. This means that ip6tables requires '-p esp' like iptables. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-01 02:22:30 -08:00
Herbert Xu	dbe5b4aaaf	[IPSEC]: Kill unused decap state structure This patch removes the *_decap_state structures which were previously used to share state between input/post_input. This is no longer needed. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-01 00:54:16 -08:00
Herbert Xu	e695633e21	[IPSEC]: Kill unused decap state argument This patch removes the decap_state argument from the xfrm input hook. Previously this function allowed the input hook to share state with the post_input hook. The latter has since been removed. The only purpose for it now is to check the encap type. However, it is easier and better to move the encap type check to the generic xfrm_rcv function. This allows us to get rid of the decap state argument altogether. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-01 00:52:46 -08:00
Andrew Morton	65b4b4e81a	[NETFILTER]: Rename init functions. Every netfilter module uses `init' for its module_init() function and `fini' or `cleanup' for its module_exit() function. Problem is, this creates uninformative initcall_debug output and makes ctags rather useless. So go through and rename them all to $(filename)_init and $(filename)_fini. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-28 17:02:48 -08:00
S P	c3e5d877aa	[TCP]: Fix RFC2465 typo. Signed-off-by: S P <speattle@yahoo.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-28 17:02:47 -08:00
Herbert Xu	d2acc3479c	[INET]: Introduce tunnel4/tunnel6 Basically this patch moves the generic tunnel protocol stuff out of xfrm4_tunnel/xfrm6_tunnel and moves it into the new files of tunnel4.c and tunnel6 respectively. The reason for this is that the problem that Hugo uncovered is only the tip of the iceberg. The real problem is that when we removed the dependency of ipip on xfrm4_tunnel we didn't really consider the module case at all. For instance, as it is it's possible to build both ipip and xfrm4_tunnel as modules and if the latter is loaded then ipip simply won't load. After considering the alternatives I've decided that the best way out of this is to restore the dependency of ipip on the non-xfrm-specific part of xfrm4_tunnel. This is acceptable IMHO because the intention of the removal was really to be able to use ipip without the xfrm subsystem. This is still preserved by this patch. So now both ipip/xfrm4_tunnel depend on the new tunnel4.c which handles the arbitration between the two. The order of processing is determined by a simple integer which ensures that ipip gets processed before xfrm4_tunnel. The situation for ICMP handling is a little bit more complicated since we may not have enough information to determine who it's for. It's not a big deal at the moment since the xfrm ICMP handlers are basically no-ops. In future we can deal with this when we look at ICMP caching in general. The user-visible change to this is the removal of the TUNNEL Kconfig prompts. This makes sense because it can only be used through IPCOMP as it stands. The addition of the new modules shouldn't introduce any problems since module dependency will cause them to be loaded. Oh and I also turned some unnecessary pskb's in IPv6 related to this patch to skb's. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-28 17:02:46 -08:00
Alan Stern	e041c68341	[PATCH] Notifier chain update: API changes The kernel's implementation of notifier chains is unsafe. There is no protection against entries being added to or removed from a chain while the chain is in use. The issues were discussed in this thread: http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2 We noticed that notifier chains in the kernel fall into two basic usage classes: "Blocking" chains are always called from a process context and the callout routines are allowed to sleep; "Atomic" chains can be called from an atomic context and the callout routines are not allowed to sleep. We decided to codify this distinction and make it part of the API. Therefore this set of patches introduces three new, parallel APIs: one for blocking notifiers, one for atomic notifiers, and one for "raw" notifiers (which is really just the old API under a new name). New kinds of data structures are used for the heads of the chains, and new routines are defined for registration, unregistration, and calling a chain. The three APIs are explained in include/linux/notifier.h and their implementation is in kernel/sys.c. With atomic and blocking chains, the implementation guarantees that the chain links will not be corrupted and that chain callers will not get messed up by entries being added or removed. For raw chains the implementation provides no guarantees at all; users of this API must provide their own protections. (The idea was that situations may come up where the assumptions of the atomic and blocking APIs are not appropriate, so it should be possible for users to handle these things in their own way.) There are some limitations, which should not be too hard to live with. For atomic/blocking chains, registration and unregistration must always be done in a process context since the chain is protected by a mutex/rwsem. Also, a callout routine for a non-raw chain must not try to register or unregister entries on its own chain. (This did happen in a couple of places and the code had to be changed to avoid it.) Since atomic chains may be called from within an NMI handler, they cannot use spinlocks for synchronization. Instead we use RCU. The overhead falls almost entirely in the unregister routine, which is okay since unregistration is much less frequent that calling a chain. Here is the list of chains that we adjusted and their classifications. None of them use the raw API, so for the moment it is only a placeholder. ATOMIC CHAINS ------------- arch/i386/kernel/traps.c: i386die_chain arch/ia64/kernel/traps.c: ia64die_chain arch/powerpc/kernel/traps.c: powerpc_die_chain arch/sparc64/kernel/traps.c: sparc64die_chain arch/x86_64/kernel/traps.c: die_chain drivers/char/ipmi/ipmi_si_intf.c: xaction_notifier_list kernel/panic.c: panic_notifier_list kernel/profile.c: task_free_notifier net/bluetooth/hci_core.c: hci_notifier net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_chain net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_expect_chain net/ipv6/addrconf.c: inet6addr_chain net/netfilter/nf_conntrack_core.c: nf_conntrack_chain net/netfilter/nf_conntrack_core.c: nf_conntrack_expect_chain net/netlink/af_netlink.c: netlink_chain BLOCKING CHAINS --------------- arch/powerpc/platforms/pseries/reconfig.c: pSeries_reconfig_chain arch/s390/kernel/process.c: idle_chain arch/x86_64/kernel/process.c idle_notifier drivers/base/memory.c: memory_chain drivers/cpufreq/cpufreq.c cpufreq_policy_notifier_list drivers/cpufreq/cpufreq.c cpufreq_transition_notifier_list drivers/macintosh/adb.c: adb_client_list drivers/macintosh/via-pmu.c sleep_notifier_list drivers/macintosh/via-pmu68k.c sleep_notifier_list drivers/macintosh/windfarm_core.c wf_client_list drivers/usb/core/notify.c usb_notifier_list drivers/video/fbmem.c fb_notifier_list kernel/cpu.c cpu_chain kernel/module.c module_notify_list kernel/profile.c munmap_notifier kernel/profile.c task_exit_notifier kernel/sys.c reboot_notifier_list net/core/dev.c netdev_chain net/decnet/dn_dev.c: dnaddr_chain net/ipv4/devinet.c: inetaddr_chain It's possible that some of these classifications are wrong. If they are, please let us know or submit a patch to fix them. Note that any chain that gets called very frequently should be atomic, because the rwsem read-locking used for blocking chains is very likely to incur cache misses on SMP systems. (However, if the chain's callout routines may sleep then the chain cannot be atomic.) The patch set was written by Alan Stern and Chandra Seetharaman, incorporating material written by Keith Owens and suggestions from Paul McKenney and Andrew Morton. [jes@sgi.com: restructure the notifier chain initialization macros] Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:44:50 -08:00
Ingo Molnar	14cc3e2b63	[PATCH] sem2mutex: misc static one-file mutexes Semaphore to mutex conversion. The conversion was generated via scripts, and the result was validated automatically via a script as well. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Jens Axboe <axboe@suse.de> Cc: Neil Brown <neilb@cse.unsw.edu.au> Acked-by: Alasdair G Kergon <agk@redhat.com> Cc: Greg KH <greg@kroah.com> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Adam Belay <ambx1@neo.rr.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-26 08:56:55 -08:00
Linus Torvalds	b55813a2e5	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: [NETFILTER] x_table.c: sem2mutex [IPV4]: Aggregate route entries with different TOS values [TCP]: Mark tcp_*mem[] __read_mostly. [TCP]: Set default max buffers from memory pool size [SCTP]: Fix up sctp_rcv return value [NET]: Take RTNL when unregistering notifier [WIRELESS]: Fix config dependencies. [NET]: Fill in a 32-bit hole in struct sock on 64-bit platforms. [NET]: Ensure device name passed to SO_BINDTODEVICE is NULL terminated. [MODULES]: Don't allow statically declared exports [BRIDGE]: Unaligned accesses in the ethernet bridge	2006-03-25 08:39:20 -08:00
Davide Libenzi	f348d70a32	[PATCH] POLLRDHUP/EPOLLRDHUP handling for half-closed devices notifications Implement the half-closed devices notifiation, by adding a new POLLRDHUP (and its alias EPOLLRDHUP) bit to the existing poll/select sets. Since the existing POLLHUP handling, that does not report correctly half-closed devices, was feared to be changed, this implementation leaves the current POLLHUP reporting unchanged and simply add a new bit that is set in the few places where it makes sense. The same thing was discussed and conceptually agreed quite some time ago: http://lkml.org/lkml/2003/7/12/116 Since this new event bit is added to the existing Linux poll infrastruture, even the existing poll/select system calls will be able to use it. As far as the existing POLLHUP handling, the patch leaves it as is. The pollrdhup-2.6.16.rc5-0.10.diff defines the POLLRDHUP for all the existing archs and sets the bit in the six relevant files. The other attached diff is the simple change required to sys/epoll.h to add the EPOLLRDHUP definition. There is "a stupid program" to test POLLRDHUP delivery here: http://www.xmailserver.org/pollrdhup-test.c It tests poll(2), but since the delivery is same epoll(2) will work equally. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-25 08:22:56 -08:00
Ilia Sotnikov	cef2685e00	[IPV4]: Aggregate route entries with different TOS values When we get an ICMP need-to-frag message, the original TOS value in the ICMP payload cannot be used as a key to look up the routes to update. This is because the TOS field may have been modified by routers on the way. Similarly, ip_rt_redirect should also ignore the TOS as the router that gave us the message may have modified the TOS value. The patch achieves this objective by aggregating entries with different TOS values (but are otherwise identical) into the same bucket. This makes it easy to update them at the same time when an ICMP message is received. In future we should use a twin-hashing scheme where teh aggregation occurs at the entry level. That is, the TOS goes back into the hash for normal lookups while ICMP lookups will end up with a node that gives us a list that contains all other route entries that differ only by TOS. Signed-off-by: Ilia Sotnikov <hostcc@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-25 01:38:55 -08:00
David S. Miller	b8059eadf9	[TCP]: Mark tcp_*mem[] __read_mostly. Suggested by Stephen Hemminger. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-25 01:36:56 -08:00
John Heffner	7b4f4b5ebc	[TCP]: Set default max buffers from memory pool size This patch sets the maximum TCP buffer sizes (available to automatic buffer tuning, not to setsockopt) based on the TCP memory pool size. The maximum sndbuf and rcvbuf each will be up to 4 MB, but no more than 1/128 of the memory pressure threshold. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-25 01:34:07 -08:00
Alexey Dobriyan	53b3531bbb	[PATCH] s/;;/;/g Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-24 07:33:24 -08:00
Patrick McHardy	a5cdc03003	[IPV4]: Add fib rule netlink notifications To really make sense of route notifications in the presence of multiple tables, userspace also needs to be notified about routing rule updates. Notifications are sent to the so far unused RTNLGRP_NOP1 (now RTNLGRP_RULE) group. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-23 01:16:06 -08:00
Alexey Kuznetsov	1a55d57b10	[TCP]: Do not use inet->id of global tcp_socket when sending RST. The problem is in ip_push_pending_frames(), which uses: if (!df) { __ip_select_ident(iph, &rt->u.dst, 0); } else { iph->id = htons(inet->id++); } instead of ip_select_ident(). Right now I think the code is a nonsense. Most likely, I copied it from old ip_build_xmit(), where it was really special, we had to decide whether to generate unique ID when generating the first (well, the last) fragment. In ip_push_pending_frames() it does not make sense, it should use plain ip_select_ident() instead. Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-22 14:27:59 -08:00
Patrick McHardy	6a534ee35c	[NETFILTER]: Fix undefined references to get_h225_addr get_h225_addr is exported, but declared static, which fails when linking statically. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-22 13:57:25 -08:00
Pablo Neira Ayuso	b9f78f9fca	[NETFILTER]: nf_conntrack: support for layer 3 protocol load on demand x_tables matches and targets that require nf_conntrack_ipv[4\|6] to work don't have enough information to load on demand these modules. This patch introduces the following changes to solve this issue: o nf_ct_l3proto_try_module_get: try to load the layer 3 connection tracker module and increases the refcount. o nf_ct_l3proto_module put: drop the refcount of the module. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-22 13:56:08 -08:00
Pablo Neira Ayuso	a45049c51c	[NETFILTER]: x_tables: set the protocol family in x_tables targets/matches Set the family field in xt_[matches\|targets] registered. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-22 13:55:40 -08:00
Pablo Neira Ayuso	4e3882f773	[NETFILTER]: conntrack: cleanup the conntrack ID initialization Currently the first conntrack ID assigned is 2, use 1 instead. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-22 13:55:11 -08:00
Pablo Neira Ayuso	1cde64365b	[NETFILTER]: ctnetlink: Fix expectaction mask dumping The expectation mask has some particularities that requires a different handling. The protocol number fields can be set to non-valid protocols, ie. l3num is set to 0xFFFF. Since that protocol does not exist, the mask tuple will not be dumped. Moreover, this results in a kernel panic when nf_conntrack accesses the array of protocol handlers, that is PF_MAX (0x1F) long. This patch introduces the function ctnetlink_exp_dump_mask, that correctly dumps the expectation mask. Such function uses the l3num value from the expectation tuple that is a valid layer 3 protocol number. The value of the l3num mask isn't dumped since it is meaningless from the userspace side. Thanks to Yasuyuki Kozakai and Patrick McHardy for the feedback. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-22 13:54:15 -08:00
Jing Min Zhao	5e35941d99	[NETFILTER]: Add H.323 conntrack/NAT helper Signed-off-by: Jing Min Zhao <zhaojignmin@hotmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 23:41:17 -08:00
David S. Miller	dbeff12b4d	[INET]: Fix typo in Arnaldo's connection sock compat fixups. "struct inet_csk" --> "struct inet_connection_sock" :-) Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 22:52:32 -08:00
Arnaldo Carvalho de Melo	543d9cfeec	[NET]: Identation & other cleanups related to compat_[gs]etsockopt cset No code changes, just tidying up, in some cases moving EXPORT_SYMBOLs to just after the function exported, etc. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 22:48:35 -08:00
Arnaldo Carvalho de Melo	dec73ff029	[ICSK] compat: Introduce inet_csk_compat_[gs]etsockopt Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 22:46:16 -08:00
Dmitry Mishin	3fdadf7d27	[NET]: {get\|set}sockopt compatibility layer This patch extends {get\|set}sockopt compatibility layer in order to move protocol specific parts to their place and avoid huge universal net/compat.c file in the future. Signed-off-by: Dmitry Mishin <dim@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 22:45:21 -08:00
Catherine Zhang	2c7946a7bf	[SECURITY]: TCP/UDP getpeersec This patch implements an application of the LSM-IPSec networking controls whereby an application can determine the label of the security association its TCP or UDP sockets are currently connected to via getsockopt and the auxiliary data mechanism of recvmsg. Patch purpose: This patch enables a security-aware application to retrieve the security context of an IPSec security association a particular TCP or UDP socket is using. The application can then use this security context to determine the security context for processing on behalf of the peer at the other end of this connection. In the case of UDP, the security context is for each individual packet. An example application is the inetd daemon, which could be modified to start daemons running at security contexts dependent on the remote client. Patch design approach: - Design for TCP The patch enables the SELinux LSM to set the peer security context for a socket based on the security context of the IPSec security association. The application may retrieve this context using getsockopt. When called, the kernel determines if the socket is a connected (TCP_ESTABLISHED) TCP socket and, if so, uses the dst_entry cache on the socket to retrieve the security associations. If a security association has a security context, the context string is returned, as for UNIX domain sockets. - Design for UDP Unlike TCP, UDP is connectionless. This requires a somewhat different API to retrieve the peer security context. With TCP, the peer security context stays the same throughout the connection, thus it can be retrieved at any time between when the connection is established and when it is torn down. With UDP, each read/write can have different peer and thus the security context might change every time. As a result the security context retrieval must be done TOGETHER with the packet retrieval. The solution is to build upon the existing Unix domain socket API for retrieving user credentials. Linux offers the API for obtaining user credentials via ancillary messages (i.e., out of band/control messages that are bundled together with a normal message). Patch implementation details: - Implementation for TCP The security context can be retrieved by applications using getsockopt with the existing SO_PEERSEC flag. As an example (ignoring error checking): getsockopt(sockfd, SOL_SOCKET, SO_PEERSEC, optbuf, &optlen); printf("Socket peer context is: %s\n", optbuf); The SELinux function, selinux_socket_getpeersec, is extended to check for labeled security associations for connected (TCP_ESTABLISHED == sk->sk_state) TCP sockets only. If so, the socket has a dst_cache of struct dst_entry values that may refer to security associations. If these have security associations with security contexts, the security context is returned. getsockopt returns a buffer that contains a security context string or the buffer is unmodified. - Implementation for UDP To retrieve the security context, the application first indicates to the kernel such desire by setting the IP_PASSSEC option via getsockopt. Then the application retrieves the security context using the auxiliary data mechanism. An example server application for UDP should look like this: toggle = 1; toggle_len = sizeof(toggle); setsockopt(sockfd, SOL_IP, IP_PASSSEC, &toggle, &toggle_len); recvmsg(sockfd, &msg_hdr, 0); if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) { cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr); if (cmsg_hdr->cmsg_len <= CMSG_LEN(sizeof(scontext)) && cmsg_hdr->cmsg_level == SOL_IP && cmsg_hdr->cmsg_type == SCM_SECURITY) { memcpy(&scontext, CMSG_DATA(cmsg_hdr), sizeof(scontext)); } } ip_setsockopt is enhanced with a new socket option IP_PASSSEC to allow a server socket to receive security context of the peer. A new ancillary message type SCM_SECURITY. When the packet is received we get the security context from the sec_path pointer which is contained in the sk_buff, and copy it to the ancillary message space. An additional LSM hook, selinux_socket_getpeersec_udp, is defined to retrieve the security context from the SELinux space. The existing function, selinux_socket_getpeersec does not suit our purpose, because the security context is copied directly to user space, rather than to kernel space. Testing: We have tested the patch by setting up TCP and UDP connections between applications on two machines using the IPSec policies that result in labeled security associations being built. For TCP, we can then extract the peer security context using getsockopt on either end. For UDP, the receiving end can retrieve the security context using the auxiliary data mechanism of recvmsg. Signed-off-by: Catherine Zhang <cxzhang@watson.ibm.com> Acked-by: James Morris <jmorris@namei.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 22:41:23 -08:00
Rick Jones	15d99e02ba	[TCP]: sysctl to allow TCP window > 32767 sans wscale Back in the dark ages, we had to be conservative and only allow 15-bit window fields if the window scale option was not negotiated. Some ancient stacks used a signed 16-bit quantity for the window field of the TCP header and would get confused. Those days are long gone, so we can use the full 16-bits by default now. There is a sysctl added so that we can still interact with such old stacks Signed-off-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 22:40:29 -08:00
Neil Horman	abd596a4b6	[IPV4] ARP: Alloc acceptance of unsolicited ARP via netdevice sysctl. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 22:39:47 -08:00
David S. Miller	edb2c34fb2	[NETFILTER]: Fix warnings in ip_nat_snmp_basic.c net/ipv4/netfilter/ip_nat_snmp_basic.c: In function 'asn1_header_decode': net/ipv4/netfilter/ip_nat_snmp_basic.c:248: warning: 'len' may be used uninitialized in this function net/ipv4/netfilter/ip_nat_snmp_basic.c:248: warning: 'def' may be used uninitialized in this function net/ipv4/netfilter/ip_nat_snmp_basic.c: In function 'snmp_translate': net/ipv4/netfilter/ip_nat_snmp_basic.c:672: warning: 'l' may be used uninitialized in this function net/ipv4/netfilter/ip_nat_snmp_basic.c:668: warning: 'type' may be used uninitialized in this function Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 22:36:21 -08:00
Ingo Molnar	57b47a53ec	[NET]: sem2mutex part 2 Semaphore to mutex conversion. The conversion was generated via scripts, and the result was validated automatically via a script as well. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 22:35:41 -08:00
Arjan van de Ven	4a3e2f711a	[NET] sem2mutex: net/ Semaphore to mutex conversion. The conversion was generated via scripts, and the result was validated automatically via a script as well. Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 22:33:17 -08:00
Stephen Hemminger	1533306186	[NET]: dev_put/dev_hold cleanup Get rid of the old __dev_put macro that is just a hold over from pre 2.6 kernel. And turn dev_hold into an inline instead of a macro. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 22:32:28 -08:00
Stephen Hemminger	6756ae4b4e	[NET]: Convert RTNL to mutex. This patch turns the RTNL from a semaphore to a new 2.6.16 mutex and gets rid of some of the leftover legacy. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 22:23:58 -08:00
Baruch Even	50bf3e224a	[TCP] H-TCP: Better time accounting Instead of estimating the time since the last congestion event, count it directly. Signed-off-by: Baruch Even <baruch@ev-en.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 22:23:10 -08:00
Baruch Even	0bc6d90b82	[TCP] H-TCP: Account for delayed-ACKs Account for delayed-ACKs in H-TCP. Delayed-ACKs cause H-TCP to be less aggressive than its design calls for. It is especially true when the receiver is a Linux machine where the average delayed ack is over 3 packets with values of 7 not unheard of. Signed-off-By: Baruch Even <baruch@ev-en.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 22:22:47 -08:00

1 2 3 4 5 ...

729 Commits