OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Ilpo Järvinen	52cd5750e8	tcp: fix length used for checksum in a reset While looking for some common code I came across difference in checksum calculation between tcp_v6_send_(reset\|ack) I couldn't explain. I checked both v4 and v6 and found out that both seem to have the same "feature". I couldn't find anything in rfc nor anywhere else which would state that md5 option should be ignored like it was in case of reset so I came to a conclusion that this is probably a genuine bug. I suspect that addition of md5 just was fooled by the excessive copy-paste code in those functions and the reset part was never tested well enough to find out the problem. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-08 11:34:06 -07:00
David S. Miller	364ae953a4	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6	2008-10-08 09:50:38 -07:00
Jan Engelhardt	916a917dfe	netfilter: xtables: provide invoked family value to extensions By passing in the family through which extensions were invoked, a bit of data space can be reclaimed. The "family" member will be added to the parameter structures and the check functions be adjusted. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:20 +02:00
Jan Engelhardt	a2df1648ba	netfilter: xtables: move extension arguments into compound structure (6/6) This patch does this for target extensions' destroy functions. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:19 +02:00
Jan Engelhardt	af5d6dc200	netfilter: xtables: move extension arguments into compound structure (5/6) This patch does this for target extensions' checkentry functions. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:19 +02:00
Jan Engelhardt	7eb3558655	netfilter: xtables: move extension arguments into compound structure (4/6) This patch does this for target extensions' target functions. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:19 +02:00
Jan Engelhardt	6be3d8598e	netfilter: xtables: move extension arguments into compound structure (3/6) This patch does this for match extensions' destroy functions. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:19 +02:00
Jan Engelhardt	9b4fce7a35	netfilter: xtables: move extension arguments into compound structure (2/6) This patch does this for match extensions' checkentry functions. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:18 +02:00
Jan Engelhardt	f7108a20de	netfilter: xtables: move extension arguments into compound structure (1/6) The function signatures for Xtables extensions have grown over time. It involves a lot of typing/replication, and also a bit of stack space even if they are not used. Realize an NFWS2008 idea and pack them into structs. The skb remains outside of the struct so gcc can continue to apply its optimizations. This patch does this for match extensions' match functions. A few ambiguities have also been addressed. The "offset" parameter for example has been renamed to "fragoff" (there are so many different offsets already) and "protoff" to "thoff" (there is more than just one protocol here, so clarify). Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:18 +02:00
Jan Engelhardt	c2df73de24	netfilter: xtables: use "if" blocks in Kconfig Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:18 +02:00
Jan Engelhardt	aba0d34800	netfilter: xtables: sort extensions alphabetically in Kconfig Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:17 +02:00
Jan Engelhardt	367c679007	netfilter: xtables: do centralized checkentry call (1/2) It used to be that {ip,ip6,etc}_tables called extension->checkentry themselves, but this can be moved into the xtables core. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:17 +02:00
KOVACS Krisztian	73e4022f78	netfilter: split netfilter IPv4 defragmentation into a separate module Netfilter connection tracking requires all IPv4 packets to be defragmented. Both the socket match and the TPROXY target depend on this functionality, so this patch separates the Netfilter IPv4 defrag hooks into a separate module. Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:12 +02:00
Alexey Dobriyan	cfd6e3d747	netfilter: netns nat: PPTP NAT in netns Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:11 +02:00
Alexey Dobriyan	9174c1538f	netfilter: netns nf_conntrack: fixup DNAT in netns Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:11 +02:00
Alexey Dobriyan	0c4c9288ad	netfilter: netns nat: per-netns bysource hash Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:11 +02:00
Alexey Dobriyan	e099a17357	netfilter: netns nat: per-netns NAT table Same story as with iptable_filter, iptables_raw tables. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:10 +02:00
Alexey Dobriyan	b8b8063e0d	netfilter: netns nat: fix ipt_MASQUERADE in netns First, allow entry in notifier hook. Second, start conntrack cleanup in netns to which netdevice belongs. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:10 +02:00
Alexey Dobriyan	c2a2c7e0cc	netfilter: netns nf_conntrack: per-netns net.netfilter.nf_conntrack_log_invalid sysctl Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:08 +02:00
Alexey Dobriyan	c04d05529a	netfilter: netns nf_conntrack: per-netns net.netfilter.nf_conntrack_checksum sysctl Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:08 +02:00
Alexey Dobriyan	8e9df80180	netfilter: netns nf_conntrack: per-netns /proc/net/stat/nf_conntrack, /proc/net/stat/ip_conntrack Show correct conntrack count, while I'm at it. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:08 +02:00
Alexey Dobriyan	0d55af8791	netfilter: netns nf_conntrack: per-netns statistics Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:07 +02:00
Alexey Dobriyan	a71996fccc	netfilter: netns nf_conntrack: pass conntrack to nf_conntrack_event_cache() not skb This is cleaner, we already know conntrack to which event is relevant. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:07 +02:00
Alexey Dobriyan	5e6b29972b	netfilter: netns nf_conntrack: per-netns /proc/net/ip_conntrack, /proc/net/stat/ip_conntrack, /proc/net/ip_conntrack_expect Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:06 +02:00
Alexey Dobriyan	74c51a1497	netfilter: netns nf_conntrack: pass netns pointer to L4 protocol's ->error hook Again, it's deducible from skb, but we're going to use it for nf_conntrack_checksum and statistics, so just pass it from upper layer. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:05 +02:00
Alexey Dobriyan	a702a65fc1	netfilter: netns nf_conntrack: pass netns pointer to nf_conntrack_in() It's deducible from skb->dev or skb->dst->dev, but we know netns at the moment of call, so pass it down and use for finding and creating conntracks. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:04 +02:00
Alexey Dobriyan	9b03f38d04	netfilter: netns nf_conntrack: per-netns expectations Make per-netns a) expectation hash and b) expectations count. Expectations always belongs to netns to which it's master conntrack belong. This is natural and doesn't bloat expectation. Proc files and leaf users are stubbed to init_net, this is temporary. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:03 +02:00
Alexey Dobriyan	b21f890193	netfilter: netns: fix {ip,6}_route_me_harder() in netns Take netns from skb->dst->dev. It should be safe because, they are called from LOCAL_OUT hook where dst is valid (though, I'm not exactly sure about IPVS and queueing packets to userspace). [Patrick: its safe everywhere since they already expect skb->dst to be set] Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:03 +02:00
Alexey Dobriyan	400dad39d1	netfilter: netns nf_conntrack: per-netns conntrack hash * make per-netns conntrack hash Other solution is to add ->ct_net pointer to tuplehashes and still has one hash, I tried that it's ugly and requires more code deep down in protocol modules et al. * propagate netns pointer to where needed, e. g. to conntrack iterators. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:03 +02:00
Alexey Dobriyan	49ac8713b6	netfilter: netns nf_conntrack: per-netns conntrack count Sysctls and proc files are stubbed to init_net's one. This is temporary. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:03 +02:00
Alexey Dobriyan	48dc7865aa	netfilter: netns: remove nf_*_net() wrappers Now that dev_net() exists, the usefullness of them is even less. Also they're a big problem in resolving circular header dependencies necessary for NOTRACK-in-netns patch. See below. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:01 +02:00
Jan Engelhardt	ee999d8b95	netfilter: x_tables: use NFPROTO_* in extensions Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:01 +02:00
Jan Engelhardt	e948b20a71	netfilter: rename ipt_recent to xt_recent Like with other modules (such as ipt_state), ipt_recent.h is changed to forward definitions to (IOW include) xt_recent.h, and xt_recent.c is changed to use the new constant names. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:00 +02:00
Jan Engelhardt	76108cea06	netfilter: Use unsigned types for hooknum and pf vars and (try to) consistently use u_int8_t for the L3 family. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:00 +02:00
Daniele Lacamera	9d2c27e17b	tcp: Fix tcp_hybla zero congestion window growth with small rho and large cwnd. Because of rounding, in certain conditions, i.e. when in congestion avoidance state rho is smaller than 1/128 of the current cwnd, TCP Hybla congestion control starves and the cwnd is kept constant forever. This patch forces an increment by one segment after #send_cwnd calls without increments(newreno behavior). Signed-off-by: Daniele Lacamera <root@danielinux.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-07 15:58:17 -07:00
Rami Rosen	b8bae41ed6	ipv4: add mc_count to in_device. This patch add mc_count to struct in_device and updates increment/decrement/initilaize of this field in IPv4 and in IPv6. - Also printing the vfs /proc entry (/proc/net/igmp) is adjusted to use the new mc_count. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-07 15:34:37 -07:00
Ali Saidi	53240c2087	tcp: Fix possible double-ack w/ user dma From: Ali Saidi <saidi@engin.umich.edu> When TCP receive copy offload is enabled it's possible that tcp_rcv_established() will cause two acks to be sent for a single packet. In the case that a tcp_dma_early_copy() is successful, copied_early is set to true which causes tcp_cleanup_rbuf() to be called early which can send an ack. Further along in tcp_rcv_established(), __tcp_ack_snd_check() is called and will schedule a delayed ACK. If no packets are processed before the delayed ack timer expires the packet will be acked twice. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-07 15:31:19 -07:00
Denis V. Lunev	0c7ed677fb	netns: make udpv6 mib per/namespace Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-07 14:49:36 -07:00
Ilpo Järvinen	4a7e56098f	tcp: cleanup messy initializer I'm quite sure that if I give this function in its old format for you to inspect, you start to wonder what is the type of demanded or if it's a global variable. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-07 14:43:31 -07:00
Ilpo Järvinen	33f5f57eeb	tcp: kill pointless urg_mode It all started from me noticing that this urgent check in tcp_clean_rtx_queue is unnecessarily inside the loop. Then I took a longer look to it and found out that the users of urg_mode can trivially do without, well almost, there was one gotcha. Bonus: those funny people who use urg with >= 2^31 write_seq - snd_una could now rejoice too (that's the only purpose for the between being there, otherwise a simple compare would have done the thing). Not that I assume that the rest of the tcp code happily lives with such mind-boggling numbers :-). Alas, it turned out to be impossible to set wmem to such numbers anyway, yes I really tried a big sendfile after setting some wmem but nothing happened :-). ...Tcp_wmem is int and so is sk_sndbuf... So I hacked a bit variable to long and found out that it seems to work... :-) Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-07 14:43:06 -07:00
Peter Zijlstra	c57943a1c9	net: wrap sk->sk_backlog_rcv() Wrap calling sk->sk_backlog_rcv() in a function. This will allow extending the generic sk_backlog_rcv behaviour. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-07 14:18:42 -07:00
KOVACS Krisztian	23542618de	inet: Don't lookup the socket if there's a socket attached to the skb Use the socket cached in the skb if it's present. Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-07 12:41:01 -07:00
KOVACS Krisztian	607c4aaf03	inet: Add udplib_lookup_skb() helpers To be able to use the cached socket reference in the skb during input processing we add a new set of lookup functions that receive the skb on their argument list. Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-07 12:38:32 -07:00
Arnaldo Carvalho de Melo	9a1f27c480	inet_hashtables: Add inet_lookup_skb helpers To be able to use the cached socket reference in the skb during input processing we add a new set of lookup functions that receive the skb on their argument list. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-07 11:41:57 -07:00
Simon Horman	a5e8546a8b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 into lvs-next-2.6	2008-10-07 08:40:11 +11:00
Julius Volz	cb7f6a7b71	IPVS: Move IPVS to net/netfilter/ipvs Since IPVS now has partial IPv6 support, this patch moves IPVS from net/ipv4/ipvs to net/netfilter/ipvs. It's a result of: $ git mv net/ipv4/ipvs net/netfilter and adapting the relevant Kconfigs/Makefiles to the new path. Signed-off-by: Julius Volz <juliusv@google.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2008-10-07 08:38:24 +11:00
David S. Miller	c7004482e8	tcp: Respect SO_RCVLOWAT in tcp_poll(). Based upon a report by Vito Caputo. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-06 10:43:54 -07:00
KOVACS Krisztian	bcd41303f4	udp: Export UDP socket lookup function The iptables tproxy code has to be able to do UDP socket hash lookups, so we have to provide an exported lookup function for this purpose. Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-01 07:48:10 -07:00
KOVACS Krisztian	a3116ac5c2	tcp: Port redirection support for TCP Current TCP code relies on the local port of the listening socket being the same as the destination address of the incoming connection. Port redirection used by many transparent proxying techniques obviously breaks this, so we have to store the original destination port address. This patch extends struct inet_request_sock and stores the incoming destination port value there. It also modifies the handshake code to use that value as the source port when sending reply packets. Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-01 07:46:49 -07:00
KOVACS Krisztian	86b08d867d	ipv4: Make Netfilter's ip_route_me_harder() non-local address compatible Netfilter's ip_route_me_harder() tries to re-route packets either generated or re-routed by Netfilter. This patch changes ip_route_me_harder() to handle packets from non-locally-bound sockets with IP_TRANSPARENT set as local and to set the appropriate flowi flags when re-doing the routing lookup. Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-01 07:44:42 -07:00
KOVACS Krisztian	88ef4a5a78	tcp: Handle TCP SYN+ACK/ACK/RST transparency The TCP stack sends out SYN+ACK/ACK/RST reply packets in response to incoming packets. The non-local source address check on output bites us again, as replies for transparently redirected traffic won't have a chance to leave the node. This patch selectively sets the FLOWI_FLAG_ANYSRC flag when doing the route lookup for those replies. Transparent replies are enabled if the listening socket has the transparent socket flag set. Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-01 07:41:00 -07:00
KOVACS Krisztian	1668e010cb	ipv4: Make inet_sock.h independent of route.h inet_iif() in inet_sock.h requires route.h. Since users of inet_iif() usually require other route.h functionality anyway this patch moves inet_iif() to route.h. Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-01 07:33:10 -07:00
Tóth László Attila	b9fb15067c	ipv4: Allow binding to non-local addresses if IP_TRANSPARENT is set Setting IP_TRANSPARENT is not really useful without allowing non-local binds for the socket. To make user-space code simpler we allow these binds even if IP_TRANSPARENT is set but IP_FREEBIND is not. Signed-off-by: Tóth László Attila <panther@balabit.hu> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-01 07:31:24 -07:00
KOVACS Krisztian	f5715aea45	ipv4: Implement IP_TRANSPARENT socket option This patch introduces the IP_TRANSPARENT socket option: enabling that will make the IPv4 routing omit the non-local source address check on output. Setting IP_TRANSPARENT requires NET_ADMIN capability. Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-01 07:30:02 -07:00
Julian Anastasov	a210d01ae3	ipv4: Loosen source address check on IPv4 output ip_route_output() contains a check to make sure that no flows with non-local source IP addresses are routed. This obviously makes using such addresses impossible. This patch introduces a flowi flag which makes omitting this check possible. The new flag provides a way of handling transparent and non-transparent connections differently. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-01 07:28:28 -07:00
David S. Miller	b262e60309	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/ath9k/core.c drivers/net/wireless/ath9k/main.c net/core/dev.c	2008-10-01 06:12:56 -07:00
Vitaliy Gusev	4dd7972d12	tcp: Fix NULL dereference in tcp_4_send_ack() Fix NULL dereference in tcp_4_send_ack(). As skb->dev is reset to NULL in tcp_v4_rcv() thus OOPS occurs: BUG: unable to handle kernel NULL pointer dereference at 00000000000004d0 IP: [<ffffffff80498503>] tcp_v4_send_ack+0x203/0x250 Stack: ffff810005dbb000 ffff810015c8acc0 e77b2c6e5f861600 a01610802e90cb6d 0a08010100000000 88afffff88afffff 0000000080762be8 0000000115c872e8 0004122000000000 0000000000000001 ffffffff80762b88 0000000000000020 Call Trace: <IRQ> [<ffffffff80499c33>] tcp_v4_reqsk_send_ack+0x20/0x22 [<ffffffff8049bce5>] tcp_check_req+0x108/0x14c [<ffffffff8047aaf7>] ? rt_intern_hash+0x322/0x33c [<ffffffff80499846>] tcp_v4_do_rcv+0x399/0x4ec [<ffffffff8045ce4b>] ? skb_checksum+0x4f/0x272 [<ffffffff80485b74>] ? __inet_lookup_listener+0x14a/0x15c [<ffffffff8049babc>] tcp_v4_rcv+0x6a1/0x701 [<ffffffff8047e739>] ip_local_deliver_finish+0x157/0x24a [<ffffffff8047ec9a>] ip_local_deliver+0x72/0x7c [<ffffffff8047e5bd>] ip_rcv_finish+0x38d/0x3b2 [<ffffffff803d3548>] ? scsi_io_completion+0x19d/0x39e [<ffffffff8047ebe5>] ip_rcv+0x2a2/0x2e5 [<ffffffff80462faa>] netif_receive_skb+0x293/0x303 [<ffffffff80465a9b>] process_backlog+0x80/0xd0 [<ffffffff802630b4>] ? __rcu_process_callbacks+0x125/0x1b4 [<ffffffff8046560e>] net_rx_action+0xb9/0x17f [<ffffffff80234cc5>] __do_softirq+0xa3/0x164 [<ffffffff8020c52c>] call_softirq+0x1c/0x28 <EOI> [<ffffffff8020de1c>] do_softirq+0x34/0x72 [<ffffffff80234b8e>] local_bh_enable_ip+0x3f/0x50 [<ffffffff804d43ca>] _spin_unlock_bh+0x12/0x14 [<ffffffff804599cd>] release_sock+0xb8/0xc1 [<ffffffff804a6f9a>] inet_stream_connect+0x146/0x25c [<ffffffff80243078>] ? autoremove_wake_function+0x0/0x38 [<ffffffff8045751f>] sys_connect+0x68/0x8e [<ffffffff80291818>] ? fd_install+0x5f/0x68 [<ffffffff80457784>] ? sock_map_fd+0x55/0x62 [<ffffffff8020b39b>] system_call_after_swapgs+0x7b/0x80 Code: 41 10 11 d0 83 d0 00 4d 85 ed 89 45 c0 c7 45 c4 08 00 00 00 74 07 41 8b 45 04 89 45 c8 48 8b 43 20 8b 4d b8 48 8d 55 b0 48 89 de <48> 8b 80 d0 04 00 00 48 8b b8 60 01 00 00 e8 20 ae fe ff 65 48 RIP [<ffffffff80498503>] tcp_v4_send_ack+0x203/0x250 RSP <ffffffff80762b78> CR2: 00000000000004d0 Signed-off-by: Vitaliy Gusev <vgusev@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-01 01:51:39 -07:00
David S. Miller	28e3487b7d	tcp: Fix queue traversal in tcp_use_frto(). We must check tcp_skb_is_last() before doing a tcp_write_queue_next(). Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-23 02:51:41 -07:00
David S. Miller	77d40a0952	tcp: Fix order of tests in tcp_retransmit_skb() tcp_write_queue_next() must only be made if we know that tcp_skb_is_last() evaluates to false. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-23 01:29:23 -07:00
David S. Miller	43f59c8939	net: Remove __skb_insert() calls outside of skbuff internals. This minor cleanup simplifies later changes which will convert struct sk_buff and friends over to using struct list_head. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-21 21:28:51 -07:00
Sven Wegener	8d5803bf6f	ipvs: Fix unused label warning Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Signed-off-by: Simon Horman <horms@verge.net.au>	2008-09-22 09:57:26 +10:00
Sven Wegener	e6f225ebb7	ipvs: Restrict sync message to 255 connections The nr_conns variable in the sync message header is only eight bits wide and will overflow on interfaces with a large MTU. As a result the backup won't parse all connections contained in the sync buffer. On regular ethernet with an MTU of 1500 this isn't a problem, because we can't overflow the value, but consider jumbo frames being used on a cross-over connection between both directors. We now restrict the size of the sync buffer, so that we never put more than 255 connections into a single sync buffer. Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Signed-off-by: Simon Horman <horms@verge.net.au>	2008-09-22 09:55:58 +10:00
Tom Quetchenbach	f5fff5dc8a	tcp: advertise MSS requested by user I'm trying to use the TCP_MAXSEG option to setsockopt() to set the MSS for both sides of a bidirectional connection. man tcp says: "If this option is set before connection establishment, it also changes the MSS value announced to the other end in the initial packet." However, the kernel only uses the MTU/route cache to set the advertised MSS. That means if I set the MSS to, say, 500 before calling connect(), I will send at most 500-byte packets, but I will still receive 1500-byte packets in reply. This is a bug, either in the kernel or the documentation. This patch (applies to latest net-2.6) reduces the advertised value to that requested by the user as long as setsockopt() is called before connect() or accept(). This seems like the behavior that one would expect as well as that which is documented. I've tried to make sure that things that depend on the advertised MSS are set correctly. Signed-off-by: Tom Quetchenbach <virtualphtn@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-21 00:21:51 -07:00
Arnaldo Carvalho de Melo	6067804047	net: Use hton[sl]() instead of __constant_hton[sl]() where applicable Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-20 22:20:49 -07:00
Ilpo Järvinen	618d9f2554	tcp: back retransmit_high when it over-estimated If lost skb is sacked, we might have nothing to retransmit as high as the retransmit_high is pointing to, so place it lower to avoid unnecessary walking. This is mainly for the case where high L'ed skbs gets sacked. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-20 21:26:22 -07:00
Ilpo Järvinen	90638a04ad	tcp: don't clear lost_skb_hint when not necessary Most importantly avoid doing it with cumulative ACK. However, since we have lost_cnt_hint in the picture as well needing adjustments, it's not as trivial as dealing with retransmit_skb_hint (and cannot be done in the all place we could trivially leave retransmit_skb_hint untouched). With the previous patch, this should mostly remove O(n^2) behavior while cumulative ACKs start flowing once rexmit after a lossy round-trip made it through. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-20 21:25:52 -07:00
Ilpo Järvinen	ef9da47c7c	tcp: don't clear retransmit_skb_hint when not necessary Most importantly avoid doing it with cumulative ACK. Not clearing means that we no longer need n^2 processing in resolution of each fast recovery. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-20 21:25:15 -07:00
Ilpo Järvinen	f0ceb0ed86	tcp: remove retransmit_skb_hint clearing from failure This doesn't much sense here afaict, probably never has. Since fragmenting and collapsing deal the hints by themselves, there should be very little reason for the rexmit loop to do that. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-20 21:24:49 -07:00
Ilpo Järvinen	0e1c54c2a4	tcp: reorganize retransmit code loops Both loops are quite similar, so they can be combined with little effort. As a result, forward_skb_hint becomes obsolete as well. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-20 21:24:21 -07:00
Ilpo Järvinen	08ebd1721a	tcp: remove tp->lost_out guard to make joining diff nicer The validity of the retransmit_high must then be ensured if no L'ed skb exits! This makes a minor change to behavior, we now have to iterate the head to find out that the loop terminates. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-20 21:23:49 -07:00
Ilpo Järvinen	61eb55f4db	tcp: Reorganize skb tagbit checks Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-20 21:22:59 -07:00
Ilpo Järvinen	34638570b5	tcp: remove obsolete validity concern Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-20 21:22:17 -07:00
Ilpo Järvinen	b5afe7bc71	tcp: add tcp_can_forward_retransmit Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-20 21:21:54 -07:00
Ilpo Järvinen	184d68b2b0	tcp: No need to clear retransmit_skb_hint when SACKing Because lost counter no longer requires tuning, this is trivial to remove (the tuning wouldn't have been too hard either) because no "new" retransmittable skb appeared below retransmit_skb_hint when SACKing for sure. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-20 21:21:16 -07:00
Ilpo Järvinen	f09142eddb	tcp: Kill precaution that's very likely obsolete I suspect it might have been related to the changed amount of lost skbs, which was counted by retransmit_cnt_hint that got changed. The place for this clearing was very illogical anyway, it should have been after the LOST-bit clearing loop to make any sense. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-20 21:20:50 -07:00
Ilpo Järvinen	006f582c73	tcp: convert retransmit_cnt_hint to seqno Main benefit in this is that we can then freely point the retransmit_skb_hint to anywhere we want to because there's no longer need to know what would be the count changes involve, and since this is really used only as a terminator, unnecessary work is one time walk at most, and if some retransmissions are necessary after that point later on, the walk is not full waste of time anyway. Since retransmit_high must be kept valid, all lost markers must ensure that. Now I also have learned how those "holes" in the rexmittable skbs can appear, mtu probe does them. So I removed the misleading comment as well. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-20 21:20:20 -07:00
Ilpo Järvinen	41ea36e35a	tcp: add helper for lost bit toggling This useful because we'd need to verifying soon in many places which makes things slightly more complex than it used to be. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-20 21:19:22 -07:00
Ilpo Järvinen	c8c213f20c	tcp: move tcp_verify_retransmit_hint Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-20 21:18:55 -07:00
Ilpo Järvinen	64edc2736e	tcp: Partial hint clearing has again become meaningless Ie., the difference between partial and all clearing doesn't exists anymore since the SACK optimizations got dropped by an sacktag rewrite. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-20 21:18:32 -07:00
Brian Haley	d286600e19	ipvs: change some __constant_htons() to htons() Change __contant_htons() to htons() in the IPVS code when not in an initializer. -Brian Signed-off-by: Brian Haley <brian.haley@hp.com> Acked-by: Julius Volz <juliusv@google.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2008-09-17 10:13:17 +10:00
Simon Horman	563e94f072	ipvs: add __aquire/__release annotations to ip_vs_info_seq_start/ip_vs_info_seq_stop This teaches sparse that the following are not problems: make C=1 CHECK net/ipv4/ipvs/ip_vs_ctl.c net/ipv4/ipvs/ip_vs_ctl.c:1793:14: warning: context imbalance in 'ip_vs_info_seq_start' - wrong count at exit net/ipv4/ipvs/ip_vs_ctl.c:1842:13: warning: context imbalance in 'ip_vs_info_seq_stop' - unexpected unlock Acked-by: Sven Wegener <sven.wegener@stealer.net> Acked-by: Julius Volz <juliusv@google.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2008-09-17 10:10:42 +10:00
Simon Horman	dff630ddad	ipvs: supply a valid 0 address to ip_vs_conn_new() ip_vs_conn_new expects a union nf_inet_addr as the type for its address parameters, not a plain integer. This problem was detected by sparse. make C=1 CHECK net/ipv4/ipvs/ip_vs_core.c net/ipv4/ipvs/ip_vs_core.c:469:9: warning: Using plain integer as NULL pointer Acked-by: Sven Wegener <sven.wegener@stealer.net> Acked-by: Julius Volz <juliusv@google.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2008-09-17 10:10:42 +10:00
Simon Horman	9e691ed68d	ipvs: only unlock in ip_vs_edit_service() if already locked Jumping to out unlocks __ip_vs_svc_lock, but that lock is not taken until after code that may jump to out. This problem was detected by sparse. make C=1 CHECK net/ipv4/ipvs/ip_vs_ctl.c net/ipv4/ipvs/ip_vs_ctl.c:1332:2: warning: context imbalance in 'ip_vs_edit_service' - unexpected unlock Acked-by: Sven Wegener <sven.wegener@stealer.net> Acked-by: Julius Volz <juliusv@google.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2008-09-17 10:10:41 +10:00
Herbert Xu	93821778de	udp: Fix rcv socket locking The previous patch in response to the recursive locking on IPsec reception is broken as it tries to drop the BH socket lock while in user context. This patch fixes it by shrinking the section protected by the socket lock to sock_queue_rcv_skb only. The only reason we added the lock is for the accounting which happens in that function. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-15 11:48:46 -07:00
Stephen Rothwell	63f2c04648	net: ip_vs_proto_{tcp,udp} build fix Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-12 23:23:50 -07:00
Simon Horman	c051a0a2c9	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 into lvs-next-2.6	2008-09-10 09:14:52 +10:00
Gerrit Renker	410e27a49b	This reverts "Merge branch 'dccp' of git://eden-feed.erg.abdn.ac.uk/dccp_exp" as it accentally contained the wrong set of patches. These will be submitted separately. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-09 13:27:22 +02:00
David S. Miller	0a68a20cc3	Merge branch 'dccp' of git://eden-feed.erg.abdn.ac.uk/dccp_exp Conflicts: net/dccp/input.c net/dccp/options.c	2008-09-08 17:28:59 -07:00
David S. Miller	17dce5dfe3	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: net/mac80211/mlme.c	2008-09-08 16:59:05 -07:00
Sven Wegener	e9c0ce232e	ipvs: Embed user stats structure into kernel stats structure Instead of duplicating the fields, integrate a user stats structure into the kernel stats structure. This is more robust when the members are changed, because they are now automatically kept in sync. Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Reviewed-by: Julius Volz <juliusv@google.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2008-09-09 09:53:08 +10:00
Sven Wegener	2206a3f5b7	ipvs: Restrict connection table size via Kconfig Instead of checking the value in include/net/ip_vs.h, we can just restrict the range in our Kconfig file. This will prevent values outside of the range early. Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Reviewed-by: Julius Volz <juliusv@google.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2008-09-09 09:50:55 +10:00
Julius Volz	9d7f2a2b1a	IPVS: Remove incorrect ip_route_me_harder(), fix IPv6 Remove an incorrect ip_route_me_harder() that was probably a result of merging my IPv6 patches with the local client patches. With this, IPv6+NAT are working again. Signed-off-by: Julius Volz <juliusv@google.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2008-09-09 09:43:13 +10:00
Simon Horman	503e81f65a	ipvs: handle PARTIAL_CHECKSUM Now that LVS can load balance locally generated traffic, packets may come from the loopback device and thus may have a partial checksum. The existing code allows for the case where there is no checksum at all for TCP, however Herbert Xu has confirmed that this is not legal. Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julius Volz <juliusv@google.com>	2008-09-09 09:36:32 +10:00
Daniel Lezcano	d315492b1a	netns : fix kernel panic in timewait socket destruction How to reproduce ? - create a network namespace - use tcp protocol and get timewait socket - exit the network namespace - after a moment (when the timewait socket is destroyed), the kernel panics. # BUG: unable to handle kernel NULL pointer dereference at 0000000000000007 IP: [<ffffffff821e394d>] inet_twdr_do_twkill_work+0x6e/0xb8 PGD 119985067 PUD 11c5c0067 PMD 0 Oops: 0000 [1] SMP CPU 1 Modules linked in: ipv6 button battery ac loop dm_mod tg3 libphy ext3 jbd edd fan thermal processor thermal_sys sg sata_svw libata dock serverworks sd_mod scsi_mod ide_disk ide_core [last unloaded: freq_table] Pid: 0, comm: swapper Not tainted 2.6.27-rc2 #3 RIP: 0010:[<ffffffff821e394d>] [<ffffffff821e394d>] inet_twdr_do_twkill_work+0x6e/0xb8 RSP: 0018:ffff88011ff7fed0 EFLAGS: 00010246 RAX: ffffffffffffffff RBX: ffffffff82339420 RCX: ffff88011ff7ff30 RDX: 0000000000000001 RSI: ffff88011a4d03c0 RDI: ffff88011ac2fc00 RBP: ffffffff823392e0 R08: 0000000000000000 R09: ffff88002802a200 R10: ffff8800a5c4b000 R11: ffffffff823e4080 R12: ffff88011ac2fc00 R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000 FS: 0000000041cbd940(0000) GS:ffff8800bff839c0(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000000000000007 CR3: 00000000bd87c000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process swapper (pid: 0, threadinfo ffff8800bff9e000, task ffff88011ff76690) Stack: ffffffff823392e0 0000000000000100 ffffffff821e3a3a 0000000000000008 0000000000000000 ffffffff821e3a61 ffff8800bff7c000 ffffffff8203c7e7 ffff88011ff7ff10 ffff88011ff7ff10 0000000000000021 ffffffff82351108 Call Trace: <IRQ> [<ffffffff821e3a3a>] ? inet_twdr_hangman+0x0/0x9e [<ffffffff821e3a61>] ? inet_twdr_hangman+0x27/0x9e [<ffffffff8203c7e7>] ? run_timer_softirq+0x12c/0x193 [<ffffffff820390d1>] ? __do_softirq+0x5e/0xcd [<ffffffff8200d08c>] ? call_softirq+0x1c/0x28 [<ffffffff8200e611>] ? do_softirq+0x2c/0x68 [<ffffffff8201a055>] ? smp_apic_timer_interrupt+0x8e/0xa9 [<ffffffff8200cad6>] ? apic_timer_interrupt+0x66/0x70 <EOI> [<ffffffff82011f4c>] ? default_idle+0x27/0x3b [<ffffffff8200abbd>] ? cpu_idle+0x5f/0x7d Code: e8 01 00 00 4c 89 e7 41 ff c5 e8 8d fd ff ff 49 8b 44 24 38 4c 89 e7 65 8b 14 25 24 00 00 00 89 d2 48 8b 80 e8 00 00 00 48 f7 d0 <48> 8b 04 d0 48 ff 40 58 e8 fc fc ff ff 48 89 df e8 c0 5f 04 00 RIP [<ffffffff821e394d>] inet_twdr_do_twkill_work+0x6e/0xb8 RSP <ffff88011ff7fed0> CR2: 0000000000000007 This patch provides a function to purge all timewait sockets related to a network namespace. The timewait sockets life cycle is not tied with the network namespace, that means the timewait sockets stay alive while the network namespace dies. The timewait sockets are for avoiding to receive a duplicate packet from the network, if the network namespace is freed, the network stack is removed, so no chance to receive any packets from the outside world. Furthermore, having a pending destruction timer on these sockets with a network namespace freed is not safe and will lead to an oops if the timer callback which try to access data belonging to the namespace like for example in: inet_twdr_do_twkill_work -> NET_INC_STATS_BH(twsk_net(tw), LINUX_MIB_TIMEWAITED); Purging the timewait sockets at the network namespace destruction will: 1) speed up memory freeing for the namespace 2) fix kernel panic on asynchronous timewait destruction Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Acked-by: Denis V. Lunev <den@openvz.org> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-08 13:17:27 -07:00
Simon Horman	178f5e494e	IPVS: use ipv6_addr_copy() It is standard to use ipv6_addr_copy() to fill in the in6 element of a union nf_inet_addr snet. Thanks to Julius Volz for pointing this out. Cc: Brian Haley <brian.haley@hp.com> Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julius Volz <juliusv@google.com>	2008-09-08 09:34:46 +10:00
Simon Horman	5af149cc34	IPVS: fix bogus indentation Sorry, this was my error. Thanks to Julius Volz for pointing it out. Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julius Volz <juliusv@google.com>	2008-09-08 09:34:45 +10:00
Sven Wegener	3bfb92f407	ipvs: Reject ipv6 link-local addresses for destinations We can't use non-local link-local addresses for destinations, without knowing the interface on which we can reach the address. Reject them for now. Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Acked-by: Julius Volz <juliusv@google.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2008-09-08 09:34:45 +10:00
Sven Wegener	77eb851630	ipvs: Mark tcp/udp v4 and v6 debug functions static They are only used in this file, so they should be static Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Acked-by: Julius Volz <juliusv@google.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2008-09-08 09:34:44 +10:00
Sven Wegener	a5ba4bf273	ipvs: Return negative error values from ip_vs_edit_service() Like the other code in this function does. Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Acked-by: Julius Volz <juliusv@google.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2008-09-08 09:34:44 +10:00
Sven Wegener	cd9fe6c4f0	ipvs: Use pointer to address from sync message We want a pointer to it, not the value casted to a pointer. Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Acked-by: Julius Volz <juliusv@google.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2008-09-08 09:34:43 +10:00

1 2 3 4 5 ...

2888 Commits