linux-sg2042

Commit Graph

Author	SHA1	Message	Date
Chuck Lever	922004120b	SUNRPC: transport switch API for setting port number At some point, transport endpoint addresses will no longer be IPv4. To hide the structure of the rpc_xprt's address field from ULPs and port mappers, add an API for setting the port number during an RPC bind operation. Test-plan: Destructive testing (unplugging the network temporarily). Connectathon with UDP and TCP. NFSv2/3 and NFSv4 mounting should be carefully checked. Probably need to rig a server where certain services aren't running, or that returns an error for some typical operation. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-01-06 14:58:56 -05:00
Chuck Lever	35f5a422ce	SUNRPC: new interface to force an RPC rebind We'd like to hide fields in rpc_xprt and rpc_clnt from upper layer protocols. Start by creating an API to force RPC rebind, replacing logic that simply sets cl_port to zero. Test-plan: Destructive testing (unplugging the network temporarily). Connectathon with UDP and TCP. NFSv2/3 and NFSv4 mounting should be carefully checked. Probably need to rig a server where certain services aren't running, or that returns an error for some typical operation. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-01-06 14:58:56 -05:00
Chuck Lever	0210714834	SUNRPC: switchable buffer allocation Add RPC client transport switch support for replacing buffer management on a per-transport basis. In the current IPv4 socket transport implementation, RPC buffers are allocated as needed for each RPC message that is sent. Some transport implementations may choose to use pre-allocated buffers for encoding, sending, receiving, and unmarshalling RPC messages, however. For transports capable of direct data placement, the buffers can be carved out of a pre-registered area of memory rather than from a slab cache. Test-plan: Millions of fsx operations. Performance characterization with "sio" and "iozone". Use oprofile and other tools to look for significant regression in CPU utilization. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-01-06 14:58:55 -05:00
Adrian Bunk	fb459f45f7	SUNRPC: net/sunrpc/xdr.c: remove xdr_decode_string() This patch removes ths unused function xdr_decode_string(). Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Neil Brown <neilb@suse.de> Acked-by: Charles Lever <Charles.Lever@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-01-06 14:58:53 -05:00
Trond Myklebust	969b7f2522	SUNRPC: Fix a potential race in rpc_pipefs. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-01-06 14:58:51 -05:00
Trond Myklebust	2bd615797e	SUNRPC: Ensure that SIGKILL will always terminate a synchronous RPC call. ...and make sure that the "intr" flag also enables SIGHUP and SIGTERM to interrupt RPC calls too (as per the Solaris implementation). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-01-06 14:58:45 -05:00
Trond Myklebust	e60859ac0e	SUNRPC: rpc_execute should not return task->tk_status; Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-01-06 14:58:42 -05:00
Trond Myklebust	89991c24e4	SUNRPC: Get rid of some unused exports Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-01-06 14:58:41 -05:00
Trond Myklebust	44c288732f	NFSv4: stateful NFSv4 RPC call interface The NFSv4 model requires us to complete all RPC calls that might establish state on the server whether or not the user wants to interrupt it. We may also need to schedule new work (including new RPC calls) in order to cancel the new state. The asynchronous RPC model will allow us to ensure that RPC calls always complete, but in order to allow for "synchronous" RPC, we want to add the ability to wait for completion. The waits are, of course, interruptible. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-01-06 14:58:40 -05:00
Trond Myklebust	4ce70ada1f	SUNRPC: Further cleanups Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-01-06 14:58:40 -05:00
Trond Myklebust	963d8fe533	RPC: Clean up RPC task structure Shrink the RPC task structure. Instead of storing separate pointers for task->tk_exit and task->tk_release, put them in a structure. Also pass the user data pointer as a parameter instead of passing it via task->tk_calldata. This enables us to nest callbacks. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-01-06 14:58:39 -05:00
Trond Myklebust	abbcf28f23	SUNRPC: Yet more RPC cleanups Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-01-06 14:58:39 -05:00
Olaf Kirch	93fbf1a5de	[PATCH] Keep nfsd from exiting when seeing recv() errors I submitted this one previously - svc_tcp_recvfrom currently returns any errors to the caller, including ECONNRESET and the like. This is something svc_recv isn't able to deal with: len = svsk->sk_recvfrom(rqstp); [...] if (len == 0 \|\| len == -EAGAIN) { [...] return -EAGAIN; } [...] return len; The nfsd main loop will exit when it sees an error code other than EAGAIN. The following patch fixes this problem svc_recv is not equipped to deal with error codes other than EAGAIN, and will propagate anything else (such as ECONNRESET) up to nfsd, causing it to exit. Signed-off-by: Olaf Kirch <okir@suse.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-01-06 08:33:59 -08:00
NeilBrown	1f1e030bf7	[PATCH] knfsd: fix hash function for IP addresses on 64bit little-endian machines. The hash.h hash_long function, when used on a 64 bit machine, ignores many of the middle-order bits. (The prime chosen it too bit-sparse). IP addresses for clients of an NFS server are very likely to differ only in the low-order bits. As addresses are stored in network-byte-order, these bits become middle-order bits in a little-endian 64bit 'long', and so do not contribute to the hash. Thus you can have the situation where all clients appear on one hash chain. So, until hash_long is fixed (or maybe forever), us a hash function that works well on IP addresses - xor the bytes together. Thanks to "Iozone" <capps@iozone.org> for identifying this problem. Cc: "Iozone" <capps@iozone.org> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-01-06 08:33:21 -08:00
Kris Katterjohn	46f25dffba	[NET]: Change 1500 to ETH_DATA_LEN in some files These patches add the header linux/if_ether.h and change 1500 to ETH_DATA_LEN in some files. Signed-off-by: Kris Katterjohn <kjak@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-05 16:48:56 -08:00
Andrew Morton	e924283bf9	[IPVS]: Another file needs linux/interrupt.h Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-05 16:48:55 -08:00
Yasuyuki Kozakai	e8eaedf2f8	[NETFILTER]: Use HOPLIMIT metric as TTL of TCP reset sent by REJECT HOPLIMIT metric is appropriate to TCP reset sent by REJECT target than hard-coded max TTL. Thanks to David S. Miller for hint. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-05 12:28:57 -08:00
Patrick McHardy	0ae2cfe7f3	[NETFILTER]: nf_conntrack_l3proto_ipv4.c needs net/route.h CC [M] net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.o net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c: In function 'ipv4_refrag': net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c:198: error: dereferencing pointer to incomplete type make[3]: *** [net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.o] Error 1 Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-05 12:21:52 -08:00
Patrick McHardy	22dea562bb	[NETFILTER]: Export ip6_masked_addrcmp, don't pass IPv6 addresses on stack Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-05 12:21:34 -08:00
Patrick McHardy	b777e0ce74	[NETFILTER]: make ipv6_find_hdr() find transport protocol header The original ipv6_find_hdr() finds the specified header in IPv6 packets. This makes it possible to get transport header so that we can kill similar loop in ip6_match_packet(). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-05 12:21:16 -08:00
Patrick McHardy	1bd9bef6f9	[NETFILTER]: Call POST_ROUTING hook before fragmentation Call POST_ROUTING hook before fragmentation to get rid of the okfn use in ip_refrag and save the useless fragmentation/defragmentation step when NAT is used. The patch introduces one user-visible change, the POSTROUTING chain in the mangle table gets entire packets, not fragments, which should simplify use of the MARK and CLASSIFY targets for queueing as a nice side-effect. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-05 12:20:59 -08:00
Patrick McHardy	abbcc73982	[NETFILTER]: Remove okfn usage in ip_vs_core.c okfn should only be used from different contexts to avoid deep call chains, i.e. by nf_queue. Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-05 12:20:40 -08:00
Patrick McHardy	a9b305c4e5	[NETFILTER]: ctnetlink: Fix dumping of helper name Properly dump the helper name instead of internal kernel data. Based on patch by Marcus Sundberg <marcus@ingate.com>. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-05 12:20:02 -08:00
Patrick McHardy	e7be6994ec	[NETFILTER]: Fix module_param types and permissions Fix netfilter module_param types and permissions. Also fix an off-by-one in the ipt_ULOG nlbufsiz < 128k check. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-05 12:19:46 -08:00
Pablo Neira Ayuso	87711cb81c	[NETFILTER]: Filter dumped entries based on the layer 3 protocol number Dump entries of a given Layer 3 protocol number. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-05 12:19:23 -08:00
Pablo Neira Ayuso	c1d10adb4a	[NETFILTER]: Add ctnetlink port for nf_conntrack Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-05 12:19:05 -08:00
Pablo Neira Ayuso	205d67c7d9	[NETFILTER]: ctnetlink: remove unused variable Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-05 12:18:44 -08:00
Pablo Neira Ayuso	d4d6bb41e0	[NETFILTER]: ctnetlink: fix conntrack mark race Set conntrack mark before it is in hashes. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-05 12:18:25 -08:00
Pablo Neira Ayuso	0368309cb4	[NETFILTER]: ctnetlink: ctnetlink_event cleanup Cleanup: Use 'else if' instead of a ugly 'goto' statement. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-05 12:18:08 -08:00
Pablo Neira Ayuso	47116eb201	[NETFILTER]: ctnetlink: use u_int32_t instead of unsigned int Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-05 12:17:50 -08:00
Pablo Neira Ayuso	984955b3d7	[NETFILTER]: ctnetlink: propagate ctnetlink_dump_tuples_proto return value back Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-05 12:17:29 -08:00
Yasuyuki Kozakai	90c4656eb4	[NETFILTER]: ctnetlink: Add sanity checkings for ICMP Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-05 12:17:03 -08:00
Pablo Neira Ayuso	684f7b296c	[NETFILTER]: ctnetlink: remove bogus checks in ICMP protocol at dumping Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-05 12:16:41 -08:00
Jesper Juhl	d695aa8a1f	[NETFILTER]: Decrease number of pointer derefs in nf_conntrack_core.c Benefits of the patch: - Fewer pointer dereferences should make the code slightly faster. - Size of generated code is smaller - improved readability Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-05 12:16:16 -08:00
Jesper Juhl	3e4ead4fe5	[NETFILTER]: Decrease number of pointer derefs in nfnetlink_queue.c Benefits of the patch: - Fewer pointer dereferences should make the code slightly faster. - Size of generated code is smaller - improved readability Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-05 12:15:58 -08:00
Adrian Bunk	4ffd2e4907	[IPVS]: Fix compilation Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-05 12:14:43 -08:00
Linus Torvalds	db9edfd7e3	Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6 Trivial manual merge fixup for usb_find_interface clashes.	2006-01-04 18:44:12 -08:00
Linus Torvalds	52347f4e81	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial	2006-01-04 16:34:57 -08:00
Linus Torvalds	d779188d2b	Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6	2006-01-04 16:31:56 -08:00
Kay Sievers	fd586bacf4	[PATCH] net: swich device attribute creation to default attrs Recent udev versions don't longer cover bad sysfs timing with built-in logic. Explicit rules are required to do that. For net devices, the following is needed: ACTION=="add", SUBSYSTEM=="net", WAIT_FOR_SYSFS="address" to handle access to net device properties from an event handler without races. This patch changes the main net attributes to be created by the driver core, which is done _before_ the event is sent out and will not require the stat() loop of the WAIT_FOR_SYSFS key. Signed-off-by: Kay Sievers <kay.sievers@suse.de> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2006-01-04 16:18:10 -08:00
Kay Sievers	312c004d36	[PATCH] driver core: replace "hotplug" by "uevent" Leave the overloaded "hotplug" word to susbsystems which are handling real devices. The driver core does not "plug" anything, it just exports the state to userspace and generates events. Signed-off-by: Kay Sievers <kay.sievers@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2006-01-04 16:18:08 -08:00
Thomas Young	74cb879822	[TCP] tcp_vegas: Fix slow start Vegas' slow start was only adding one MSS per RTT rather than one for every ack. Slow start behavior should now match Reno. Signed-off-by: Thomas Young <tyo@ee.mu.oz.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-04 13:59:32 -08:00
Kris Katterjohn	9369986306	[NET]: More instruction checks fornet/core/filter.c Signed-off-by: Kris Katterjohn <kjak@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-04 13:58:36 -08:00
YOSHIFUJI Hideaki	181a46a56e	[NETFILTER]: Use macro for spinlock_t/rwlock_t initializations/definition. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-04 13:56:54 -08:00
YOSHIFUJI Hideaki	196433c5b7	[IPV6]: Use macro for rwlock_t initialization. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-04 13:56:31 -08:00
YOSHIFUJI Hideaki	ca40330248	[ECONET]: Use macro for spinlock_t definition. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-04 13:56:08 -08:00
Arnaldo Carvalho de Melo	f190055ff5	[IPVS]: Add missing include <linux/net.h> CC [M] net/ipv4/ipvs/ip_vs_conn.o /pub/scm/linux/kernel/git/acme/net-2.6/net/ipv4/ipvs/ip_vs_conn.c: In function 'ip_vs_conn_new': /pub/scm/linux/kernel/git/acme/net-2.6/net/ipv4/ipvs/ip_vs_conn.c:606: warning: implicit declaration of function 'net_ratelimit' /pub/scm/linux/kernel/git/acme/net-2.6/net/ipv4/ipvs/ip_vs_conn.c: In function 'ip_vs_random_dropentry': /pub/scm/linux/kernel/git/acme/net-2.6/net/ipv4/ipvs/ip_vs_conn.c:810: warning: implicit declaration of function 'net_random' Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-01-04 02:02:20 -02:00
Arnaldo Carvalho de Melo	80e40daa47	[TCP]: syn_flood_warning is only needed if CONFIG_SYN_COOKIES is selected CC net/ipv4/tcp_ipv4.o /pub/scm/linux/kernel/git/acme/net-2.6/net/ipv4/tcp_ipv4.c:665: warning: 'syn_flood_warning' defined but not used Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-01-04 01:58:06 -02:00
Arnaldo Carvalho de Melo	e4dfd449c8	[DCCP] ackvec: use u8 for the buf offsets Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-01-04 01:46:34 -02:00
Andrea Bittau	6742bbcbb8	[DCCP] ackvec: Fix spelling of "throw" Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-01-04 01:45:17 -02:00
Stephen Hemminger	40efc6fa17	[TCP]: less inline's TCP inline usage cleanup: * get rid of inline in several places * replace __inline__ with inline where possible * move functions used in one file out of tcp.h * let compiler decide on used once cases On x86_64: text data bss dec hex filename 3594701 648348 567400 4810449 4966d1 vmlinux.orig 3593133 648580 567400 4809113 496199 vmlinux On sparc64: text data bss dec hex filename 2538278 406152 530392 3474822 350586 vmlinux.ORIG `2536382` 406384 530392 3473158 34ff06 vmlinux Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 16:03:49 -08:00
Stephen Hemminger	3c19065a1e	[IEEE80211] ipw2200: Simplify multicast checks. From: Stephen Hemminger <shemminger@osdl.org> is_multicast_ether_addr() accepts broadcast too, so the is_broadcast_ether_addr() calls are redundant. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 15:27:38 -08:00
Stephen Hemminger	cd8787ab04	[IPV4] fib_trie: build fix Need this to fix build of fib_trie in net-2.6.16 (rebased) tree. The code needs the new inet_make_mask inline. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 14:38:34 -08:00
Stephen Hemminger	554c9a8ec3	[BRIDGE]: Fix faulty check in br_stp_recalculate_bridge_id() One of the conversions from memcmp to compare_ether_addr is incorrect. We need to do relative comparison to determine min MAC address to use in bridge id. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 14:35:54 -08:00
Andrea Bittau	e84a9f5e9c	[DCCP]: Notify CCID only after ACK vectors have been processed. The CCID should be notified of packet reception only when a packet is valid. Therefore, the ACK vector needs to be processed before notifying the CCID. Also, the CCID might need information provided by the ACK vector. Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 14:26:15 -08:00
Andrea Bittau	9e377202d2	[DCCP]: Send an ACK vector when ACKing a response packet If ACK vectors are used, each packet with an ACK should contain an ACK vector. The only exception currently is response packets. It probably is not a good idea to store ACK vector state before the connection is completed (to help protect from syn floods). Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 14:25:49 -08:00
Andrea Bittau	709dd3aaf5	[DCCP]: Do not process a packet twice when it's not in state DCCP_OPEN. When packets are received, the connection is either in DCCP_OPEN [fast-path] or it isn't. If it's not [e.g. DCCP_PARTOPEN] upper layers will perform sanity checks and parse options. If it is in DCCP_OPEN, dccp_rcv_established() will do it. It is important not to re-parse options in dccp_rcv_established() when it is not called from the fast-path. Else, fore example, the ack vector will be added twice and the CCID will see the packet twice. The solution is to always enfore sanity checks from the upper layers. When packets arrive in the fast-path, sanity checks will be performed before calling dccp_rcv_established(). Note(acme): I rewrote the patch to achieve the same result but keeping dccp_rcv_established with the previous semantics and having it split into __dccp_rcv_established, that doesn't does do any sanity check, code in state != DCCP_OPEN use this lighter version as they already do the sanity checks. Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 14:25:17 -08:00
Patrick Caulfield	5062430c5c	[DECNET]: Only use local routers The attached patch makes DECnet routing only use routers from the same area - rather than the highest rated router seen. In theory there should not be an out-of-area router on a local network but some networks are bridged rather than properly routed. VMS seems to behave similarly: if I bring up a VMS node with no router then it can't see anything else on the global network. Signed-off-by: Patrick Caulfield <patrick@tykepenguin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 14:24:02 -08:00
Roberto Nibali	4b5bdf5cc3	[IPVS]: Cleanup IP_VS_DBG statements. From: Roberto Nibali <ratz@drugphish.ch> The attached patch (against current -GIT) is a cleanup patch which does following: o lookup debug messages shifted back to 9 o added more informational value to flags and refcnt since those entries can be in multiple referenced structures o cleanup 80 char violation It's the prepatch to the session pool implementation and helps very much to debug and monitor important variables and structures regarding the threshold limitation and persistency without the thousands of lookup messages which noone is interested in. Signed-off-by: Horms <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 14:22:59 -08:00
Christoph Hellwig	b5e5fa5e09	[NET]: Add a dev_ioctl() fallback to sock_ioctl() Currently all network protocols need to call dev_ioctl as the default fallback in their ioctl implementations. This patch adds a fallback to dev_ioctl to sock_ioctl if the protocol returned -ENOIOCTLCMD. This way all the procotol ioctl handlers can be simplified and we don't need to export dev_ioctl. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 14:18:33 -08:00
Christoph Hellwig	5ff7630e4a	[NETROM]: Remove unessecary lock_sock calls in netrom_ioctl() lock_sock is needed only in very few cases, so do it there instead of around the switch statement. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 14:14:46 -08:00
Per Liden	b461d2f218	[NETLINK] genetlink: fix cmd type in genl_ops to be consistent to u8 Signed-off-by: Per Liden <per.liden@ericsson.com> ACKed-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 14:13:29 -08:00
Benjamin LaHaise	fd19f329a3	[AF_UNIX]: Convert to use a spinlock instead of rwlock From: Benjamin LaHaise <bcrl@kvack.org> In af_unix, a rwlock is used to protect internal state. At least on my P4 with HT it is faster to use a spinlock due to the simpler memory barrier used to unlock. This patch raises bw_unix to ~690K/s. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 14:10:46 -08:00
Benjamin LaHaise	4947d3ef8d	[NET]: Speed up __alloc_skb() From: Benjamin LaHaise <bcrl@kvack.org> In __alloc_skb(), the use of skb_shinfo() which casts a u8 * to the shared info structure results in gcc being forced to do a reload of the pointer since it has no information on possible aliasing. Fix this by using a pointer to refer to skb_shared_info. By initializing skb_shared_info sequentially, the write combining buffers can reduce the number of memory transactions to a single write. Reorder the initialization in __alloc_skb() to match the structure definition. There is also an alignment issue on 64 bit systems with skb_shared_info by converting nr_frags to a short everything packs up nicely. Also, pass the slab cache pointer according to the fclone flag instead of using two almost identical function calls. This raises bw_unix performance up to a peak of 707KB/s when combined with the spinlock patch. It should help other networking protocols, too. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 14:06:50 -08:00
Arnaldo Carvalho de Melo	14c850212e	[INET_SOCK]: Move struct inet_sock & helper functions to net/inet_sock.h To help in reducing the number of include dependencies, several files were touched as they were getting needed headers indirectly for stuff they use. Thanks also to Alan Menegotto for pointing out that net/dccp/proto.c had linux/dccp.h include twice. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:11:21 -08:00
Arnaldo Carvalho de Melo	25995ff577	[SOCK]: Introduce sk_receive_skb Its common enough to to justify that, TCP still can't use it as it has the prequeueing stuff, still to be made generic in the not so distant future :-) Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:11:19 -08:00
Christoph Hellwig	ce1d4d3e88	[NET]: restructure sock_aio_{read,write} / sock_{readv,writev} Mid-term I plan to restructure the file_operations so that we don't need to have all these duplicate aio and vectored versions. This patch is a small step in that direction but also a worthwile cleanup on it's own: (1) introduce a alloc_sock_iocb helper that encapsulates allocating a proper sock_iocb (2) add do_sock_read and do_sock_write helpers for common read/write code Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:11:18 -08:00
David S. Miller	cbeb321a64	[NET]: Fix sock_init() return value. It needs to return zero now that it is an initcall. Also, net/nonet.c no longer needs a dummy sock_init(). Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:11:17 -08:00
Jaco Kroon	f34fbb9713	[PKTGEN]: Deinitialise static variables. static variables should not be explicitly initialised to 0. This causes them to be placed in .data instead of .bss. This patch de-initialises 3 static variables in net/core/pktgen.c. There are approximately 800 more such variables in the source tree (2.6.15rc5). If there is more interrest I'd be willing to track down the rest of these as well and de-initialise them as well. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:11:16 -08:00
Eric Dumazet	90ddc4f047	[NET]: move struct proto_ops to const I noticed that some of 'struct proto_ops' used in the kernel may share a cache line used by locks or other heavily modified data. (default linker alignement is 32 bytes, and L1_CACHE_LINE is 64 or 128 at least) This patch makes sure a 'struct proto_ops' can be declared as const, so that all cpus can share all parts of it without false sharing. This is not mandatory : a driver can still use a read/write structure if it needs to (and eventually a __read_mostly) I made a global stubstitute to change all existing occurences to make them const. This should reduce the possibility of false sharing on SMP, and speedup some socket system calls. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:11:15 -08:00
Andi Kleen	77d76ea310	[NET]: Small cleanup to socket initialization sock_init can be done as a core_initcall instead of calling it directly in init/main.c Also I removed an out of date #ifdef. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:11:14 -08:00
Frank Filz	7708610b1b	[SCTP]: Add support for SCTP_DELAYED_ACK_TIME socket option. Signed-off-by: Frank Filz <ffilz@us.ibm.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:11:13 -08:00
Frank Filz	52ccb8e90c	[SCTP]: Update SCTP_PEER_ADDR_PARAMS socket option to the latest api draft. This patch adds support to set/get heartbeat interval, maximum number of retransmissions, pathmtu, sackdelay time for a particular transport/ association/socket as per the latest SCTP sockets api draft11. Signed-off-by: Frank Filz <ffilz@us.ibm.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:11:11 -08:00
Robert Olsson	fd9662555c	[IPV4] fib_trie: Add credits. Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:11:10 -08:00
Stephen Hemminger	9eb2d62719	[TCP] cubic: use Newton-Raphson Replace cube root algorithim with a faster version using Newton-Raphson. Surprisingly, doing the scaled div64_64 is faster than a true 64 bit division on 64 bit CPU's. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:11:09 -08:00
Stephen Hemminger	89b3d9aaf4	[TCP] cubic: precompute constants Revised version of patch to pre-compute values for TCP cubic. * d32,d64 replaced with descriptive names * cube_factor replaces srtt[scaled by count] / HZ * ((1 << (10+2BICTCP_HZ)) / bic_scale) beta_scale replaces 8*(BICTCP_BETA_SCALE+beta)/3/(BICTCP_BETA_SCALE-beta); Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:11:08 -08:00
Stephen Hemminger	c865e5d99e	[PKT_SCHED] netem: packet corruption option Here is a new feature for netem in 2.6.16. It adds the ability to randomly corrupt packets with netem. A version was done by Hagen Paul Pfeifer, but I redid it to handle the cases of backwards compatibility with netlink interface and presence of hardware checksum offload. It is useful for testing hardware offload in devices. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:11:05 -08:00
Stephen Hemminger	8cbb512e50	[BRIDGE]: add version number Add version info to bridge module. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:11:04 -08:00
Stephen Hemminger	edb5e46fc0	[BRIDGE]: limited ethtool support Add limited ethtool support to bridge to allow disabling features. Note: if underlying device does not support a feature (like checksum offload), then the bridge device won't inherit it. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:11:03 -08:00
Stephen Hemminger	0e5eabac49	[BRIDGE]: filter packets in learning state While in the learning state, run filters but drop the result. This prevents us from acquiring bad fdb entries in learning state. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:11:02 -08:00
Stephen Hemminger	4433f420e5	[BRIDGE]: handle speed detection after carrier changes Speed of a interface may not be available until carrier is detected in the case of autonegotiation. To get the correct value we need to recheck speed after carrier event. But the check needs to be done in a context that is similar to normal ethtool interface (can sleep). Also, delay check for 1ms to try avoid any carrier bounce transitions. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:11:01 -08:00
Stephen Hemminger	4505a3ef72	[BRIDGE]: allow setting hardware address of bridge pseudo-dev Some people are using bridging to hide multiple machines from an ISP that restricts by MAC address. So in that case allow the bridge mac address to be set to any of the existing interfaces. I don't want to allow any arbitrary value and confuse STP. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:11:00 -08:00
David S. Miller	fbe9cc4a87	[AF_UNIX]: Use spinlock for unix_table_lock This lock is actually taken mostly as a writer, so using a rwlock actually just makes performance worse especially on chips like the Intel P4. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:59 -08:00
Arnaldo Carvalho de Melo	d83d8461f9	[IP_SOCKGLUE]: Remove most of the tcp specific calls As DCCP needs to be called in the same spots. Now we have a member in inet_sock (is_icsk), set at sock creation time from struct inet_protosw->flags (if INET_PROTOSW_ICSK is set, like for TCP and DCCP) to see if a struct sock instance is a inet_connection_sock for places like the ones in ip_sockglue.c (v4 and v6) where we previously were looking if sk_type was SOCK_STREAM, that is insufficient because we now use the same code for DCCP, that has sk_type SOCK_DCCP. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:58 -08:00
Arnaldo Carvalho de Melo	d8313f5ca2	[INET6]: Generalise tcp_v6_hash_connect Renaming it to inet6_hash_connect, making it possible to ditch dccp_v6_hash_connect and share the same code with TCP instead. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:56 -08:00
Arnaldo Carvalho de Melo	a7f5e7f164	[INET]: Generalise tcp_v4_hash_connect Renaming it to inet_hash_connect, making it possible to ditch dccp_v4_hash_connect and share the same code with TCP instead. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:55 -08:00
Arnaldo Carvalho de Melo	6d6ee43e0b	[TWSK]: Introduce struct timewait_sock_ops So that we can share several timewait sockets related functions and make the timewait mini sockets infrastructure closer to the request mini sockets one. Next changesets will take advantage of this, moving more code out of TCP and DCCP v4 and v6 to common infrastructure. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:54 -08:00
Arnaldo Carvalho de Melo	fc44b98053	[DCCP]: Use reqsk_free in dccp_v4_conn_request Now we have the destructor (dccp_v4_reqsk_destructor) in our request_sock_ops vtable. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:53 -08:00
Arnaldo Carvalho de Melo	3df80d9320	[DCCP]: Introduce DCCPv6 Still needs mucho polishing, specially in the checksum code, but works just fine, inet_diag/iproute2 and all 8) Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:52 -08:00
Arnaldo Carvalho de Melo	399c07def6	[IPV6]: Export ipv6_opt_accepted It was already non-TCP specific, will be used by DCCPv6. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:51 -08:00
Arnaldo Carvalho de Melo	f21e68caa0	[DCCP]: Prepare the AF agnostic core for the introduction of DCCPv6 Basically exports a similar set of functions as the one exported by the non-AF specific TCP code. In the process moved some non-AF specific code from dccp_v4_connect to dccp_connect_init and moved the checksum verification from dccp_invalid_packet to dccp_v4_rcv, so as to use it in dccp_v6_rcv too. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:50 -08:00
Arnaldo Carvalho de Melo	34ca686081	[DCCP]: Just rename dccp_v4_prot to dccp_prot To match TCP equivalent. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:49 -08:00
Arnaldo Carvalho de Melo	3cf3dc6c2e	[IPV6]: Export some symbols for DCCPv6 Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:48 -08:00
Arnaldo Carvalho de Melo	0fa1a53e1f	[IPV6]: Introduce inet6_timewait_sock Out of tcp6_timewait_sock, that now is just an aggregation of inet_timewait_sock and inet6_timewait_sock, using tw_ipv6_offset in struct inet_timewait_sock, that is common to the IPv6 transport protocols that use timewait sockets, like DCCP and TCP. tw_ipv6_offset plays the struct inet_sock pinfo6 role, i.e. for the generic code to find the IPv6 area in a timewait sock. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:47 -08:00
Arnaldo Carvalho de Melo	b9750ce13c	[IPV6]: Generalise some functions Using sk->sk_protocol instead of IPPROTO_TCP. Will be used by DCCPv6 in the next changesets. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:46 -08:00
Benjamin LaHaise	830a1e5c21	[AF_UNIX]: Remove superfluous reference counting in unix_stream_sendmsg AF_UNIX stream socket performance on P4 CPUs tends to suffer due to a lot of pipeline flushes from atomic operations. The patch below removes the sock_hold() and sock_put() in unix_stream_sendmsg(). This should be safe as the socket still holds a reference to its peer which is only released after the file descriptor's final user invokes unix_release_sock(). The only consideration is that we must add a memory barrier before setting the peer initially. Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:45 -08:00
Benjamin LaHaise	c1cbe4b7ad	[NET]: Avoid atomic xchg() for non-error case It also looks like there were 2 places where the test on sk_err was missing from the event wait logic (in sk_stream_wait_connect and sk_stream_wait_memory), while the rest of the sock_error() users look to be doing the right thing. This version of the patch fixes those, and cleans up a few places that were testing ->sk_err directly. Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:44 -08:00
Roberto Nibali	f1f71e03b1	[IPVS]: remove dead code This patch removes dead code. I don't see the reason to keep this cruft around, besides cluttering the nice and functionally working code. Signed-off-by: Roberto Nibali <ratz@drugphish.ch> Signed-off-by: Horms <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:43 -08:00
Stephen Hemminger	65a45441d7	[UDP]: udp_checksum_init return value Since udp_checksum_init always returns 0 there is no point in having it return a value. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:42 -08:00
Herbert Xu	3305b80c21	[IP]: Simplify and consolidate MSG_PEEK error handling When a packet is obtained from skb_recv_datagram with MSG_PEEK enabled it is left on the socket receive queue. This means that when we detect a checksum error we have to be careful when trying to free the packet as someone could have dequeued it in the time being. Currently this delicate logic is duplicated three times between UDPv4, UDPv6 and RAWv6. This patch moves them into a one place and simplifies the code somewhat. This is based on a suggestion by Eric Dumazet. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:41 -08:00
Arnaldo Carvalho de Melo	57cca05af1	[DCCP]: Introduce dccp_ipv4_af_ops And make the core DCCP code AF agnostic, just like TCP, now its time to work on net/dccp/ipv6.c, we are close to the end! Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:40 -08:00
Arnaldo Carvalho de Melo	af05dc9394	[ICSK]: Move v4_addr2sockaddr from TCP to icsk Renaming it to inet_csk_addr2sockaddr. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:39 -08:00
Arnaldo Carvalho de Melo	8292a17a39	[ICSK]: Rename struct tcp_func to struct inet_connection_sock_af_ops And move it to struct inet_connection_sock. DCCP will use it in the upcoming changesets. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:38 -08:00
Arnaldo Carvalho de Melo	ca304b6104	[IPV6]: Introduce inet6_rsk() And inet6_rsk_offset in inet_request_sock, for the same reasons as inet_sock's pinfo6 member. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:37 -08:00
Arnaldo Carvalho de Melo	8129765ac0	[IPV6]: Generalise tcp_v6_search_req & tcp_v6_synq_add More work is needed tho to introduce inet6_request_sock from tcp6_request_sock, in the same layout considerations as ipv6_pinfo in inet_sock, next changeset will do that. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:36 -08:00
Arnaldo Carvalho de Melo	c2977c2213	[ICSK]: make inet_csk_reqsk_queue_hash_add timeout arg unsigned long Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:34 -08:00
Arnaldo Carvalho de Melo	90b19d3169	[IPV6]: Generalise __tcp_v6_hash, renaming it to __inet6_hash Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:33 -08:00
Arnaldo Carvalho de Melo	971af18bbf	[IPV6]: Reuse inet_csk_get_port in tcp_v6_get_port Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:33 -08:00
Herbert Xu	89cee8b1cb	[IPV4]: Safer reassembly Another spin of Herbert Xu's "safer ip reassembly" patch for 2.6.16. (The original patch is here: http://marc.theaimsgroup.com/?l=linux-netdev&m=112281936522415&w=2 and my only contribution is to have tested it.) This patch (optionally) does additional checks before accepting IP fragments, which can greatly reduce the possibility of reassembling fragments which originated from different IP datagrams. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Arthur Kepner <akepner@sgi.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:31 -08:00
Bart De Schuymer	d5228a4f49	[NETFILTER] ebtables: Support nf_log API from ebt_log and ebt_ulog This makes ebt_log and ebt_ulog use the new nf_log api. This enables the bridging packet filter to log packets e.g. via nfnetlink_log. Signed-off-by: Bart De Schuymer <bdschuym@pandora.be> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:30 -08:00
Eric Dumazet	3183606469	[NETFILTER] ip_tables: NUMA-aware allocation Part of a performance problem with ip_tables is that memory allocation is not NUMA aware, but 'only' SMP aware (ie each CPU normally touch separate cache lines) Even with small iptables rules, the cost of this misplacement can be high on common workloads. Instead of using one vmalloc() area (located in the node of the iptables process), we now allocate an area for each possible CPU, using vmalloc_node() so that memory should be allocated in the CPU's node if possible. Port to arp_tables and ip6_tables by Harald Welte. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:29 -08:00
Stephen Hemminger	df3271f336	[TCP] BIC: CUBIC window growth (2.0) Replace existing BIC version 1.1 with new version 2.0. The main change is to replace the window growth function with a cubic function as described in: http://www.csc.ncsu.edu/faculty/rhee/export/bitcp/cubic-paper.pdf Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:28 -08:00
Stephen Hemminger	05d054503a	[TCP] BIC: spelling and whitespace Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:27 -08:00
Stephen Hemminger	018da8f44c	[TCP] BIC: remove low utilization code. The latest BICTCP patch at: http://www.csc.ncsu.edu:8080/faculty/rhee/export/bitcp/index_files/Page546.htm disables the low_utilization feature of BICTCP because it doesn't work in some cases. This patch removes it. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:26 -08:00
Trent Jaeger	df71837d50	[LSM-IPSec]: Security association restriction. This patch series implements per packet access control via the extension of the Linux Security Modules (LSM) interface by hooks in the XFRM and pfkey subsystems that leverage IPSec security associations to label packets. Extensions to the SELinux LSM are included that leverage the patch for this purpose. This patch implements the changes necessary to the XFRM subsystem, pfkey interface, ipv4/ipv6, and xfrm_user interface to restrict a socket to use only authorized security associations (or no security association) to send/receive network packets. Patch purpose: The patch is designed to enable access control per packets based on the strongly authenticated IPSec security association. Such access controls augment the existing ones based on network interface and IP address. The former are very coarse-grained, and the latter can be spoofed. By using IPSec, the system can control access to remote hosts based on cryptographic keys generated using the IPSec mechanism. This enables access control on a per-machine basis or per-application if the remote machine is running the same mechanism and trusted to enforce the access control policy. Patch design approach: The overall approach is that policy (xfrm_policy) entries set by user-level programs (e.g., setkey for ipsec-tools) are extended with a security context that is used at policy selection time in the XFRM subsystem to restrict the sockets that can send/receive packets via security associations (xfrm_states) that are built from those policies. A presentation available at www.selinux-symposium.org/2005/presentations/session2/2-3-jaeger.pdf from the SELinux symposium describes the overall approach. Patch implementation details: On output, the policy retrieved (via xfrm_policy_lookup or xfrm_sk_policy_lookup) must be authorized for the security context of the socket and the same security context is required for resultant security association (retrieved or negotiated via racoon in ipsec-tools). This is enforced in xfrm_state_find. On input, the policy retrieved must also be authorized for the socket (at __xfrm_policy_check), and the security context of the policy must also match the security association being used. The patch has virtually no impact on packets that do not use IPSec. The existing Netfilter (outgoing) and LSM rcv_skb hooks are used as before. Also, if IPSec is used without security contexts, the impact is minimal. The LSM must allow such policies to be selected for the combination of socket and remote machine, but subsequent IPSec processing proceeds as in the original case. Testing: The pfkey interface is tested using the ipsec-tools. ipsec-tools have been modified (a separate ipsec-tools patch is available for version 0.5) that supports assignment of xfrm_policy entries and security associations with security contexts via setkey and the negotiation using the security contexts via racoon. The xfrm_user interface is tested via ad hoc programs that set security contexts. These programs are also available from me, and contain programs for setting, getting, and deleting policy for testing this interface. Testing of sa functions was done by tracing kernel behavior. Signed-off-by: Trent Jaeger <tjaeger@cse.psu.edu> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:24 -08:00
Jeff Garzik	ac67c62473	Merge branch 'master'	2006-01-03 10:49:18 -05:00
Matt Mackall	4a4efbdee2	s/retreiv/retriev/g As everyone knows, the rule is: "i before e.. um.. always." Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-01-03 13:27:11 +01:00
David L Stevens	5ab4a6c81e	[IPV6] mcast: Fix multiple issues in MLDv2 reports. The below "jumbo" patch fixes the following problems in MLDv2. 1) Add necessary "ntohs" to recent "pskb_may_pull" check [breaks all nonzero source queries on little-endian (!)] 2) Add locking to source filter list [resend of prior patch] 3) fix "mld_marksources()" to a) send nothing when all queried sources are excluded b) send full exclude report when source queried sources are not excluded c) don't schedule a timer when there's nothing to report NOTE: RFC 3810 specifies the source list should be saved and each source reported individually as an IS_IN. This is an obvious DOS path, requiring the host to store and then multicast as many sources as are queried (e.g., millions...). This alternative sends a full, relevant report that's limited to number of sources present on the machine. 4) fix "add_grec()" to send empty-source records when it should The original check doesn't account for a non-empty source list with all sources inactive; the new code keeps that short-circuit case, and also generates the group header with an empty list if needed. 5) fix mca_crcount decrement to be after add_grec(), which needs its original value These issues (other than item #1 ;-) ) were all found by Yan Zheng, much thanks! Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-27 14:03:00 -08:00
David S. Miller	1b93ae64ca	[NET]: Validate socket filters against BPF_MAXINSNS in one spot. Currently the checks are scattered all over and this leads to inconsistencies and even cases where the check is not made. Based upon a patch from Kris Katterjohn. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-27 13:57:59 -08:00
YOSHIFUJI Hideaki	6732badee0	[IPV6]: Fix addrconf dead lock. We need to release idev->lcok before we call addrconf_dad_stop(). It calls ipv6_addr_del(), which will hold idev->lock. Bug spotted by Yasuyuki KOZAKAI <yasuyuki.kozakai@toshiba.co.jp>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-27 13:35:15 -08:00
David Kimdon	79cac2a221	[BR_NETFILTER]: Fix leak if skb traverses > 1 bridge Call nf_bridge_put() before allocating a new nf_bridge structure and potentially overwriting the pointer to a previously allocated one. This fixes a memory leak which can occur when the bridge topology allows for an skb to traverse more than one bridge. Signed-off-by: David Kimdon <david.kimdon@devicescape.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-26 17:27:10 -08:00
David L Stevens	6f4353d891	[IPV6]: Increase default MLD_MAX_MSF to 64. The existing default of 10 is just way too low. Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-26 17:03:46 -08:00
Hiroyuki YAMAMORI	291d809ba5	[IPV6]: Fix Temporary Address Generation From: Hiroyuki YAMAMORI <h-yamamo@db3.so-net.ne.jp> Since regen_count is stored in the public address, we need to reset it when we start renewing temporary address. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-23 11:24:05 -08:00
YOSHIFUJI Hideaki	3dd3bf8357	[IPV6]: Fix dead lock. We need to relesae ifp->lock before we call addrconf_dad_stop(), which will hold ifp->lock. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-23 11:23:21 -08:00
David S. Miller	e6469297d4	Merge git://git.skbuff.net/gitroot/yoshfuji/linux-2.6.14+git+ipv6-fix-20051221a	2005-12-22 07:41:27 -08:00
David S. Miller	9b78a82c1c	[IPSEC]: Fix policy updates missed by sockets The problem is that when new policies are inserted, sockets do not see the update (but all new route lookups do). This bug is related to the SA insertion stale route issue solved recently, and this policy visibility problem can be fixed in a similar way. The fix is to flush out the bundles of all policies deeper than the policy being inserted. Consider beginning state of "outgoing" direction policy list: policy A --> policy B --> policy C --> policy D First, realize that inserting a policy into a list only potentially changes IPSEC routes for that direction. Therefore we need not bother considering the policies for other directions. We need only consider the existing policies in the list we are doing the inserting. Consider new policy "B'", inserted after B. policy A --> policy B --> policy B' --> policy C --> policy D Two rules: 1) If policy A or policy B matched before the insertion, they appear before B' and thus would still match after inserting B' 2) Policy C and D, now "shadowed" and after policy B', potentially contain stale routes because policy B' might be selected instead of them. Therefore we only need flush routes assosciated with policies appearing after a newly inserted policy, if any. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-22 07:39:48 -08:00
Ian McDonald	4c7e689502	[DCCP]: Comment typo I hope to actually change this behaviour shortly but this will help anybody grepping code at present. Signed-off-by: Ian McDonald <imcdnzl@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-21 19:02:39 -08:00
Kristian Slavov	1d1428045c	[IPV6]: Fix address deletion If you add more than one IPv6 address belonging to the same prefix and delete the address that was last added, routing table entry for that prefix is also deleted. Tested on 2.6.14.4 To reproduce: ip addr add 3ffe::1/64 dev eth0 ip addr add 3ffe::2/64 dev eth0 /* wait DAD */ sleep 1 ip addr del 3ffe::2/64 dev eth0 ip -6 route (route to 3ffe::/64 should be gone) In ipv6_del_addr(), if ifa == ifp, we set ifa->if_next to NULL, and later assign ifap = &ifa->if_next, effectively terminating the for-loop. This prevents us from checking if there are other addresses using the same prefix that are valid, and thus resulting in deletion of the prefix. This applies only if the first entry in idev->addr_list is the address to be deleted. Signed-off-by: Kristian Slavov <kristian.slavov@nomadiclab.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-21 18:47:24 -08:00
Mika Kukkonen	7eb1b3d372	[VLAN]: Add two missing checks to vlan_ioctl_handler() In vlan_ioctl_handler() the code misses couple checks for error return values. Signed-off-by: Mika Kukkonen <mikukkon@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-21 18:39:49 -08:00
Mika Kukkonen	0d77d59f62	[NETROM]: Fix three if-statements in nr_state1_machine() I found these while compiling with extra gcc warnings; considering the indenting surely they are not intentional? Signed-off-by: Mika Kukkonen <mikukkon@iki.fi> Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-21 18:38:26 -08:00
YOSHIFUJI Hideaki	6b3ae80a63	[IPV6]: Don't select a tentative address as a source address. A tentative address is not considered "assigned to an interface" in the traditional sense (RFC2462 Section 4). Don't try to select such an address for the source address. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2005-12-21 22:58:01 +09:00
YOSHIFUJI Hideaki	c5e33bddd3	[IPV6]: Run DAD when the link becomes ready. If the link was not available when the interface was created, run DAD for pending tentative addresses when the link becomes ready. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2005-12-21 22:57:44 +09:00
YOSHIFUJI Hideaki	3c21edbd11	[IPV6]: Defer IPv6 device initialization until the link becomes ready. NETDEV_UP might be sent even if the link attached to the interface was not ready. DAD does not make sense in such case, so we won't do so. After interface Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2005-12-21 22:57:24 +09:00
YOSHIFUJI Hideaki	8de3351e6e	[IPV6]: Try not to send icmp to anycast address. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2005-12-21 22:57:06 +09:00
YOSHIFUJI Hideaki	58c4fb86ea	[IPV6]: Flag RTF_ANYCAST for anycast routes. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2005-12-21 22:56:42 +09:00
Trond Myklebust	48e4918775	SUNRPC: Fix "EPIPE" error on mount of rpcsec_gss-protected partitions gss_create_upcall() should not error just because rpc.gssd closed the pipe on its end. Instead, it should requeue the pending requests and then retry. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-12-19 23:12:21 -05:00
Trond Myklebust	b079fa7baa	RPC: Do not block on skb allocation If we get something like the following, [ 125.300636] [<c04086e1>] schedule_timeout+0x54/0xa5 [ 125.305931] [<c040866e>] io_schedule_timeout+0x29/0x33 [ 125.311495] [<c02880c4>] blk_congestion_wait+0x70/0x85 [ 125.317058] [<c014136b>] throttle_vm_writeout+0x69/0x7d [ 125.322720] [<c014714d>] shrink_zone+0xe0/0xfa [ 125.327560] [<c01471d4>] shrink_caches+0x6d/0x6f [ 125.332581] [<c01472a6>] try_to_free_pages+0xd0/0x1b5 [ 125.338056] [<c013fa4b>] __alloc_pages+0x135/0x2e8 [ 125.343258] [<c03b74ad>] tcp_sendmsg+0xaa0/0xb78 [ 125.348281] [<c03d4666>] inet_sendmsg+0x48/0x53 [ 125.353212] [<c0388716>] sock_sendmsg+0xb8/0xd3 [ 125.358147] [<c0388773>] kernel_sendmsg+0x42/0x4f [ 125.363259] [<c038bc00>] sock_no_sendpage+0x5e/0x77 [ 125.368556] [<c03ee7af>] xs_tcp_send_request+0x2af/0x375 then the socket is blocked until memory is reclaimed, and no progress can ever be made. Try to access the emergency pools by using GFP_ATOMIC. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-12-19 23:11:54 -05:00
Neil Horman	9bffc4ace1	[SCTP]: Fix sctp to not return erroneous POLLOUT events. Make sctp_writeable() use sk_wmem_alloc rather than sk_wmem_queued to determine the sndbuf space available. It also removes all the modifications to sk_wmem_queued as it is not currently used in SCTP. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-19 14:24:40 -08:00
David S. Miller	399c180ac5	[IPSEC]: Perform SA switchover immediately. When we insert a new xfrm_state which potentially subsumes an existing one, make sure all cached bundles are flushed so that the new SA is used immediately. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-19 14:23:23 -08:00
Patrick McHardy	9e999993c7	[XFRM]: Handle DCCP in xfrm{4,6}_decode_session Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-19 14:03:46 -08:00
YOSHIFUJI Hideaki	3dd4bc68fa	[IPV6]: Fix route lifetime. The route expiration time is stored in rt6i_expires in jiffies. The argument of rt6_route_add() for adding a route is not the expiration time in jiffies nor in clock_t, but the lifetime (or time left before expiration) in clock_t. Because of the confusion, we sometimes saw several strange errors (FAILs) in TAHI IPv6 Ready Logo Phase-2 Self Test. The symptoms were analyzed by Mitsuru Chinen <CHINEN@jp.ibm.com>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-19 14:02:45 -08:00
Bart De Schuymer	b03664869a	[BRIDGE-NF]: Fix bridge-nf ipv6 length check A typo caused some bridged IPv6 packets to get dropped randomly, as reported by Sebastien Chaumontet. The patch below fixes this (using skb->nh.raw instead of raw) and also makes the jumbo packet length checking up-to-date with the code in net/ipv6/exthdrs.c::ipv6_hop_jumbo. Signed-off-by: Bart De Schuymer <bdschuym@pandora.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-19 14:00:08 -08:00
Patrick McHardy	31cb5bd4dc	[NETFILTER]: Fix incorrect dependency for IP6_NF_TARGET_NFQUEUE IP6_NF_TARGET_NFQUEUE depends on IP6_NF_IPTABLES, not IP_NF_IPTABLES. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-19 13:53:26 -08:00
Patrick McHardy	0476f171af	[NETFILTER]: Fix NAT init order As noticed by Phil Oester, the GRE NAT protocol helper is initialized before the NAT core, which makes registration fail. Change the linking order to make NAT be initialized first. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-19 13:53:09 -08:00
Jeff Garzik	418fbfe979	Merge branch 'master'	2005-12-19 00:09:53 -05:00
Al Viro	d3a880e1ff	[PATCH] Address of void __user * is void __user * , not void __user * Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-12-15 10:04:31 -08:00
Stephen Hemminger	a388442c37	[VLAN]: Fix hardware rx csum errors Receiving VLAN packets over a device (without VLAN assist) that is doing hardware checksumming (CHECKSUM_HW), causes errors because the VLAN code forgets to adjust the hardware checksum. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-14 16:23:16 -08:00
Herbert Xu	1542272a60	[GRE]: Fix hardware checksum modification The skb_postpull_rcsum introduced a bug to the checksum modification. Although the length pulled is offset bytes, the origin of the pulling is the GRE header, not the IP header. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-14 12:55:24 -08:00
David S. Miller	2edc2689f8	[PKT_SCHED]: Disable debug tracing logs by default in packet action API. Noticed by Andi Kleen. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-13 22:59:50 -08:00
David S. Miller	a1493d9cd1	[IPV6] addrconf: Do not print device pointer in privacy log message. Noticed by Andi Kleen, it is pointless to emit the device structure pointer in the kernel logs like this. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-13 22:59:36 -08:00

1 2 3 4 5 ...

1485 Commits