OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Chuck Lever	b92503b25c	[PATCH] knfsd: SUNRPC: teach svc_sendto() to deal with IPv6 addresses CMSG_DATA comes in different sizes, depending on address family. [akpm@linux-foundation.org: remove unneeded do/while (0)] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:36 -08:00
Chuck Lever	73df0dbaff	[PATCH] knfsd: SUNRPC: Make rq_daddr field address-version independent The rq_daddr field must support larger addresses. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:36 -08:00
Chuck Lever	27459f0940	[PATCH] knfsd: SUNRPC: Provide room in svc_rqst for larger addresses Expand the rq_addr field to allow it to contain larger addresses. Specifically, we replace a 'sockaddr_in' with a 'sockaddr_storage', then everywhere the 'sockaddr_in' was referenced, we use instead an accessor function (svc_addr_in) which safely casts the _storage to _in. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:36 -08:00
Chuck Lever	2442222283	[PATCH] knfsd: SUNRPC: Use sockaddr_storage to store address in svc_deferred_req Sockaddr_storage will allow us to store arbitrary socket addresses in the svc_deferred_req struct. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:35 -08:00
Chuck Lever	ad06e4bd62	[PATCH] knfsd: SUNRPC: Add a function to format the address in an svc_rqst for printing There are loads of places where the RPC server assumes that the rq_addr fields contains an IPv4 address. Top among these are error and debugging messages that display the server's IP address. Let's refactor the address printing into a separate function that's smart enough to figure out the difference between IPv4 and IPv6 addresses. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:35 -08:00
Chuck Lever	1ba951053f	[PATCH] knfsd: SUNRPC: Don't set msg_name and msg_namelen when calling sock_recvmsg Clean-up: msg_name and msg_namelen are not used by sock_recvmsg, so don't bother to set them in svc_recvfrom. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:35 -08:00
Chuck Lever	067d781731	[PATCH] knfsd: SUNRPC: Cache remote peer's address in svc_sock The remote peer's address won't change after the socket has been accepted. We don't need to call ->getname on every incoming request. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:35 -08:00
NeilBrown	e79eff1f90	[PATCH] knfsd: SUNRPC: aplit svc_sock_enqueue out of svc_setup_socket Rather than calling svc_sock_enqueue at the end of svc_setup_socket, we now call it (via svc_sock_recieved) after calling svc_setup_socket at each call site. We do this because a subsequent patch will insert some code between the two calls at one call site. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:35 -08:00
Chuck Lever	482fb94e1b	[PATCH] knfsd: SUNRPC: allow creating an RPC service without registering with portmapper Sometimes we need to create an RPC service but not register it with the local portmapper. NFSv4 delegation callback, for example. Change the svc_makesock() API to allow optionally creating temporary or permanent sockets, optionally registering with the local portmapper, and make it return the ephemeral port of the new socket. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:35 -08:00
Chuck Lever	6b174337e5	[PATCH] knfsd: SUNRPC: update internal API: separate pmap register and temp sockets Currently in the RPC server, registering with the local portmapper and creating "permanent" sockets are tied together. Expand the internal APIs to allow these two socket characteristics to be separately specified. This will be externalized in the next patch. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:35 -08:00
YOSHIFUJI Hideaki	cca5172a7e	[NET] SUNRPC: Fix whitespace errors. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-10 23:20:13 -08:00
NeilBrown	aaf68cfbf2	[PATCH] knfsd: fix a race in closing NFSd connections If you lose this race, it can iput a socket inode twice and you get a BUG in fs/inode.c When I added the option for user-space to close a socket, I added some cruft to svc_delete_socket so that I could call that function when closing a socket per user-space request. This was the wrong thing to do. I should have just set SK_CLOSE and let normal mechanisms do the work. Not only wrong, but buggy. The locking is all wrong and it openned up a race where-by a socket could be closed twice. So this patch: Introduces svc_close_socket which sets SK_CLOSE then either leave the close up to a thread, or calls svc_delete_socket if it can get SK_BUSY. Adds a bias to sk_busy which is removed when SK_DEAD is set, This avoid races around shutting down the socket. Changes several 'spin_lock' to 'spin_lock_bh' where the _bh was missing. Bugzilla-url: http://bugzilla.kernel.org/show_bug.cgi?id=7916 Signed-off-by: Neil Brown <neilb@suse.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-09 09:25:47 -08:00
NeilBrown	34e9a63b4f	[PATCH] knfsd: ratelimit some nfsd messages that are triggered by external events Also remove {NFSD,RPC}_PARANOIA as having the defines doesn't really add anything. The printks covered by RPC_PARANOIA were triggered by badly formatted packets and so should be ratelimited. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-01-30 08:26:45 -08:00
NeilBrown	250f391518	[PATCH] knfsd: fix an NFSD bug with full sized, non-page-aligned reads NFSd assumes that largest number of pages that will be needed for a request+response is 2+N where N pages is the size of the largest permitted read/write request. The '2' are 1 for the non-data part of the request, and 1 for the non-data part of the reply. However, when a read request is not page-aligned, and we choose to use ->sendfile to send it directly from the page cache, we may need N+1 pages to hold the whole reply. This can overflow and array and cause an Oops. This patch increases size of the array for holding pages by one and makes sure that entry is NULL when it is not in use. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-01-26 13:50:59 -08:00
Peter Zijlstra	ed07536ed6	[PATCH] lockdep: annotate nfs/nfsd in-kernel sockets Stick NFS sockets in their own class to avoid some lockdep warnings. NFS sockets are never exposed to user-space, and will hence not trigger certain code paths that would otherwise pose deadlock scenarios. [akpm@osdl.org: cleanups] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Steven Dickson <SteveD@redhat.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Acked-by: Neil Brown <neilb@suse.de> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> [ Fixed patch corruption by quilt, pointed out by Peter Zijlstra ] Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:30 -08:00
Nigel Cunningham	7dfb71030f	[PATCH] Add include/linux/freezer.h and move definitions from sched.h Move process freezing functions from include/linux/sched.h to freezer.h, so that modifications to the freezer or the kernel configuration don't require recompiling just about everything. [akpm@osdl.org: fix ueagle driver] Signed-off-by: Nigel Cunningham <nigel@suspend2.net> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:27 -08:00
Andrew Morton	202dd45024	[PATCH] fix "sunrpc: fix refcounting problems in rpc servers" - printk should remain dprintk - fix coding-style. Cc: Neil Brown <neilb@suse.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-30 12:12:21 -08:00
Neil Brown	d6740df98e	[PATCH] sunrpc: fix refcounting problems in rpc servers A recent patch fixed a problem which would occur when the refcount on an auth_domain reached zero. This problem has not been reported in practice despite existing in two major kernel releases because the refcount can never reach zero. This patch fixes the problems that stop the refcount reaching zero. 1/ We were adding to the refcount when inserting in the hash table, but only removing from the hashtable when the refcount reached zero. Obviously it never would. So don't count the implied reference of being in the hash table. 2/ There are two paths on which a socket can be destroyed. One called svcauth_unix_info_release(). The other didn't. So when the other was taken, we can lose a reference to an ip_map which in-turn holds a reference to an auth_domain So unify the exit paths into svc_sock_put. This highlights the fact that svc_delete_socket has slightly odd semantics - it does not drop a reference but probably should. Fixing this need a bit more thought and testing. Signed-off-by: Neil Brown <neilb@suse.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-30 12:08:42 -08:00
NeilBrown	1a047060a9	[PATCH] knfsd: fix race that can disable NFS server This patch is suitable for just about any 2.6 kernel. It should go in 2.6.19 and 2.6.18.2 and possible even the .17 and .16 stable series. This is a long standing bug that seems to have only recently become apparent, presumably due to increasing use of NFS over TCP - many distros seem to be making it the default. The SK_CONN bit gets set when a listening socket may be ready for an accept, just as SK_DATA is set when data may be available. It is entirely possible for svc_tcp_accept to be called with neither of these set. It doesn't happen often but there is a small race in svc_sock_enqueue as SK_CONN and SK_DATA are tested outside the spin_lock. They could be cleared immediately after the test and before the lock is gained. This normally shouldn't be a problem. The sockets are non-blocking so trying to read() or accept() when ther is nothing to do is not a problem. However: svc_tcp_recvfrom makes the decision "Should I accept() or should I read()" based on whether SK_CONN is set or not. This usually works but is not safe. The decision should be based on whether it is a TCP_LISTEN socket or a TCP_CONNECTED socket. Signed-off-by: Neil Brown <neilb@suse.de> Cc: Adrian Bunk <bunk@stusta.de> Cc: <stable@kernel.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-20 10:26:44 -07:00
NeilBrown	c6b0a9f87b	[PATCH] knfsd: tidy up up meaning of 'buffer size' in nfsd/sunrpc There is some confusion about the meaning of 'bufsz' for a sunrpc server. In some cases it is the largest message that can be sent or received. In other cases it is the largest 'payload' that can be included in a NFS message. In either case, it is not possible for both the request and the reply to be this large. One of the request or reply may only be one page long, which fits nicely with NFS. So we remove 'bufsz' and replace it with two numbers: 'max_payload' and 'max_mesg'. Max_payload is the size that the server requests. It is used by the server to check the max size allowed on a particular connection: depending on the protocol a lower limit might be used. max_mesg is the largest single message that can be sent or received. It is calculated as the max_payload, rounded up to a multiple of PAGE_SIZE, and with PAGE_SIZE added to overhead. Only one of the request and reply may be this size. The other must be at most one page. Cc: Greg Banks <gnb@sgi.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-06 08:53:41 -07:00
Greg Banks	7b2b1fee30	[PATCH] knfsd: knfsd: cache ipmap per TCP socket Speed up high call-rate workloads by caching the struct ip_map for the peer on the connected struct svc_sock instead of looking it up in the ip_map cache hashtable on every call. This helps workloads using AUTH_SYS authentication over TCP. Testing was on a 4 CPU 4 NIC Altix using 4 IRIX clients, each with 16 synthetic client threads simulating an rsync (i.e. recursive directory listing) workload reading from an i386 RH9 install image (161480 regular files in 10841 directories) on the server. That tree is small enough to fill in the server's RAM so no disk traffic was involved. This setup gives a sustained call rate in excess of 60000 calls/sec before being CPU-bound on the server. Profiling showed strcmp(), called from ip_map_match(), was taking 4.8% of each CPU, and ip_map_lookup() was taking 2.9%. This patch drops both contribution into the profile noise. Note that the above result overstates this value of this patch for most workloads. The synthetic clients are all using separate IP addresses, so there are 64 entries in the ip_map cache hash. Because the kernel measured contained the bug fixed in commit commit `1f1e030bf7` and was running on 64bit little-endian machine, probably all of those 64 entries were on a single chain, thus increasing the cost of ip_map_lookup(). With a modern kernel you would need more clients to see the same amount of performance improvement. This patch has helped to scale knfsd to handle a deployment with 2000 NFS clients. Signed-off-by: Greg Banks <gnb@melbourne.sgi.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:16 -07:00
NeilBrown	3cc03b164c	[PATCH] knfsd: Avoid excess stack usage in svc_tcp_recvfrom .. by allocating the array of 'kvec' in 'struct svc_rqst'. As we plan to increase RPCSVC_MAXPAGES from 8 upto 256, we can no longer allocate an array of this size on the stack. So we allocate it in 'struct svc_rqst'. However svc_rqst contains (indirectly) an array of the same type and size (actually several, but they are in a union). So rather than waste space, we move those arrays out of the separately allocated union and into svc_rqst to share with the kvec moved out of svc_tcp_recvfrom (various arrays are used at different times, so there is no conflict). Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:15 -07:00
NeilBrown	4452435948	[PATCH] knfsd: Replace two page lists in struct svc_rqst with one We are planning to increase RPCSVC_MAXPAGES from about 8 to about 256. This means we need to be a bit careful about arrays of size RPCSVC_MAXPAGES. struct svc_rqst contains two such arrays. However the there are never more that RPCSVC_MAXPAGES pages in the two arrays together, so only one array is needed. The two arrays are for the pages holding the request, and the pages holding the reply. Instead of two arrays, we can simply keep an index into where the first reply page is. This patch also removes a number of small inline functions that probably server to obscure what is going on rather than clarify it, and opencode the needed functionality. Also remove the 'rq_restailpage' variable as it is always 0. i.e. if the response 'xdr' structure has a non-empty tail it is always in the same pages as the head. check counters are initilised and incr properly check for consistant usage of ++ etc maybe extra some inlines for common approach general review Signed-off-by: Neil Brown <neilb@suse.de> Cc: Magnus Maatta <novell@kiruna.se> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:15 -07:00
NeilBrown	5680c44632	[PATCH] knfsd: Fixed handling of lockd fail when adding nfsd socket Arrgg.. We cannot 'lockd_up' before 'svc_addsock' as we don't know the protocol yet.... So switch it around again and save the name of the created sockets so that it can be closed if lock_up fails. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:15 -07:00
NeilBrown	37a034729a	[PATCH] knfsd: call lockd_down when closing a socket via a write to nfsd/portlist The refcount that nfsd holds on lockd is based on the number of open sockets. So when we close a socket, we should decrement the ref (with lockd_down). Currently when a socket is closed via writing to the portlist file, that doesn't happen. So: make sure we get an error return if the socket that was requested does is not found, and call lockd_down if it was. Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:15 -07:00
Greg Banks	bfd241600a	[PATCH] knfsd: make rpc threads pools numa aware Actually implement multiple pools. On NUMA machines, allocate a svc_pool per NUMA node; on SMP a svc_pool per CPU; otherwise a single global pool. Enqueue sockets on the svc_pool corresponding to the CPU on which the socket bh is run (i.e. the NIC interrupt CPU). Threads have their cpu mask set to limit them to the CPUs in the svc_pool that owns them. This is the patch that allows an Altix to scale NFS traffic linearly beyond 4 CPUs and 4 NICs. Incorporates changes and feedback from Neil Brown, Trond Myklebust, and Christoph Hellwig. Signed-off-by: Greg Banks <gnb@melbourne.sgi.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:20 -07:00
Greg Banks	3262c816a3	[PATCH] knfsd: split svc_serv into pools Split out the list of idle threads and pending sockets from svc_serv into a new svc_pool structure, and allocate a fixed number (in this patch, 1) of pools per svc_serv. The new structure contains a lock which takes over several of the duties of svc_serv->sv_lock, which is now relegated to protecting only sv_tempsocks, sv_permsocks, and sv_tmpcnt in svc_serv. The point is to move the hottest fields out of svc_serv and into svc_pool, allowing a following patch to arrange for a svc_pool per NUMA node or per CPU. This is a major step towards making the NFS server NUMA-friendly. Signed-off-by: Greg Banks <gnb@melbourne.sgi.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:19 -07:00
Greg Banks	c081a0c7cf	[PATCH] knfsd: test and set SK_BUSY atomically The SK_BUSY bit in svc_sock->sk_flags ensures that we do not attempt to enqueue a socket twice. Currently, setting and clearing the bit is protected by svc_serv->sv_lock. As I intend to reduce the data that the lock protects so it's not held when svc_sock_enqueue() tests and sets SK_BUSY, that test and set needs to be atomic. Signed-off-by: Greg Banks <gnb@melbourne.sgi.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:19 -07:00
Greg Banks	5685f0fa1c	[PATCH] knfsd: convert sk_reserved to atomic_t Convert the svc_sock->sk_reserved variable from an int protected by svc_serv->sv_lock, to an atomic. This reduces (by 1) the number of places we need to take the (effectively global) svc_serv->sv_lock. Signed-off-by: Greg Banks <gnb@melbourne.sgi.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:19 -07:00
Greg Banks	1a68d952af	[PATCH] knfsd: use new lock for svc_sock deferred list Protect the svc_sock->sk_deferred list with a new lock svc_sock->sk_defer_lock instead of svc_serv->sv_lock. Using the more fine-grained lock reduces the number of places we need to take the svc_serv lock. Signed-off-by: Greg Banks <gnb@melbourne.sgi.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:19 -07:00
Greg Banks	c45c357d7d	[PATCH] knfsd: convert sk_inuse to atomic_t Convert the svc_sock->sk_inuse counter from an int protected by svc_serv->sv_lock, to an atomic. This reduces the number of places we need to take the (effectively global) svc_serv->sv_lock. Signed-off-by: Greg Banks <gnb@melbourne.sgi.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:19 -07:00
Greg Banks	36bdfc8bae	[PATCH] knfsd: move tempsock aging to a timer Following are 11 patches from Greg Banks which combine to make knfsd more Numa-aware. They reduce hitting on 'global' data structures, and create some data-structures that can be node-local. knfsd threads are bound to a particular node, and the thread to handle a new request is chosen from the threads that are attach to the node that received the interrupt. The distribution of threads across nodes can be controlled by a new file in the 'nfsd' filesystem, though the default approach of an even spread is probably fine for most sites. Some (old) numbers that show the efficacy of these patches: N == number of NICs == number of CPUs == nmber of clients. Number of NUMA nodes == N/2 N Throughput, MiB/s CPU usage, % (max=N*100) Before After Before After --- ------ ---- ----- ----- 4 312 435 350 228 6 500 656 501 418 8 562 804 690 589 This patch: Move the aging of RPC/TCP connection sockets from the main svc_recv() loop to a timer which uses a mark-and-sweep algorithm every 6 minutes. This reduces the amount of work that needs to be done in the main RPC loop and the length of time we need to hold the (effectively global) svc_serv->sv_lock. [akpm@osdl.org: cleanup] Signed-off-by: Greg Banks <gnb@melbourne.sgi.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:19 -07:00
NeilBrown	6fb2b47fa1	[PATCH] knfsd: Drop 'serv' option to svc_recv and svc_process It isn't needed as it is available in rqstp->rq_server, and dropping it allows some local vars to be dropped. [akpm@osdl.org: build fix] Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:18 -07:00
NeilBrown	b41b66d63c	[PATCH] knfsd: allow sockets to be passed to nfsd via 'portlist' Userspace should create and bind a socket (but not connectted) and write the 'fd' to portlist. This will cause the nfs server to listen on that socket. To close a socket, the name of the socket - as read from 'portlist' can be written to 'portlist' with a preceding '-'. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:18 -07:00
NeilBrown	80212d59e3	[PATCH] knfsd: define new nfsdfs file: portlist - contains list of ports This file will list all ports that nfsd has open. Default when TCP enabled will be ipv4 udp 0.0.0.0 2049 ipv4 tcp 0.0.0.0 2049 Later, the list of ports will be settable. 'portlist' chosen rather than 'ports', to avoid unnecessary confusion with non-mainline patches which created 'ports' with different semantics. [akpm@osdl.org: cleanups, build fix] Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:18 -07:00
Eric Sesterhenn	1811474620	[SUNRPC]: Remove unnecessary check in net/sunrpc/svcsock.c coverity spotted this one as possible dereference in the dprintk(), but since there is only one caller of svc_create_socket(), which always passes a valid sin, we dont need this check. Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-28 18:03:06 -07:00
Alexey Dobriyan	d8ed029d60	[SUNRPC]: trivial endianness annotations pure s/u32/__be32/ [AV: large part based on Alexey's patches] Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-28 18:01:21 -07:00
Sridhar Samudrala	e6242e928e	[SUNRPC]: Update to use in-kernel sockets API. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:54:06 -07:00
Panagiotis Issaris	0da974f4f3	[NET]: Conversions from kmalloc+memset to k(z\|c)alloc. Signed-off-by: Panagiotis Issaris <takis@issaris.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-07-21 14:51:30 -07:00
Ingo Molnar	57b47a53ec	[NET]: sem2mutex part 2 Semaphore to mutex conversion. The conversion was generated via scripts, and the result was validated automatically via a script as well. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 22:35:41 -08:00
J. Bruce Fields	1918e34138	[PATCH] svcrpc: save and restore the daddr field when request deferred The server code currently keeps track of the destination address on every request so that it can reply using the same address. However we forget to do that in the case of a deferred request. Remedy this oversight. >From folks at PolyServe. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-01-18 19:20:24 -08:00
Olaf Kirch	93fbf1a5de	[PATCH] Keep nfsd from exiting when seeing recv() errors I submitted this one previously - svc_tcp_recvfrom currently returns any errors to the caller, including ECONNRESET and the like. This is something svc_recv isn't able to deal with: len = svsk->sk_recvfrom(rqstp); [...] if (len == 0 \|\| len == -EAGAIN) { [...] return -EAGAIN; } [...] return len; The nfsd main loop will exit when it sees an error code other than EAGAIN. The following patch fixes this problem svc_recv is not equipped to deal with error codes other than EAGAIN, and will propagate anything else (such as ECONNRESET) up to nfsd, causing it to exit. Signed-off-by: Olaf Kirch <okir@suse.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-01-06 08:33:59 -08:00
Eric Dumazet	90ddc4f047	[NET]: move struct proto_ops to const I noticed that some of 'struct proto_ops' used in the kernel may share a cache line used by locks or other heavily modified data. (default linker alignement is 32 bytes, and L1_CACHE_LINE is 64 or 128 at least) This patch makes sure a 'struct proto_ops' can be declared as const, so that all cpus can share all parts of it without false sharing. This is not mandatory : a driver can still use a read/write structure if it needs to (and eventually a __read_mostly) I made a global stubstitute to change all existing occurences to make them const. This should reduce the possibility of false sharing on SMP, and speedup some socket system calls. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:11:15 -08:00
NeilBrown	1887b93529	[PATCH] knfsd: make sure nfsd doesn't hog a cpu forever Being kernel-threads, nfsd servers don't get pre-empted (depending on CONFIG). If there is a steady stream of NFS requests that can be served from cache, an nfsd thread may hold on to a cpu indefinitely, which isn't very friendly. So it is good to have a cond_resched in there (just before looking for a new request to serve), to make sure we play nice. Signed-off-by: Neil Brown <neilb@suse.de> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-11-15 08:59:19 -08:00
Herbert Xu	fb286bb299	[NET]: Detect hardware rx checksum faults correctly Here is the patch that introduces the generic skb_checksum_complete which also checks for hardware RX checksum faults. If that happens, it'll call netdev_rx_csum_fault which currently prints out a stack trace with the device name. In future it can turn off RX checksum. I've converted every spot under net/ that does RX checksum checks to use skb_checksum_complete or __skb_checksum_complete with the exceptions of: * Those places where checksums are done bit by bit. These will call netdev_rx_csum_fault directly. * The following have not been completely checked/converted: ipmr ip_vs netfilter dccp This patch is based on patches and suggestions from Stephen Hemminger and David S. Miller. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-10 13:01:24 -08:00
Trond Myklebust	4c2cb58c55	Merge /home/trondmy/scm/kernel/git/torvalds/linux-2.6	2005-10-27 19:12:49 -04:00
Andrew Morton	4bcde03d41	[PATCH] svcsock timestamp fix Convert nanoseconds to microseconds correctly. Spotted by Steve Dickson <SteveD@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-10-26 10:39:43 -07:00
Chuck Lever	094bb20b9f	[PATCH] RPC: extract socket logic common to both client and server Clean-up: Move some code that is common to both RPC client- and server-side socket transports into its own source file, net/sunrpc/socklib.c. Test-plan: Compile kernel with CONFIG_NFS enabled. Millions of fsx operations over UDP, client and server. Connectathon over UDP. Version: Thu, 11 Aug 2005 16:03:09 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-09-23 12:38:11 -04:00
Neil Brown	939bb7ef90	[PATCH] Code cleanups in calbacks in svcsock Change a printk(KERN_WARNING to dprintk, and it is really only interesting when trying to debug a problem, and can occur normally without error. Remove various gratuitous gotos in surrounding code, and remove some type-cast assignments from inside 'if' conditionals, as that is just obscuring what it going on. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-09-13 08:22:32 -07:00
Nishanth Aravamudan	121caf577d	[NET]: fix-up schedule_timeout() usage Use schedule_timeout_{,un}interruptible() instead of set_current_state()/schedule_timeout() to reduce kernel size. Also use human-time conversion functions instead of hard-coded division to avoid rounding issues. Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-12 14:15:34 -07:00

1 2

56 Commits