* remove asm/atomic.h inclusion from linux/utsname.h --
not needed after kref conversion
* remove linux/utsname.h inclusion from files which do not need it
NOTE: it looks like fs/binfmt_elf.c do not need utsname.h, however
due to some personality stuff it _is_ needed -- cowardly leave ELF-related
headers and files alone.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[[resending with correct cc: - "vfs.kernel.org" just isn't right!]]
xprt->reestablish_timeout is used to cause TCP connection attempts to
back off if the connection fails so as not to hammer the network,
but to still allow immediate connections when there is no reason to
believe there is a problem.
It is not used for the first connection (when transport->sock is NULL)
but only on reconnects.
It is currently set:
a/ to 0 when xs_tcp_state_change finds a state of TCP_FIN_WAIT1
on the assumption that the client has closed the connection
so the reconnect should be immediate when needed.
b/ to at least XS_TCP_INIT_REEST_TO when xs_tcp_state_change
detects TCP_CLOSING or TCP_CLOSE_WAIT on the assumption that the
server closed the connection so a small delay at least is
required.
c/ as above when xs_tcp_state_change detects TCP_SYN_SENT, so that
it is never 0 while a connection has been attempted, else
the doubling will produce 0 and there will be no backoff.
d/ to double is value (up to a limit) when delaying a connection,
thus providing exponential backoff and
e/ to XS_TCP_INIT_REEST_TO in xs_setup_tcp as simple initialisation.
So you can see it is highly dependant on xs_tcp_state_change being
called as expected. However experimental evidence shows that
xs_tcp_state_change does not see all state changes.
("rpcdebug -m rpc trans" can help show what actually happens).
Results show:
TCP_ESTABLISHED is reported when a connection is made. TCP_SYN_SENT
is never reported, so rule 'c' above is never effective.
When the server closes the connection, TCP_CLOSE_WAIT and
TCP_LAST_ACK *might* be reported, and TCP_CLOSE is always
reported. This rule 'b' above will sometimes be effective, but
not reliably.
When the client closes the connection, it used to result in
TCP_FIN_WAIT1, TCP_FIN_WAIT2, TCP_CLOSE. However since commit
f75e674 (SUNRPC: Fix the problem of EADDRNOTAVAIL syslog floods on
reconnect) we don't see *any* events on client-close. I think this
is because xs_restore_old_callbacks is called to disconnect
xs_tcp_state_change before the socket is closed.
In any case, rule 'a' no longer applies.
So all that is left are rule d, which successfully doubles the
timeout which is never rest, and rule e which initialises the timeout.
Even if the rules worked as expected, there would be a problem because
a successful connection does not reset the timeout, so a sequence
of events where the server closes the connection (e.g. during failover
testing) will cause longer and longer timeouts with no good reason.
This patch:
- sets reestablish_timeout to 0 in xs_close thus effecting rule 'a'
- sets it to 0 in xs_tcp_data_ready to ensure that a successful
connection resets the timeout
- sets it to at least XS_TCP_INIT_REEST_TO after it is doubled,
thus effecting rule c
I have not reimplemented rule b and the new version of rule c
seems sufficient.
I suspect other code in xs_tcp_data_ready needs to be revised as well.
For example I don't think connect_cookie is being incremented as often
as it should be.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* 'for-2.6.32' of git://linux-nfs.org/~bfields/linux: (68 commits)
nfsd4: nfsv4 clients should cross mountpoints
nfsd: revise 4.1 status documentation
sunrpc/cache: avoid variable over-loading in cache_defer_req
sunrpc/cache: use list_del_init for the list_head entries in cache_deferred_req
nfsd: return success for non-NFS4 nfs4_state_start
nfsd41: Refactor create_client()
nfsd41: modify nfsd4.1 backchannel to use new xprt class
nfsd41: Backchannel: Implement cb_recall over NFSv4.1
nfsd41: Backchannel: cb_sequence callback
nfsd41: Backchannel: Setup sequence information
nfsd41: Backchannel: Server backchannel RPC wait queue
nfsd41: Backchannel: Add sequence arguments to callback RPC arguments
nfsd41: Backchannel: callback infrastructure
nfsd4: use common rpc_cred for all callbacks
nfsd4: allow nfs4 state startup to fail
SUNRPC: Defer the auth_gss upcall when the RPC call is asynchronous
nfsd4: fix null dereference creating nfsv4 callback client
nfsd4: fix whitespace in NFSPROC4_CLNT_CB_NULL definition
nfsd41: sunrpc: add new xprt class for nfsv4.1 backchannel
sunrpc/cache: simplify cache_fresh_locked and cache_fresh_unlocked.
...
In cache_defer_req, 'dreq' is used for two significantly different
values that happen to be of the same type.
This is both confusing, and makes it hard to extend the range of one of
the values as we will in the next patch.
So introduce 'discard' to take one of the values.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Using list_del_init is generally safer than list_del, and it will
allow us, in a subsequent patch, to see if an entry has already been
processed or not.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Otherwise, the upcall is going to be synchronous, which may not be what the
caller wants...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
[sunrpc: change idle timeout value for the backchannel]
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The extra call to cache_revisit_request in cache_fresh_unlocked is not
needed, as should have been fairly clear at the time of
commit 4013edea9a
If there are requests to be revisited, then we can be sure that
CACHE_PENDING is set, so the second call is sufficient.
So remove the first call.
Then remove the 'new' parameter,
then remove the return value for cache_fresh_locked which is only used
to provide the value for 'new'.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
As "cache_defer_req" does not sound like a predicate, having it return
a boolean value can be confusing. It is more consistent to return
0 for success and negative for error.
Exactly what error code to return is not important as we don't
differentiate between reasons why the request wasn't deferred,
we only care about whether it was deferred or not.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
When the call direction is a reply, copy the xid and call direction into the
req->rq_private_buf.head[0].iov_base otherwise rpc_verify_header returns
rpc_garbage.
Signed-off-by: Rahul Iyer <iyer@netapp.com>
Signed-off-by: Mike Sager <sager@netapp.com>
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[get rid of CONFIG_NFSD_V4_1]
[sunrpc: refactoring of svc_tcp_recvfrom]
[nfsd41: sunrpc: create common send routine for the fore and the back channels]
[nfsd41: sunrpc: Use free_page() to free server backchannel pages]
[nfsd41: sunrpc: Document server backchannel locking]
[nfsd41: sunrpc: remove bc_connect_worker()]
[nfsd41: sunrpc: Define xprt_server_backchannel()[
[nfsd41: sunrpc: remove bc_close and bc_init_auto_disconnect dummy functions]
[nfsd41: sunrpc: eliminate unneeded switch statement in xs_setup_tcp()]
[nfsd41: sunrpc: Don't auto close the server backchannel connection]
[nfsd41: sunrpc: Remove unused functions]
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: change bc_sock to bc_xprt]
[nfsd41: sunrpc: move struct rpc_buffer def into a common header file]
[nfsd41: sunrpc: use rpc_sleep in bc_send_request so not to block on mutex]
[removed cosmetic changes]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[sunrpc: add new xprt class for nfsv4.1 backchannel]
[sunrpc: v2.1 change handling of auto_close and init_auto_disconnect operations for the nfsv4.1 backchannel]
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
[reverted more cosmetic leftovers]
[got rid of xprt_server_backchannel]
[separated "nfsd41: sunrpc: add new xprt class for nfsv4.1 backchannel"]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Cc: Trond Myklebust <trond.myklebust@netapp.com>
[sunrpc: change idle timeout value for the backchannel]
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Acked-by: Trond Myklebust <trond.myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Move struct rpc_buffer's definition into a sunrpc.h, a common, internal
header file, in preparation for supporting the nfsv4.1 backchannel.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: sunrpc: #include <linux/net.h> from sunrpc.h]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This patch fixes the bug that was reported in
http://bugzilla.kernel.org/show_bug.cgi?id=14053
If we're in the case where we need to force a reencode and then resend of
the RPC request, due to xprt_transmit failing with a networking error, then
we _must_ retransmit the entire request.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sunrpc: "Move close processing to a single place"
(d7979ae4a0) moved the
close processing before the recvfrom method. This may
cause the close processing never to execute. So this
patch moves it to the right place.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
When an RPC message is received with RPCSEC_GSS with an unknown service
(not RPC_GSS_SVC_NONE, RPC_GSS_SVC_INTEGRITY, or RPC_GSS_SVC_PRIVACY),
svcauth_gss_accept() returns AUTH_BADCRED, but svcauth_gss_release()
subsequently drops the response entirely, discarding the error.
Fix that so the AUTH_BADCRED error is returned to the client.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
lock_kernel() in knfsd was replaced with a mutex. The later
commit 03cf6c9f49 ("knfsd:
add file to export stats about nfsd pools") did not follow
that change. This patch fixes the issue.
Also move the get and put of nfsd_serv to the open and close methods
(instead of start and stop methods) to allow atomic check and increment
of reference count in the open method (where we can still return an
error).
Signed-off-by: Ryusei Yamaguchi <mandel59@gmail.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Cc: Greg Banks <gnb@fmeh.org>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Factor functionality out of svc_tcp_recvfrom() to simplify routine
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
We can oops if rpc_pipefs isn't properly initialised before we start to set
up objects that depend upon it.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Otherwise we Oops if the module containing the cache detail is removed
before all cache readers have closed the file.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The NFSv4 and NFSv4.1 protocols both allow for the redirection of a client
from one server to another in order to support filesystem migration and
replication. For full protocol support, we need to add the ability to
convert a DNS host name into an IP address that we can feed to the RPC
client.
We'll reuse the sunrpc cache, now that it has been converted to work with
rpc_pipefs.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
ntohl is already defined as be32_to_cpu.
be64_to_cpu has architecture specific optimized implementations.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
htonl is already defined as cpu_to_be32.
cpu_to_be64 has architecture specific optimized implementations.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
For events that are rare, such as referral DNS lookups, it makes limited
sense to have a daemon constantly listening for upcalls on a channel. An
alternative in those cases might simply be to run the app that fills the
cache using call_usermodehelper_exec() and friends.
The following patch allows the cache_detail to specify alternative upcall
mechanisms for these particular cases.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
While we do want to protect against multiple concurrent readers and writers
on each upcall/downcall pipe, we don't want to limit concurrent reading and
writing to separate caches.
This patch therefore replaces the static buffer 'write_buf', which can only
be used by one writer at a time, with use of the page cache as the
temporary buffer for downcalls. We still fall back to using the the old
global buffer if the downcall is larger than PAGE_CACHE_SIZE, since this is
apparently needed by the SPKM security context initialisation.
It then replaces the use of the global 'queue_io_mutex' with the
inode->i_mutex in cache_read() and cache_write().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Also ensure that we destroy those files before we destroy the cache_detail.
Otherwise, user processes might attempt to write into uninitialised caches.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
In order to allow rpc_pipefs to create directories with different types of
subtrees, it is useful to allow the caller to customise the subtree filling
process.
In order to do so, we separate out the parts which are specific to making
an RPC client directory, and put them in a separate helper, then we convert
the process of filling the directory contents into a callback.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
There is still a little wart or two there: Since we've already got a
vfsmount, we might as well pass that in to rpc_create_client_dir.
Another point is that if we open code __rpc_lookup_path() here, then we can
avoid looking up the entire parent directory path over and over again: it
doesn't change.
Also get rid of rpc_clnt->cl_pathname, since it has no users...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This reflects the fact that rpc_mkdir() as it stands today, can only create
a RPC client type directory.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: provide documenting comments for the functions in
net/sunrpc/timer.c.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
After a bind completes, update the transport instance's address
strings so debugging messages display the current port the transport
is connected to.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
At some point, I recall that rpc_pipe_fs used RPC_DISPLAY_ALL.
Currently there are no uses of RPC_DISPLAY_ALL outside the transport
modules themselves, so we can safely get rid of it.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: Give the "addr" and "port" field less ambiguous names.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: Replace PROC macro with open coded C99 structure
initializers to improve readability.
The rpcbind v4 GETVERSADDR procedure is never sent by the current
implementation, so it is not copied to the new structures.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Replace the open-coded decode logic for PMAP_GETPORT/RPCB_GETADDR with
an xdr_stream-based implementation, similar to what NFSv4 uses, to
protect against buffer overflows. The new implementation also checks
that the incoming port number is reasonable.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Replace the open-coded decode logic for rpcbind UNSET results with an
xdr_stream-based implementation, similar to what NFSv4 uses, to
protect against buffer overflows.
The new function is unused for the moment.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Replace the open-coded encode logic for rpcbind arguments with an
xdr_stream-based implementation, similar to what NFSv4 uses, to
better protect against buffer overflows.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: In addition to using the new generic rpc_ntop() and
rpc_get_port() functions, have the RPC client compute the presentation
address buffer sizes dynamically using kstrdup().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
RPC universal address generation is currently done in several places:
rpcb_clnt.c, nfs4proc.c xprtsock.c, and xprtrdma.c. Remove the
redundant cases that convert a socket address to a universal
address. The nfs4proc.c case takes a pre-formatted presentation
address string, not a socket address, so we'll leave that one.
Because the new uaddr constructor uses the recently introduced
rpc_ntop(), it now supports proper "::" shorthanding for IPv6
addresses. This allows the kernel to register properly formed
universal addresses with the local rpcbind service, in _all_ cases.
The kernel can now also send properly formed universal addresses in
RPCB_GETADDR requests, and support link-local properly when
encoding and decoding IPv6 addresses.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Introduce a set of functions in the kernel's RPC implementation for
converting between a socket address and either a standard
presentation address string or an RPC universal address.
The universal address functions will be used to encode and decode
RPCB_FOO and NFSv4 SETCLIENTID arguments. The other functions are
part of a previous promise to deliver shared functions that can be
used by upper-layer protocols to display and manipulate IP
addresses.
The kernel's current address printf formatters were designed
specifically for kernel to user-space APIs that require a particular
string format for socket addresses, thus are somewhat limited for the
purposes of sunrpc.ko. The formatter for IPv6 addresses, %pI6, does
not support short-handing or scope IDs. Also, these printf formatters
are unique per address family, so a separate formatter string is
required for printing AF_INET and AF_INET6 addresses.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: To make subsequent patches cleaner, move the XDR data type
size macros to the top of the file (similar to nfs4xdr.c) first.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Parameters like the minimum reserved port, and the number of slot entries
should really be module parameters rather than sysctls.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
When we look up an entry in the uid->gidlist cache, we take
a reference to the content but don't drop the reference to the
cache entry. So it never gets freed.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
If cache_defer_req did not leave the request on a queue, then it could
possibly have waited long enough that the cache became valid. So check the
status after the call.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
While deferred requests normally get revisited quite quickly,
it is possible for a request to remain in the deferral queue
when the cache item is discarded. We can easily make sure that
doesn't happen by calling cache_revisit_request just before
the final 'put'.
Also there is a small chance that a race would cause one thread to
defer a request against a cache item while another thread is failing
to queue an upcall for that item. So when the upcall fails, make
sure to revisit all deferred requests.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
'loose' was a mis-spelling of 'lose', and even that wasn't a good
word choice.
So give this function a more useful name.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
PKTINFO is needed to scrape the caller's IP address off the socket so
RPC datagram replies are routed correctly. Fill in missing pieces in
the kernel RPC server's UDP receive path to request IPv6 PKTINFO and
correctly parse the IPv6 cmsg header.
Without this patch, kernel RPC services drop all incoming requests on
UDP on IPv6.
Related commit: 7a37f5787e
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* Remove smp_lock.h from files which don't need it (including some headers!)
* Add smp_lock.h to files which do need it
* Make smp_lock.h include conditional in hardirq.h
It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT
This will make hardirq.h inclusion cheaper for every PREEMPT=n config
(which includes allmodconfig/allyesconfig, BTW)
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ib_alloc_fast_reg_mr() and ib_alloc_fast_reg_page_list() returns
ERR_PTR() and not NULL. Compile tested only.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The sunrpc module uses rcu_call() thus it should use rcu_barrier() on
module unload.
Have not verified that the possibility for new call_rcu() callbacks
has been disabled. As a hint for checking, the functions calling
call_rcu() (unx_destroy_cred and generic_destroy_cred) are
registered as crdestroy function pointer in struct rpc_credops.
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-2.6.31' of git://fieldses.org/git/linux-nfsd: (60 commits)
SUNRPC: Fix the TCP server's send buffer accounting
nfsd41: Backchannel: minorversion support for the back channel
nfsd41: Backchannel: cleanup nfs4.0 callback encode routines
nfsd41: Remove ip address collision detection case
nfsd: optimise the starting of zero threads when none are running.
nfsd: don't take nfsd_mutex twice when setting number of threads.
nfsd41: sanity check client drc maxreqs
nfsd41: move channel attributes from nfsd4_session to a nfsd4_channel_attr struct
NFS: kill off complicated macro 'PROC'
sunrpc: potential memory leak in function rdma_read_xdr
nfsd: minor nfsd_vfs_write cleanup
nfsd: Pull write-gathering code out of nfsd_vfs_write
nfsd: track last inode only in use_wgather case
sunrpc: align cache_clean work's timer
nfsd: Use write gathering only with NFSv2
NFSv4: kill off complicated macro 'PROC'
NFSv4: do exact check about attribute specified
knfsd: remove unreported filehandle stats counters
knfsd: fix reply cache memory corruption
knfsd: reply cache cleanups
...
xprt_alloc_bc_request() is always called in soft interrupt context.
Grab the spin_lock instead of the bottom half spin_lock. Softirqs
do not preempt other softirqs running on the same processor, so there
is no need to disable bottom halves.
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Currently, the sunrpc server is refusing to allow us to process new RPC
calls if the TCP send buffer is 2/3 full, even if we do actually have
enough free space to guarantee that we can send another request.
The following patch fixes svc_tcp_has_wspace() so that we only stop
processing requests if we know that the socket buffer cannot possibly fit
another reply.
It also fixes the tcp write_space() callback so that we only clear the
SOCK_NOSPACE flag when the TCP send buffer is less than 2/3 full.
This should ensure that the send window will grow as per the standard TCP
socket code.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The 'rq_received' member of 'struct rpc_rqst' is used to track when we
have received a reply to our request. With v4.1, the backchannel
can now accept callback requests over the existing connection. Rename
this field to make it clear that it is only used for tracking reply bytes
and not all bytes received on the connection.
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Obtain the rpc_xprt from the rpc_rqst so that calls and callback replies
can both use the same code path. A client needs the rpc_xprt in order
to reply to a callback.
Signed-off-by: Rahul Iyer <iyer@netapp.com>
Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
This svc_xprt is passed on to the callback service thread to be later used
to processes incoming svc_rqst's
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
For nfs41 callbacks we need an svc_xprt to process requests coming up the
backchannel socket as rpc_rqst's that are transformed into svc_rqst's that
need a rq_xprt to be processed.
The svc_{udp,tcp}_create methods are too heavy for this job as svc_create_socket
creates an actual socket to listen on while for nfs41 we're "reusing" the
fore channel's socket.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Implement the NFSv4.1 backchannel service. Invokes the common callback
processing logic svc_process_common() to authenticate the call and
dispatch the appropriate NFSv4.1 XDR decoder and operation procedure.
It then invokes bc_send() to send the reply over the same connection.
bc_send() is implemented in a separate patch.
At this time there is no slot validation or reply cache handling.
[nfs41: Preallocate rpc_rqst receive buffer for handling callbacks]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[Move bc_svc_process() declaration to correct patch]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
net/sunrpc/svc.c:svc_process() is used by the NFSv4 callback service
to process RPC requests arriving over connections initiated by the
server. NFSv4.1 supports callbacks over the backchannel on connections
initiated by the client. This patch refactors svc_process() so that
common code can also be used by the backchannel.
Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Executes the backchannel task on the RPC state machine using
the existing open connection previously established by the client.
Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>
nfs41: Add bc_svc.o to sunrpc Makefile.
[nfs41: bc_send() does not need to be exported outside RPC module]
[nfs41: xprt_free_bc_request() need not be exported outside RPC module]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[Update copyright]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Adds rpc_run_bc_task() which is called by the NFS callback service to
process backchannel requests. It performs similar work to rpc_run_task()
though "schedules" the backchannel task to be executed starting at the
call_trasmit state in the RPC state machine.
It also introduces some miscellaneous updates to the argument validation,
call_transmit, and transport cleanup functions to take into account
that there are now forechannel and backchannel tasks.
Backchannel requests do not carry an RPC message structure, since the
payload has already been XDR encoded using the existing NFSv4 callback
mechanism.
Introduce a new transmit state for the client to reply on to backchannel
requests. This new state simply reserves the transport and issues the
reply. In case of a connection related error, disconnects the transport and
drops the reply. It requires the forechannel to re-establish the connection
and the server to retransmit the request, as stated in NFSv4.1 section
2.9.2 "Client and Server Transport Behavior".
Note: There is no need to loop attempting to reserve the transport. If EAGAIN
is returned by xprt_prepare_transmit(), return with tk_status == 0,
setting tk_action to call_bc_transmit. rpc_execute() will invoke it again
after the task is taken off the sleep queue.
[nfs41: rpc_run_bc_task() need not be exported outside RPC module]
[nfs41: New call_bc_transmit RPC state]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: Backchannel: No need to loop in call_bc_transmit()]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[rpc_count_iostats incorrectly exits early]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[Convert rpc_reply_expected() to inline function]
[Remove unnecessary BUG_ON()]
[Rename variable]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
In the case of -EADDRNOTAVAIL and/or unhandled connection errors, we want
to get rid of the existing socket and retry immediately, just as the
comment says. Currently we end up sleeping for a minute, due to the missing
"break" statement.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Handles RPC replies and backchannel callbacks. Traditionally the NFS
client has expected only RPC replies on its open connections. With
NFSv4.1, callbacks can arrive over an existing open connection.
This patch refactors the old xs_tcp_read_request() into an RPC reply handler:
xs_tcp_read_reply(), a new backchannel callback handler: xs_tcp_read_callback(),
and a common routine to read the data off the transport: xs_tcp_read_common().
The new xs_tcp_read_callback() queues callback requests onto a queue where
the callback service (a separate thread) is listening for the processing.
This patch incorporates work and suggestions from Rahul Iyer (iyer@netapp.com)
and Benny Halevy (bhalevy@panasas.com).
xs_tcp_read_callback() drops the connection when the number of expected
callbacks is exceeded. Use xprt_force_disconnect(), ensuring tasks on
the pending queue are awaken on disconnect.
[nfs41: Keep track of RPC call/reply direction with a flag]
[nfs41: Preallocate rpc_rqst receive buffer for handling callbacks]
Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: sunrpc: xs_tcp_read_callback() should use xprt_force_disconnect()]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[Moves embedded #ifdefs into #ifdef function blocks]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
This patch introduces support to setup the callback xprt on the client side.
It allocates/ destroys the preallocated memory structures used to process
backchannel requests.
At setup time, xprt_setup_backchannel() is invoked to allocate one or
more rpc_rqst structures and substructures. This ensures that they
are available when an RPC callback arrives. The rpc_rqst structures
are maintained in a linked list attached to the rpc_xprt structure.
We keep track of the number of allocations so that they can be correctly
removed when the channel is destroyed.
When an RPC callback arrives, xprt_alloc_bc_request() is invoked to
obtain a preallocated rpc_rqst structure. An rpc_xprt structure is
returned, and its RPC_BC_PREALLOC_IN_USE bit is set in
rpc_xprt->bc_flags. The structure is removed from the the list
since it is now in use, and it will be later added back when its
user is done with it.
After the RPC callback replies, the rpc_rqst structure is returned
by invoking xprt_free_bc_request(). This clears the
RPC_BC_PREALLOC_IN_USE bit and adds it back to the list, allowing it
to be reused by a subsequent RPC callback request.
To be consistent with the reception of RPC messages, the backchannel requests
should be placed into the 'struct rpc_rqst' rq_rcv_buf, which is then in turn
copied to the 'struct rpc_rqst' rq_private_buf.
[nfs41: Preallocate rpc_rqst receive buffer for handling callbacks]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[Update copyright notice and explain page allocation]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Reading and storing the RPC direction is a three step process.
1. xs_tcp_read_calldir() reads the RPC direction, but it will not store it
in the XDR buffer since the 'struct rpc_rqst' is not yet available.
2. The 'struct rpc_rqst' is obtained during the TCP_RCV_COPY_DATA state.
This state need not necessarily be preceeded by the TCP_RCV_READ_CALLDIR.
For example, we may be reading a continuation packet to a large reply.
Therefore, we can't simply obtain the 'struct rpc_rqst' during the
TCP_RCV_READ_CALLDIR state and assume it's available during TCP_RCV_COPY_DATA.
This patch adds a new TCP_RCV_READ_CALLDIR flag to indicate the need to
read the RPC direction. It then uses TCP_RCV_COPY_CALLDIR to indicate the
RPC direction needs to be saved after the 'struct rpc_rqst' has been allocated.
3. The 'struct rpc_rqst' is obtained by the xs_tcp_read_data() helper
functions. xs_tcp_read_common() then saves the RPC direction in the XDR
buffer if TCP_RCV_COPY_CALLDIR is set. This will happen when we're reading
the data immediately after the direction was read. xs_tcp_read_common()
then clears this flag.
[was nfs41: Skip past the RPC call direction]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: sunrpc: Add RPC direction back into the XDR buffer]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: sunrpc: Don't skip past the RPC call direction]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
NFSv4.1 callbacks can arrive over an existing connection. This patch adds
the logic to read the RPC call direction (call or reply). It does this by
updating the state machine to look for the call direction invoking
xs_tcp_read_calldir(...) after reading the XID.
[nfs41: Keep track of RPC call/reply direction with a flag]
As per 11/14/08 review of RFC 53/85.
Add a new flag to track whether the incoming message is an RPC call or an
RPC reply. TCP_RPC_REPLY is set in the 'struct sock_xprt' tcp_flags in
xs_tcp_read_calldir() if the message is an RPC reply sent on the forechannel.
It is cleared if the message is an RPC request sent on the back channel.
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
num_online_nodes() is called in a number of places but most often by the
page allocator when deciding whether the zonelist needs to be filtered
based on cpusets or the zonelist cache. This is actually a heavy function
and touches a number of cache lines.
This patch stores the number of online nodes at boot time and updates the
value when nodes get onlined and offlined. The value is then used in a
number of important paths in place of num_online_nodes().
[rientjes@google.com: do not override definition of node_set_online() with macro]
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In case the check on ch_count fails the cleanup path is skipped and the
previously allocated memory 'rpl_map', 'chl_map' is not freed.
Reported by Coverity.
Signed-off-by: Christian Engelmayer <christian.engelmayer@frequentis.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Align cache_clean work.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
As the module uses rcu_call() we should make sure that all
rcu callback has been completed before removing the code.
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Define three accessors to get/set dst attached to a skb
struct dst_entry *skb_dst(const struct sk_buff *skb)
void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)
void skb_dst_drop(struct sk_buff *skb)
This one should replace occurrences of :
dst_release(skb->dst)
skb->dst = NULL;
Delete skb->dst field
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-2.6.30' of git://linux-nfs.org/~bfields/linux:
svcrdma: dma unmap the correct length for the RPCRDMA header page.
nfsd: Revert "svcrpc: take advantage of tcp autotuning"
nfsd: fix hung up of nfs client while sync write data to nfs server