Pull signal/exit/ptrace updates from Eric Biederman:
"This set of changes deletes some dead code, makes a lot of cleanups
which hopefully make the code easier to follow, and fixes bugs found
along the way.
The end-game which I have not yet reached yet is for fatal signals
that generate coredumps to be short-circuit deliverable from
complete_signal, for force_siginfo_to_task not to require changing
userspace configured signal delivery state, and for the ptrace stops
to always happen in locations where we can guarantee on all
architectures that the all of the registers are saved and available on
the stack.
Removal of profile_task_ext, profile_munmap, and profile_handoff_task
are the big successes for dead code removal this round.
A bunch of small bug fixes are included, as most of the issues
reported were small enough that they would not affect bisection so I
simply added the fixes and did not fold the fixes into the changes
they were fixing.
There was a bug that broke coredumps piped to systemd-coredump. I
dropped the change that caused that bug and replaced it entirely with
something much more restrained. Unfortunately that required some
rebasing.
Some successes after this set of changes: There are few enough calls
to do_exit to audit in a reasonable amount of time. The lifetime of
struct kthread now matches the lifetime of struct task, and the
pointer to struct kthread is no longer stored in set_child_tid. The
flag SIGNAL_GROUP_COREDUMP is removed. The field group_exit_task is
removed. Issues where task->exit_code was examined with
signal->group_exit_code should been examined were fixed.
There are several loosely related changes included because I am
cleaning up and if I don't include them they will probably get lost.
The original postings of these changes can be found at:
https://lkml.kernel.org/r/87a6ha4zsd.fsf@email.froward.int.ebiederm.orghttps://lkml.kernel.org/r/87bl1kunjj.fsf@email.froward.int.ebiederm.orghttps://lkml.kernel.org/r/87r19opkx1.fsf_-_@email.froward.int.ebiederm.org
I trimmed back the last set of changes to only the obviously correct
once. Simply because there was less time for review than I had hoped"
* 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (44 commits)
ptrace/m68k: Stop open coding ptrace_report_syscall
ptrace: Remove unused regs argument from ptrace_report_syscall
ptrace: Remove second setting of PT_SEIZED in ptrace_attach
taskstats: Cleanup the use of task->exit_code
exit: Use the correct exit_code in /proc/<pid>/stat
exit: Fix the exit_code for wait_task_zombie
exit: Coredumps reach do_group_exit
exit: Remove profile_handoff_task
exit: Remove profile_task_exit & profile_munmap
signal: clean up kernel-doc comments
signal: Remove the helper signal_group_exit
signal: Rename group_exit_task group_exec_task
coredump: Stop setting signal->group_exit_task
signal: Remove SIGNAL_GROUP_COREDUMP
signal: During coredumps set SIGNAL_GROUP_EXIT in zap_process
signal: Make coredump handling explicit in complete_signal
signal: Have prepare_signal detect coredumps using signal->core_state
signal: Have the oom killer detect coredumps using signal->core_state
exit: Move force_uaccess back into do_exit
exit: Guarantee make_task_dead leaks the tsk when calling do_task_exit
...
NFSv4.1 supports an optional lock notification feature which notifies
the client when a lock comes available. (Normally NFSv4 clients just
poll for locks if necessary.) To make that work, we need to request a
blocking lock from the filesystem.
We turned that off for NFS in commit f657f8eef3 ("nfs: don't atempt
blocking locks on nfs reexports") [sic] because it actually blocks the
nfsd thread while waiting for the lock.
Thanks to Vasily Averin for pointing out that NFS isn't the only
filesystem with that problem.
Any filesystem that leaves ->lock NULL will use posix_lock_file(), which
does the right thing. Simplest is just to assume that any filesystem
that defines its own ->lock is not safe to request a blocking lock from.
So, this patch mostly reverts commit f657f8eef3 ("nfs: don't atempt
blocking locks on nfs reexports") [sic] and commit b840be2f00 ("lockd:
don't attempt blocking locks on nfs reexports"), and instead uses a
check of ->lock (Vasily's suggestion) to decide whether to support
blocking lock notifications on a given filesystem. Also add a little
documentation.
Perhaps someday we could add back an export flag later to allow
filesystems with "good" ->lock methods to support blocking lock
notifications.
Reported-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
[ cel: Description rewritten to address checkpatch nits ]
[ cel: Fixed warning when SUNRPC debugging is disabled ]
[ cel: Fixed NULL check ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Vasily Averin <vvs@virtuozzo.com>
svc_set_num_threads() does everything that lockd_start_svc() does, except
set sv_maxconn. It also (when passed 0) finds the threads and
stops them with kthread_stop().
So move the setting for sv_maxconn, and use svc_set_num_thread()
We now don't need nlmsvc_task.
Now that we use svc_set_num_threads() it makes sense to set svo_module.
This request that the thread exists with module_put_and_exit().
Also fix the documentation for svo_module to make this explicit.
svc_prepare_thread is now only used where it is defined, so it can be
made static.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
lockd_create_svc() already does an svc_get() if the service already
exists, so it is more like a "get" than a "create".
So:
- Move the increment of nlmsvc_users into the function as well
- rename to lockd_get().
It is now the inverse of lockd_put().
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
There is some cleanup that is duplicated in lockd_down() and the failure
path of lockd_up().
Factor these out into a new lockd_put() and call it from both places.
lockd_put() does *not* take the mutex - that must be held by the caller.
It decrements nlmsvc_users and if that reaches zero, it cleans up.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The normal place to call svc_exit_thread() is from the thread itself
just before it exists.
Do this for lockd.
This means that nlmsvc_rqst is not used out side of lockd_start_svc(),
so it can be made local to that function, and renamed to 'rqst'.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
lockd_start_svc() only needs to be called once, just after the svc is
created. If the start fails, the svc is discarded too.
It thus makes sense to call lockd_start_svc() from lockd_create_svc().
This allows us to remove the test against nlmsvc_rqst at the start of
lockd_start_svc() - it must always be NULL.
lockd_up() only held an extra reference on the svc until a thread was
created - then it dropped it. The thread - and thus the extra reference
- will remain until kthread_stop() is called.
Now that the thread is created in lockd_create_svc(), the extra
reference can be dropped there. So the 'serv' variable is no longer
needed in lockd_up().
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Now that the network status notifiers use nlmsvc_serv rather then
nlmsvc_rqst the management can be simplified.
Notifier unregistration synchronises with any pending notifications so
providing we unregister before nlm_serv is freed no further interlock
is required.
So we move the unregister call to just before the thread is killed
(which destroys the service) and just before the service is destroyed in
the failure-path of lockd_up().
Then nlm_ntf_refcnt and nlm_ntf_wq can be removed.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
lockd has two globals - nlmsvc_task and nlmsvc_rqst - but mostly it
wants the 'struct svc_serv', and when it doesn't want it exactly it can
get to what it wants from the serv.
This patch is a first step to removing nlmsvc_task and nlmsvc_rqst. It
introduces nlmsvc_serv to store the 'struct svc_serv*'. This is set as
soon as the serv is created, and cleared only when it is destroyed.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The use of sv_nrthreads as a general refcount results in clumsy code, as
is seen by various comments needed to explain the situation.
This patch introduces a 'struct kref' and uses that for reference
counting, leaving sv_nrthreads to be a pure count of threads. The kref
is managed particularly in svc_get() and svc_put(), and also nfsd_put();
svc_destroy() now takes a pointer to the embedded kref, rather than to
the serv.
nfsd allows the svc_serv to exist with ->sv_nrhtreads being zero. This
happens when a transport is created before the first thread is started.
To support this, a 'keep_active' flag is introduced which holds a ref on
the svc_serv. This is set when any listening socket is successfully
added (unless there are running threads), and cleared when the number of
threads is set. So when the last thread exits, the nfs_serv will be
destroyed.
The use of 'keep_active' replaces previous code which checked if there
were any permanent sockets.
We no longer clear ->rq_server when nfsd() exits. This was done
to prevent svc_exit_thread() from calling svc_destroy().
Instead we take an extra reference to the svc_serv to prevent
svc_destroy() from being called.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
svc_destroy() is poorly named - it doesn't necessarily destroy the svc,
it might just reduce the ref count.
nfsd_destroy() is poorly named for the same reason.
This patch:
- removes the refcount functionality from svc_destroy(), moving it to
a new svc_put(). Almost all previous callers of svc_destroy() now
call svc_put().
- renames nfsd_destroy() to nfsd_put() and improves the code, using
the new svc_destroy() rather than svc_put()
- removes a few comments that explain the important for balanced
get/put calls. This should be obvious.
The only non-trivial part of this is that svc_destroy() would call
svc_sock_update() on a non-final decrement. It can no longer do that,
and svc_put() isn't really a good place of it. This call is now made
from svc_exit_thread() which seems like a good place. This makes the
call *before* sv_nrthreads is decremented rather than after. This
is not particularly important as the call just sets a flag which
causes sv_nrthreads set be checked later. A subsequent patch will
improve the ordering.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
It is common for 'get' functions to return the object that was 'got',
and there are a couple of places where users of svc_get() would be a
little simpler if svc_get() did that.
Make it so.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
support for a filehandle format deprecated 20 years ago, and further
xdr-related cleanup from Chuck.
-----BEGIN PGP SIGNATURE-----
iQJJBAABCAAzFiEEYtFWavXG9hZotryuJ5vNeUKO4b4FAmGMPYkVHGJmaWVsZHNA
ZmllbGRzZXMub3JnAAoJECebzXlCjuG+JVwQAKbrpgbzl91u+T6W9MUGgQVzDpeP
XIy3NxCu/4pZ8SToWF3trz71sskokmkPPaZyuISD2C8e4DxO5LQ3fJLhtS9CjRFB
x4iZUxH7V2BoWrb5SY6TDWBEqaq4MY9f7tIbvUu5xpa0FIupLqJjYh2CP8vqtsbm
lblQKXz4ao0jwDzSVimNnPcTccpB25VIzwHsSOszRhN4rTjMgyHoETx2cqJne5IU
Tx/hH0UlpnwuQ7aVpcjMoKqIyUWDTMejx51pyZhHB47DVKL7HsnZvg59mTpXFcBx
29edvWT9yy1+w3nGkTYSkOgO9DyHvCbmQzIsvoYlmbZ2sdmTKK8Wuv2Ehcw3OfvL
MXGmy2EXIhzvTZXyN6pL1bBwwNSxdqJhVSxvrPLz1EymIkxf/IDI8eyUicVXd3Vq
K2xOn+CXyIbXWCU85ru8UA77r1+x//gSwqcJvtKUavbNJUwNt935CE2n3+o/0OL/
pToZ89nhcaRyDP1jJKA37K48VLNtBXzZZQlRovyLelNojam/kzZkXX8dI6oV9VD1
Ymjm0mbdZzwhE3C1HxKlxwZqhN+7YoyxMQuWjFMp28wxH+dkz/USCulKZ3/H+neD
0YBSgvwe92JqkZTW2AOjipL+beAuKJ4zsfCCl2XZig/rHGutiwOf2GfgdRmJM6AD
6aiufVWKNNRQef9y
=yKBl
-----END PGP SIGNATURE-----
Merge tag 'nfsd-5.16' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
"A slow cycle for nfsd: mainly cleanup, including Neil's patch dropping
support for a filehandle format deprecated 20 years ago, and further
xdr-related cleanup from Chuck"
* tag 'nfsd-5.16' of git://linux-nfs.org/~bfields/linux: (26 commits)
nfsd4: remove obselete comment
nfsd: document server-to-server-copy parameters
NFSD:fix boolreturn.cocci warning
nfsd: update create verifier comment
SUNRPC: Change return value type of .pc_encode
SUNRPC: Replace the "__be32 *p" parameter to .pc_encode
NFSD: Save location of NFSv4 COMPOUND status
SUNRPC: Change return value type of .pc_decode
SUNRPC: Replace the "__be32 *p" parameter to .pc_decode
SUNRPC: De-duplicate .pc_release() call sites
SUNRPC: Simplify the SVC dispatch code path
SUNRPC: Capture value of xdr_buf::page_base
SUNRPC: Add trace event when alloc_pages_bulk() makes no progress
svcrdma: Split svcrmda_wc_{read,write} tracepoints
svcrdma: Split the svcrdma_wc_send() tracepoint
svcrdma: Split the svcrdma_wc_receive() tracepoint
NFSD: Have legacy NFSD WRITE decoders use xdr_stream_subsegment()
SUNRPC: xdr_stream_subsegment() must handle non-zero page_bases
NFSD: Initialize pointer ni with NULL and not plain integer 0
NFSD: simplify struct nfsfh
...
Returning an undecorated integer is an age-old trope, but it's
not clear (even to previous experts in this code) that the only
valid return values are 1 and 0. These functions do not return
a negative errno, rpc_stat value, or a positive length.
Document there are only two valid return values by having
.pc_encode return only true or false.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The passed-in value of the "__be32 *p" parameter is now unused in
every server-side XDR encoder, and can be removed.
Note also that there is a line in each encoder that sets up a local
pointer to a struct xdr_stream. Passing that pointer from the
dispatcher instead saves one line per encoder function.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Returning an undecorated integer is an age-old trope, but it's
not clear (even to previous experts in this code) that the only
valid return values are 1 and 0. These functions do not return
a negative errno, rpc_stat value, or a positive length.
Document there are only two valid return values by having
.pc_decode return only true or false.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The passed-in value of the "__be32 *p" parameter is now unused in
every server-side XDR decoder, and can be removed.
Note also that there is a line in each decoder that sets up a local
pointer to a struct xdr_stream. Passing that pointer from the
dispatcher instead saves one line per decoder function.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
- Fix crash in NLM TEST procedure
- NFSv4.1+ backchannel not restored after PATH_DOWN
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmFLUKcACgkQM2qzM29m
f5fCAhAAp66o6n49/fxOLWo+MftFlT1EY8NtFjyTh1x/o4R9S74qxTy3RC3GzRvk
oGOnkFvuiToyjcoeyb9yumYxO00Qf75hrTJvsXqnsbrLZOAKVuITn9MkQXBOXjCi
GDxQSRFg8ihz0vG4YbE/brnZR1fIMr7KSzXLwdXOs8mKvro7JmiiB87JOGhw9yon
W9+bFcnN2TynYsqmtHu987LvaIUE79dFfhrfj6bIobNQ25oqJoG5e1/M48/1MJol
DFPiWoErJ/S1c0lA8rbjIvtzgbXs84U88EXmFUVsxSXhepGui3Uh/cA49vu46icH
vze8fwHs6q3qzF7gE6jbslrrdQ/H6AZ6arhe27h4cVxdh0AouDuBat2xLY2I4TP3
DckfLbEsOqTJhfzqYnk+8ckOaBMpkfyDqG6SodIKglPoknNCtCp0/7NuYF0yMLe5
I6pO7JDgz7ySrbpm27ZMOpdwkLqqA1i8V9MPvimUsKTYJqlVBsc2RsdldQhunNbd
50InJarWQ+japkEl3WK3aJ5rTluiIWjcePT7wA76wP3PnZmcjweOiQMc8uuLlzPw
tOLRlHdpdZzeM3hGuI6KKsg8ZRbDB7L8YiaLkSwxl2qwJwDSB0xo7/WWwVzkyfdf
zdQ2cR9z70I2Bgxq/1lAPB8tXq+SEvu1qCYDFSTo3I9c0Y2frs4=
=L0c1
-----END PGP SIGNATURE-----
Merge tag 'nfsd-5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
"Critical bug fixes:
- Fix crash in NLM TEST procedure
- NFSv4.1+ backchannel not restored after PATH_DOWN"
* tag 'nfsd-5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: back channel stuck in SEQ4_STATUS_CB_PATH_DOWN
NLM: Fix svcxdr_encode_owner()
Dai Ngo reports that, since the XDR overhaul, the NLM server crashes
when the TEST procedure wants to return NLM_DENIED. There is a bug
in svcxdr_encode_owner() that none of our standard test cases found.
Replace the open-coded function with a call to an appropriate
pre-fabricated XDR helper.
Reported-by: Dai Ngo <Dai.Ngo@oracle.com>
Fixes: a6a63ca565 ("lockd: Common NLM XDR helpers")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
- New Features:
- Better client responsiveness when server isn't replying
- Use refcount_t in sunrpc rpc_client refcount tracking
- Add srcaddr and dst_port to the sunrpc sysfs info files
- Add basic support for connection sharing between servers with multiple NICs`
- Bugfixes and Cleanups:
- Sunrpc tracepoint cleanups
- Disconnect after ib_post_send() errors to avoid deadlocks
- Fix for tearing down rpcrdma_reps
- Fix a potential pNFS layoutget livelock loop
- pNFS layout barrier fixes
- Fix a potential memory corruption in rpc_wake_up_queued_task_set_status()
- Fix reconnection locking
- Fix return value of get_srcport()
- Remove rpcrdma_post_sends()
- Remove pNFS dead code
- Remove copy size restriction for inter-server copies
- Overhaul the NFS callback service
- Clean up sunrpc TCP socket shutdowns
- Always provide aligned buffers to RPC read layers
-----BEGIN PGP SIGNATURE-----
iQIyBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmExP7AACgkQ18tUv7Cl
QOshTg/zBz7OfrS23CcLLgNidTJ6S7JOuj1DShG+YzsYXT8f9Nl1DadLM7yAEyok
6JZzC8rXYzJcmYztHZzRyTuzj1+tGGb0u/MrD0bBk42VEel6eOjH/Y9ybn12Gf/E
aqlcJh8hPx44U8oo5EFjRJsg2h28O06vywqhJz+sTbkqKN4hlAgMOo5ysAB+1thg
BrTlR84EKBw5QqxPJ1WPmq9tEyGebU9Yrj1p8f0Uf015IeRNeTOXx3NzmdPshphf
2yJvjumwEzqkcHXTJFDfP6ikIcGPPMNVAOK8DHb+vDGzNsOXW7dDM7GuWA3U8DlU
ZHvyyb05Wwe6Wwg8xwx90FEXcYZFfZbSKmI9z2uoOuGFzNG07zWzPDzRft+qrOvU
VMMwP9oEh71+qesmWTvqIbR2RjxqbCYlTcc8mBrD66DROi6jZ2jznraNC85sxG0Q
b8GE+2SnYr2Q25yehj2xrRlOXyiYNkeeYmIpIquEqH9o7cSyDNJhBWbzIv6x+ith
O/S06ZVKMc9X1nH5t5121XcHrSTMMVA/67WMyKfKMxWnrADAWPQALG+ttoTcbRu7
Txew3Jb+hB8+ZdHAqbPf1l1i+7USQl1CRHMw3GRvNjCL2qcjZb1R7eyJRSQQtUyw
q6SJRGe6Sn1FTUnn96Hv15Zy8VHx+q0cOL/EQVzL1RzJIXYcag==
=Ad/3
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-5.15-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client updates from Anna Schumaker:
"New Features:
- Better client responsiveness when server isn't replying
- Use refcount_t in sunrpc rpc_client refcount tracking
- Add srcaddr and dst_port to the sunrpc sysfs info files
- Add basic support for connection sharing between servers with multiple NICs`
Bugfixes and Cleanups:
- Sunrpc tracepoint cleanups
- Disconnect after ib_post_send() errors to avoid deadlocks
- Fix for tearing down rpcrdma_reps
- Fix a potential pNFS layoutget livelock loop
- pNFS layout barrier fixes
- Fix a potential memory corruption in rpc_wake_up_queued_task_set_status()
- Fix reconnection locking
- Fix return value of get_srcport()
- Remove rpcrdma_post_sends()
- Remove pNFS dead code
- Remove copy size restriction for inter-server copies
- Overhaul the NFS callback service
- Clean up sunrpc TCP socket shutdowns
- Always provide aligned buffers to RPC read layers"
* tag 'nfs-for-5.15-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (39 commits)
NFS: Always provide aligned buffers to the RPC read layers
NFSv4.1 add network transport when session trunking is detected
SUNRPC enforce creation of no more than max_connect xprts
NFSv4 introduce max_connect mount options
SUNRPC add xps_nunique_destaddr_xprts to xprt_switch_info in sysfs
SUNRPC keep track of number of transports to unique addresses
NFSv3: Delete duplicate judgement in nfs3_async_handle_jukebox
SUNRPC: Tweak TCP socket shutdown in the RPC client
SUNRPC: Simplify socket shutdown when not reusing TCP ports
NFSv4.2: remove restriction of copy size for inter-server copy.
NFS: Clean up the synopsis of callback process_op()
NFS: Extract the xdr_init_encode/decode() calls from decode_compound
NFS: Remove unused callback void decoder
NFS: Add a private local dispatcher for NFSv4 callback operations
SUNRPC: Eliminate the RQ_AUTHERR flag
SUNRPC: Set rq_auth_stat in the pg_authenticate() callout
SUNRPC: Add svc_rqst::rq_auth_stat
SUNRPC: Add dst_port to the sysfs xprt info file
SUNRPC: Add srcaddr as a file in sysfs
sunrpc: Fix return value of get_srcport()
...
As in the v4 case, it doesn't work well to block waiting for a lock on
an nfs filesystem.
As in the v4 case, that means we're depending on the client to poll.
It's probably incorrect to depend on that, but I *think* clients do poll
in practice. In any case, it's an improvement over hanging the lockd
thread indefinitely as we currently are.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
We shouldn't really be using a read-only file descriptor to take a write
lock.
Most filesystems will put up with it. But NFS, for example, won't.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Update comment to reflect that we *do* allow reexport, whether it's a
good idea or not....
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Make this lookup slightly more concise, and prepare for changing how we
look this up in a following patch.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Locks have two sets of op arrays, fl_lmops for the lock manager (lockd
or nfsd), fl_ops for the filesystem. The server-side lockd code has
been setting its own fl_ops, which leads to confusion (and crashes) in
the reexport case, where the filesystem expects to be the only one
setting fl_ops.
And there's no reason for it that I can see-the lm_get/put_owner ops do
the same job.
Reported-by: Daire Byrne <daire@dneg.com>
Tested-by: Daire Byrne <daire@dneg.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
nsm_use_hostnames is a module parameter and it will be exported to sysctl
procfs. This is to let user sometimes change it from userspace. But the
minimal unit for sysctl procfs read/write it sizeof(int).
In big endian system, the converting from/to bool to/from int will cause
error for proc items.
This patch use a new proc_handler proc_dobool to fix it.
Signed-off-by: Jia He <hejianet@gmail.com>
Reviewed-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
[thuth: Fix typo in commit message]
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
After calling vfs_test_lock() the pointer to a conflicting lock can be
returned, and that lock is not guarunteed to be owned by nlm. In that
case, we cannot cast it to struct nlm_lockowner. Instead return the pid
of that conflicting lock.
Fixes: 646d73e91b ("lockd: Show pid of lockd for remote locks")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
In a few moments, rq_auth_stat will need to be explicitly set to
rpc_auth_ok before execution gets to the dispatcher.
svc_authenticate() already sets it, but it often gets reset to
rpc_autherr_badcred right after that call, even when authentication
is successful. Let's ensure that the pg_authenticate callout and
svc_set_client() set it properly in every case.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Add a .h file containing xdr_stream-based XDR helpers common to both
NLMv3 and NLMv4.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
To enable xdr_stream-based encoding and decoding, create a bespoke
RPC dispatch function for the lockd service.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The next few patches will employ these strings to help make server-
side trace logs more human-readable. A similar technique is already
in use in kernel RPC client code.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Replace a comma between expression statements by a semicolon.
Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
NLM uses an interval-based rebinding, i.e. it clears the transport's
binding under certain conditions if more than 60 seconds have elapsed
since the connection was last bound.
This rebinding is not necessary for an autobind RPC client over a
connection-oriented protocol like TCP.
It can also cause problems: it is possible for nlm_bind_host() to clear
XPRT_BOUND whilst a connection worker is in the middle of trying to
reconnect, after it had already been checked in xprt_connect().
When the connection worker notices that XPRT_BOUND has been cleared
under it, in xs_tcp_finish_connecting(), that results in:
xs_tcp_setup_socket: connect returned unhandled error -107
Worse, it's possible that the two can get into lockstep, resulting in
the same behaviour repeated indefinitely, with the above error every
300 seconds, without ever recovering, and the connection never being
established. This has been seen in practice, with a large number of NLM
client tasks, following a server restart.
The existing callers of nlm_bind_host & nlm_rebind_host should not need
to force the rebind, for TCP, so restrict the interval-based rebinding
to UDP only.
For TCP, we will still rebind when needed, e.g. on timeout, and connection
error (including closure), since connection-related errors on an existing
connection, ECONNREFUSED when trying to connect, and rpc_check_timeout(),
already unconditionally clear XPRT_BOUND.
To avoid having to add the fix, and explanation, to both nlm_bind_host()
and nlm_rebind_host(), remove the duplicate code from the former, and
have it call the latter.
Drop the dprintk, which adds no value over a trace.
Signed-off-by: Calum Mackay <calum.mackay@oracle.com>
Fixes: 35f5a422ce ("SUNRPC: new interface to force an RPC rebind")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
has the same arguments as READ but allows the server to return an array
of data and hole extents.
Otherwise it's a lot of cleanup and bugfixes.
-----BEGIN PGP SIGNATURE-----
iQJJBAABCAAzFiEEYtFWavXG9hZotryuJ5vNeUKO4b4FAl+Q5vsVHGJmaWVsZHNA
ZmllbGRzZXMub3JnAAoJECebzXlCjuG+DUAP/RlALnXbaoWi8YCcEcc9U1LoQKbD
CJpDR+FqCOyGwRuzWung/5pvkOO50fGEeAroos+2rF/NgRkQq8EFr9AuBhNOYUFE
IZhWEOfu/r2ukXyBmcu21HGcWLwPnyJehvjuzTQW2wOHlBi/sdoL5Ap1sVlwVLj5
EZ5kqJLD+ioG2sufW99Spi55l1Cy+3Y0IhLSWl4ZAE6s8hmFSYAJZFsOeI0Afx57
USPTDRaeqjyEULkb+f8IhD0eRApOUo4evDn9dwQx+of7HPa1CiygctTKYwA3hnlc
gXp2KpVA1REaiYVgOPwYlnqBmJ2K9X0wCRzcWy2razqEcVAX/2j7QCe9M2mn4DC8
xZ2q4SxgXu9yf0qfUSVnDxWmP6ipqq7OmsG0JXTFseGKBdpjJY1qHhyqanVAGvEg
I+xHnnWfGwNCftwyA3mt3RfSFPsbLlSBIMZxvN4kn8aVlqszGITOQvTdQcLYA6kT
xWllBf4XKVXMqF0PzerxPDmfzBfhx6b1VPWOIVcu7VLBg3IXoEB2G5xG8MUJiSch
OUTCt41LUQkerQlnzaZYqwmFdSBfXJefmcE/x/vps4VtQ/fPHX1jQyD7iTu3HfSP
bRlkKHvNVeTodlBDe/HTPiTA99MShhBJyvtV5wfzIqwjc1cNreed+ePppxn8mxJi
SmQ2uZk/MpUl7/V0
=rcOj
-----END PGP SIGNATURE-----
Merge tag 'nfsd-5.10' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
"The one new feature this time, from Anna Schumaker, is READ_PLUS,
which has the same arguments as READ but allows the server to return
an array of data and hole extents.
Otherwise it's a lot of cleanup and bugfixes"
* tag 'nfsd-5.10' of git://linux-nfs.org/~bfields/linux: (43 commits)
NFSv4.2: Fix NFS4ERR_STALE error when doing inter server copy
SUNRPC: fix copying of multiple pages in gss_read_proxy_verf()
sunrpc: raise kernel RPC channel buffer size
svcrdma: fix bounce buffers for unaligned offsets and multiple pages
nfsd: remove unneeded break
net/sunrpc: Fix return value for sysctl sunrpc.transports
NFSD: Encode a full READ_PLUS reply
NFSD: Return both a hole and a data segment
NFSD: Add READ_PLUS hole segment encoding
NFSD: Add READ_PLUS data support
NFSD: Hoist status code encoding into XDR encoder functions
NFSD: Map nfserr_wrongsec outside of nfsd_dispatch
NFSD: Remove the RETURN_STATUS() macro
NFSD: Call NFSv2 encoders on error returns
NFSD: Fix .pc_release method for NFSv2
NFSD: Remove vestigial typedefs
NFSD: Refactor nfsd_dispatch() error paths
NFSD: Clean up nfsd_dispatch() variables
NFSD: Clean up stale comments in nfsd_dispatch()
NFSD: Clean up switch statement in nfsd_dispatch()
...
Clean up: Follow-up on ten-year-old commit b9081d90f5 ("NFS: kill
off complicated macro 'PROC'") by performing the same conversion in
the lockd code. To reduce the chance of error, I copied the original
C preprocessor output and then made some minor edits.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.
Deterministic algorithm:
For each file:
If not .svg:
For each line:
If doesn't contain `\bxmlns\b`:
For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
If both the HTTP and HTTPS versions
return 200 OK and serve the same content:
Replace HTTP with HTTPS.
Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
NFSv2, v3 and NFSv4 servers often have duplicate replay caches that look
at the source port when deciding whether or not an RPC call is a replay
of a previous call. This requires clients to perform strange TCP gymnastics
in order to ensure that when they reconnect to the server, they bind
to the same source port.
NFSv4.1 and NFSv4.2 have sessions that provide proper replay semantics,
that do not look at the source port of the connection. This patch therefore
ensures they can ignore the rebind requirement.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Fix sparse warnings:
fs/lockd/clntproc.c:57:6: warning: symbol 'nlmclnt_put_lockowner' was not declared. Should it be static?
fs/lockd/svclock.c:409:35: warning: symbol 'nlmsvc_lock_ops' was not declared. Should it be static?
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Use the pid of lockd instead of the remote lock's svid for the fl_pid for
local POSIX locks. This allows proper enumeration of which local process
owns which lock. The svid is meaningless to local lock readers.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Now that the NLM server allocates an nlm_lockowner for fl_owner, there's
no need for special hashing or comparison.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Do as the NLM client: allocate and track a struct nlm_lockowner for use as
the fl_owner for locks created by the NLM sever. This allows us to keep
the svid within this structure for matching locks, and will allow us to
track the pid of lockd in a future patch. It should also allow easier
reference of the nlm_host in conflicting locks, and simplify lock hashing
and comparison.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
[bfields@redhat.com: fix type of some error returns]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The nlm_lockowner structure that the client uses to track locks is
generally useful to the server as well. Very similar functions to handle
allocation and tracking of the nlm_lockowner will follow. Rename the client
functions for clarity.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This reverts most of commit b8eee0e90f ("lockd: Show pid of lockd for
remote locks"), which caused remote locks to not be differentiated between
remote processes for NLM.
We retain the fixup for setting the client's fl_pid to a negative value.
Fixes: b8eee0e90f ("lockd: Show pid of lockd for remote locks")
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: XueWei Zhang <xueweiz@google.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Add SPDX license identifiers to all files which:
- Have no license information of any form
- Have MODULE_LICENCE("GPL*") inside which was used in the initial
scan/conversion to ignore the file
These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:
GPL-2.0-only
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add SPDX license identifiers to all files which:
- Have no license information of any form
- Have EXPORT_.*_SYMBOL_GPL inside which was used in the
initial scan/conversion to ignore the file
These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:
GPL-2.0-only
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Scott Mayhew revived an old api that communicates with a userspace
daemon to manage some on-disk state that's used to track clients across
server reboots. We've been using a usermode_helper upcall for that, but
it's tough to run those with the right namespaces, so a daemon is much
friendlier to container use cases.
Trond fixed nfsd's handling of user credentials in user namespaces. He
also contributed patches that allow containers to support different sets
of NFS protocol versions.
The only remaining container bug I'm aware of is that the NFS reply
cache is shared between all containers. If anyone's aware of other gaps
in our container support, let me know.
The rest of this is miscellaneous bugfixes.
-----BEGIN PGP SIGNATURE-----
iQJJBAABCAAzFiEEYtFWavXG9hZotryuJ5vNeUKO4b4FAlzcWNcVHGJmaWVsZHNA
ZmllbGRzZXMub3JnAAoJECebzXlCjuG+DUEP/0WD3jKNAHFV3M5YQPAI9fz/iCND
Db/A4oWP5qa6JmwmHe61il29QeGqkeFr/NPexgzM3Xw2E39d7RBXBeWyVDuqb0wr
6SCXjXibTsuAHg11nR8Xf0P5Vej3rfGbG6up5lLCIDTEZxVpWoaBJnM8+3bewuCj
XbeiDW54oiMbmDjon3MXqVAIF/z7LjorecJ+Yw5+0Jy7KZ6num9Kt8+fi7qkEfFd
i5Bp9KWgzlTbJUJV4EX3ZKN3zlGkfOvjoo2kP3PODPVMB34W8jSLKkRSA1tDWYZg
43WhBt5OODDlV6zpxSJXehYKIB4Ae469+RRaIL4F+ORRK+AzR0C/GTuOwJiG+P3J
n95DX5WzX74nPOGQJgAvq4JNpZci85jM3jEK1TR2M7KiBDG5Zg+FTsPYVxx5Sgah
Akl/pjLtHQPSdBbFGHn5TsXU+gqWNiKsKa9663tjxLb8ldmJun6JoQGkAEF9UJUn
dzv0UxyHeHAblhSynY+WsUR+Xep9JDo/p5LyFK4if9Sd62KeA1uF/MFhAqpKZF81
mrgRCqW4sD8aVTBNZI06pZzmcZx4TRr2o+Oj5KAXf6Yk6TJRSGfnQscoMMBsTLkw
VK1rBQ/71TpjLHGZZZEx1YJrkVZAMmw2ty4DtK2f9jeKO13bWmUpc6UATzVufHKA
C1rUZXJ5YioDbYDy
=TUdw
-----END PGP SIGNATURE-----
Merge tag 'nfsd-5.2' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
"This consists mostly of nfsd container work:
Scott Mayhew revived an old api that communicates with a userspace
daemon to manage some on-disk state that's used to track clients
across server reboots. We've been using a usermode_helper upcall for
that, but it's tough to run those with the right namespaces, so a
daemon is much friendlier to container use cases.
Trond fixed nfsd's handling of user credentials in user namespaces. He
also contributed patches that allow containers to support different
sets of NFS protocol versions.
The only remaining container bug I'm aware of is that the NFS reply
cache is shared between all containers. If anyone's aware of other
gaps in our container support, let me know.
The rest of this is miscellaneous bugfixes"
* tag 'nfsd-5.2' of git://linux-nfs.org/~bfields/linux: (23 commits)
nfsd: update callback done processing
locks: move checks from locks_free_lock() to locks_release_private()
nfsd: fh_drop_write in nfsd_unlink
nfsd: allow fh_want_write to be called twice
nfsd: knfsd must use the container user namespace
SUNRPC: rsi_parse() should use the current user namespace
SUNRPC: Fix the server AUTH_UNIX userspace mappings
lockd: Pass the user cred from knfsd when starting the lockd server
SUNRPC: Temporary sockets should inherit the cred from their parent
SUNRPC: Cache the process user cred in the RPC server listener
nfsd: Allow containers to set supported nfs versions
nfsd: Add custom rpcbind callbacks for knfsd
SUNRPC: Allow further customisation of RPC program registration
SUNRPC: Clean up generic dispatcher code
SUNRPC: Add a callback to initialise server requests
SUNRPC/nfs: Fix return value for nfs4_callback_compound()
nfsd: handle legacy client tracking records sent by nfsdcld
nfsd: re-order client tracking method selection
nfsd: keep a tally of RECLAIM_COMPLETE operations when using nfsdcld
nfsd: un-deprecate nfsdcld
...
When we create a new lockd client, we want to be able to pass the
correct credential of the process that created the struct nlm_host.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
When converting kuids to AUTH_UNIX creds, etc we will want to use the
same user namespace as the process that created the rpc client.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
The RPC_TASK_KILLED flag should really not be set from another context
because it can clobber data in the struct task when task->tk_flags is
changed non-atomically.
Let's therefore swap out RPC_TASK_KILLED with an atomic flag, and add
a function to set that flag and safely wake up the task.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
When starting up a new knfsd server, pass the user cred to the
supporting lockd server.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
In order to be able to interpret uids and gids correctly in knfsd, we
should cache the user namespace of the process that created the RPC
server's listener. To do so, we refcount the credential of that process.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Add a callback to allow customisation of the rpcbind registration.
When clients have the ability to turn on and off version support,
we want to allow them to also prevent registration of those
versions with the rpc portmapper.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Add a callback to help initialise server requests before they are
processed. This will allow us to clean up the NFS server version
support, and to make it container safe.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
If the last NFSv3 unmount from a given host races with a mount from the
same host, we can destroy an nlm_host that is still in use.
Specifically nlmclnt_lookup_host() can increment h_count on
an nlm_host that nlmclnt_release_host() has just successfully called
refcount_dec_and_test() on.
Once nlmclnt_lookup_host() drops the mutex, nlm_destroy_host_lock()
will be called to destroy the nlmclnt which is now in use again.
The cause of the problem is that the dec_and_test happens outside the
locked region. This is easily fixed by using
refcount_dec_and_mutex_lock().
Fixes: 8ea6ecc8b0 ("lockd: Create client-side nlm_host cache")
Cc: stable@vger.kernel.org (v2.6.38+)
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
This issue is now captured by a trace point in the RPC client.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Note that there is a conflict with the rdma tree in this pull request, since
we delete a file that has been changed in the rdma tree. Hopefully that's
easy enough to resolve!
We also were unable to track down a maintainer for Neil Brown's changes to
the generic cred code that are prerequisites to his RPC cred cleanup patches.
We've been asking around for several months without any response, so
hopefully it's okay to include those patches in this pull request.
Stable bugfixes:
- xprtrdma: Yet another double DMA-unmap # v4.20
Features:
- Allow some /proc/sys/sunrpc entries without CONFIG_SUNRPC_DEBUG
- Per-xprt rdma receive workqueues
- Drop support for FMR memory registration
- Make port= mount option optional for RDMA mounts
Other bugfixes and cleanups:
- Remove unused nfs4_xdev_fs_type declaration
- Fix comments for behavior that has changed
- Remove generic RPC credentials by switching to 'struct cred'
- Fix crossing mountpoints with different auth flavors
- Various xprtrdma fixes from testing and auditing the close code
- Fixes for disconnect issues when using xprtrdma with krb5
- Clean up and improve xprtrdma trace points
- Fix NFS v4.2 async copy reboot recovery
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAlwtO50ACgkQ18tUv7Cl
QOtZWQ//e5Hhp2TnQZ6U+99YKedjwBHP6psH3GKSEdeHSNdlSpZ5ckgHxvMb9TBa
6t4ecgv5P/uYLIePQ0u2ubUFc9+TlyGi7Iacx13/YhK7kihGHDPnZhfl0QbYixV7
rwa9bFcKmOrXs8ld+Hw3P2UL22G1gMf/LHDhPNshbW7LFZmcshKz+mKTk70kwkq9
v7tFC59p6GwV8Sr2YI2NXn2fOWsUS00sQfgj2jceJYJ8PsNa+wHYF4wPj2IY5NsE
D5Oq2kLPbytBhCllOHgopNZaf4qb5BfqhVETyc1O+kDF3BZKUhQ1PoDi2FPinaHM
5/d8hS+5fr3eMBsQrPWQLXYjWQFUXnkQQJvU3Bo52AIgomsk/8uBq3FvH7XmFcBd
C8sgnuUAkAS8feMes8GCS50BTxclnGuYGdyFJyCRXoG9Kn9rMrw9EKitky6EVq0v
NmXhW79jK84a3yDXVlAIpZ8Y9BU/HQ3GviGX8lQEdZU9YiYRzDIHvpMFwzMgqaBi
XvLbr8PlLOm8GZokThS8QYT/G2Wu6IwfUq/AufVjVD4+HiL3duKKfWSGAvcm6aAa
GoRF6UG+OmjWlzKojtRc1dI+sy22Fzh+DW+Mx6tuf/b/66wkmYnW7eKcV4rt6Tm5
/JEhvTMo9q7elL/4FgCoMCcdoc5eXqQyXRXrQiOU7YHLzn2aWU0=
=DvVW
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-4.21-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client updates from Anna Schumaker:
"Stable bugfixes:
- xprtrdma: Yet another double DMA-unmap # v4.20
Features:
- Allow some /proc/sys/sunrpc entries without CONFIG_SUNRPC_DEBUG
- Per-xprt rdma receive workqueues
- Drop support for FMR memory registration
- Make port= mount option optional for RDMA mounts
Other bugfixes and cleanups:
- Remove unused nfs4_xdev_fs_type declaration
- Fix comments for behavior that has changed
- Remove generic RPC credentials by switching to 'struct cred'
- Fix crossing mountpoints with different auth flavors
- Various xprtrdma fixes from testing and auditing the close code
- Fixes for disconnect issues when using xprtrdma with krb5
- Clean up and improve xprtrdma trace points
- Fix NFS v4.2 async copy reboot recovery"
* tag 'nfs-for-4.21-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (63 commits)
sunrpc: convert to DEFINE_SHOW_ATTRIBUTE
sunrpc: Add xprt after nfs4_test_session_trunk()
sunrpc: convert unnecessary GFP_ATOMIC to GFP_NOFS
sunrpc: handle ENOMEM in rpcb_getport_async
NFS: remove unnecessary test for IS_ERR(cred)
xprtrdma: Prevent leak of rpcrdma_rep objects
NFSv4.2 fix async copy reboot recovery
xprtrdma: Don't leak freed MRs
xprtrdma: Add documenting comment for rpcrdma_buffer_destroy
xprtrdma: Replace outdated comment for rpcrdma_ep_post
xprtrdma: Update comments in frwr_op_send
SUNRPC: Fix some kernel doc complaints
SUNRPC: Simplify defining common RPC trace events
NFS: Fix NFSv4 symbolic trace point output
xprtrdma: Trace mapping, alloc, and dereg failures
xprtrdma: Add trace points for calls to transport switch methods
xprtrdma: Relocate the xprtrdma_mr_map trace points
xprtrdma: Clean up of xprtrdma chunk trace points
xprtrdma: Remove unused fields from rpcrdma_ia
xprtrdma: Cull dprintk() call sites
...
NFSv4.2 client, and cleaning up some convoluted backchannel server code
in the process. Otherwise, miscellaneous smaller bugfixes and cleanup.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJcLR3nAAoJECebzXlCjuG+oyAQALrPSTH9Qg2AwP2eGm+AevUj
u/VFmimImIO9dYuT02t4w42w4qMIQ0/Y7R0UjT3DxG5Oixy/zA+ZaNXCCEKwSMIX
abGF4YalUISbDc6n0Z8J14/T33wDGslhy3IQ9Jz5aBCDCocbWlzXvFlmrowbb3ak
vtB0Fc3Xo6Z/Pu2GzNzlqR+f69IAmwQGJrRrAEp3JUWSIBKiSWBXTujDuVBqJNYj
ySLzbzyAc7qJfI76K635XziULR2ueM3y5JbPX7kTZ0l3OJ6Yc0PtOj16sIv5o0XK
DBYPrtvw3ZbxQE/bXqtJV9Zn6MG5ODGxKszG1zT1J3dzotc9l/LgmcAY8xVSaO+H
QNMdU9QuwmyUG20A9rMoo/XfUb5KZBHzH7HIYOmkfBidcaygwIInIKoIDtzimm4X
OlYq3TL/3QDY6rgTCZv6n2KEnwiIDpc5+TvFhXRWclMOJMcMSHJfKFvqERSv9V3o
90qrCebPA0K8Dnc0HMxcBXZ+0TqZ2QeXp/wfIjibCXqMwlg+BZhmbeA0ngZ7x7qf
2F33E9bfVJjL+VI5FcVYQf43bOTWZgD6ZKGk4T7keYl0CPH+9P70bfhl4KKy9dqc
GwYooy/y5FPb2CvJn/EETeILRJ9OyIHUrw7HBkpz9N8n9z+V6Qbp9yW7LKgaMphW
1T+GpHZhQjwuBPuJhDK0
=dRLp
-----END PGP SIGNATURE-----
Merge tag 'nfsd-4.21' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
"Thanks to Vasily Averin for fixing a use-after-free in the
containerized NFSv4.2 client, and cleaning up some convoluted
backchannel server code in the process.
Otherwise, miscellaneous smaller bugfixes and cleanup"
* tag 'nfsd-4.21' of git://linux-nfs.org/~bfields/linux: (25 commits)
nfs: fixed broken compilation in nfs_callback_up_net()
nfs: minor typo in nfs4_callback_up_net()
sunrpc: fix debug message in svc_create_xprt()
sunrpc: make visible processing error in bc_svc_process()
sunrpc: remove unused xpo_prep_reply_hdr callback
sunrpc: remove svc_rdma_bc_class
sunrpc: remove svc_tcp_bc_class
sunrpc: remove unused bc_up operation from rpc_xprt_ops
sunrpc: replace svc_serv->sv_bc_xprt by boolean flag
sunrpc: use-after-free in svc_process_common()
sunrpc: use SVC_NET() in svcauth_gss_* functions
nfsd: drop useless LIST_HEAD
lockd: Show pid of lockd for remote locks
NFSD remove OP_CACHEME from 4.2 op_flags
nfsd: Return EPERM, not EACCES, in some SETATTR cases
sunrpc: fix cache_head leak due to queued request
nfsd: clean up indentation, increase indentation in switch statement
svcrdma: Optimize the logic that selects the R_key to invalidate
nfsd: fix a warning in __cld_pipe_upcall()
nfsd4: fix crash on writing v4_end_grace before nfsd startup
...
SUNRPC has two sorts of credentials, both of which appear as
"struct rpc_cred".
There are "generic credentials" which are supplied by clients
such as NFS and passed in 'struct rpc_message' to indicate
which user should be used to authorize the request, and there
are low-level credentials such as AUTH_NULL, AUTH_UNIX, AUTH_GSS
which describe the credential to be sent over the wires.
This patch replaces all the generic credentials by 'struct cred'
pointers - the credential structure used throughout Linux.
For machine credentials, there is a special 'struct cred *' pointer
which is statically allocated and recognized where needed as
having a special meaning. A look-up of a low-level cred will
map this to a machine credential.
Signed-off-by: NeilBrown <neilb@suse.com>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Commit 9d5b86ac13 ("fs/locks: Remove fl_nspid and use fs-specific l_pid
for remote locks") specified that the l_pid returned for F_GETLK on a local
file that has a remote lock should be the pid of the lock manager process.
That commit, while updating other filesystems, failed to update lockd, such
that locks created by lockd had their fl_pid set to that of the remote
process holding the lock. Fix that here to be the pid of lockd.
Also, fix the client case so that the returned lock pid is negative, which
indicates a remote lock on a remote file.
Fixes: 9d5b86ac13 ("fs/locks: Remove fl_nspid and use fs-specific...")
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
posix_unblock_lock() is not specific to posix locks, and behaves
nearly identically to locks_delete_block() - the former returning a
status while the later doesn't.
So discard posix_unblock_lock() and use locks_delete_block() instead,
after giving that function an appropriate return value.
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
We fail to advance the read pointer when reading the stat.oh field that
identifies the lock-holder in a TEST result.
This turns out not to matter if the server is knfsd, which always
returns a zero-length field. But other servers (Ganesha is an example)
may not do this. The result is bad values in fcntl F_GETLK results.
Fix this.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
printk format used %*s instead of %.*s, so hostname_len does not limit
the number of bytes accessed from hostname.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
nfsd and lockd call vfs_lock_file() to lock/unlock the inode
returned by locks_inode(file).
Many places in nfsd/lockd code use the inode returned by
file_inode(file) for lock manipulation. With Overlayfs, file_inode()
(the underlying inode) is not the same object as locks_inode() (the
overlay inode). This can result in "Leaked POSIX lock" messages
and eventually to a kernel crash as reported by Eddie Horng:
https://marc.info/?l=linux-unionfs&m=153086643202072&w=2
Fix all the call sites in nfsd/lockd that should use locks_inode().
This is a correctness bug that manifested when overlayfs gained
NFS export support in v4.16.
Reported-by: Eddie Horng <eddiehorng.tw@gmail.com>
Tested-by: Eddie Horng <eddiehorng.tw@gmail.com>
Cc: Jeff Layton <jlayton@kernel.org>
Fixes: 8383f17488 ("ovl: wire up NFS export operations")
Cc: stable@vger.kernel.org
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The variables nlm_ntf_refcnt and nlm_ntf_wq are local to the source and
do not need to be in global scope, so make them static.
Cleans up sparse warnings:
fs/lockd/svc.c:60:10: warning: symbol 'nlm_ntf_refcnt' was not declared.
Should it be static?
fs/lockd/svc.c:61:1: warning: symbol 'nlm_ntf_wq' was not declared.
Should it be static?
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The server shouldn't actually delete the struct nlm_host until it hits
the garbage collector. In order to make that work correctly with the
refcount API, we can bump the refcount by one, and then use
refcount_dec_if_one() in the garbage collector.
Signed-off-by: Trond Myklebust <trondmy@gmail.com>
Acked-by: J. Bruce Fields <bfields@fieldses.org>
atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)
Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.
The variable nlm_rqst.a_count is used as pure reference counter.
Convert it to refcount_t and fix up the operations.
**Important note for maintainers:
Some functions from refcount_t API defined in lib/refcount.c
have different memory ordering guarantees than their atomic
counterparts.
The full comparison can be seen in
https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon
in state to be merged to the documentation tree.
Normally the differences should not matter since refcount_t provides
enough guarantees to satisfy the refcounting use cases, but in
some rare cases it might matter.
Please double check that you don't have some undocumented
memory guarantees for this variable usage.
For the nlm_rqst.a_count it might make a difference
in following places:
- nlmclnt_release_call() and nlmsvc_release_call(): decrement
in refcount_dec_and_test() only
provides RELEASE ordering and control dependency on success
vs. fully ordered atomic counterpart
Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)
Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.
The variable nlm_lockowner.count is used as pure reference counter.
Convert it to refcount_t and fix up the operations.
**Important note for maintainers:
Some functions from refcount_t API defined in lib/refcount.c
have different memory ordering guarantees than their atomic
counterparts.
The full comparison can be seen in
https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon
in state to be merged to the documentation tree.
Normally the differences should not matter since refcount_t provides
enough guarantees to satisfy the refcounting use cases, but in
some rare cases it might matter.
Please double check that you don't have some undocumented
memory guarantees for this variable usage.
For the nlm_lockowner.count it might make a difference
in following places:
- nlm_put_lockowner(): decrement in refcount_dec_and_lock() only
provides RELEASE ordering, control dependency on success and
holds a spin lock on success vs. fully ordered atomic counterpart.
No changes in spin lock guarantees.
Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)
Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.
The variable nsm_handle.sm_count is used as pure reference counter.
Convert it to refcount_t and fix up the operations.
**Important note for maintainers:
Some functions from refcount_t API defined in lib/refcount.c
have different memory ordering guarantees than their atomic
counterparts.
The full comparison can be seen in
https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon
in state to be merged to the documentation tree.
Normally the differences should not matter since refcount_t provides
enough guarantees to satisfy the refcounting use cases, but in
some rare cases it might matter.
Please double check that you don't have some undocumented
memory guarantees for this variable usage.
For the nsm_handle.sm_count it might make a difference
in following places:
- nsm_release(): decrement in refcount_dec_and_lock() only
provides RELEASE ordering, control dependency on success
and holds a spin lock on success vs. fully ordered atomic
counterpart. No change for the spin lock guarantees.
Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)
Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.
The variable nlm_host.h_count is used as pure reference counter.
Convert it to refcount_t and fix up the operations.
**Important note for maintainers:
Some functions from refcount_t API defined in lib/refcount.c
have different memory ordering guarantees than their atomic
counterparts.
The full comparison can be seen in
https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon
in state to be merged to the documentation tree.
Normally the differences should not matter since refcount_t provides
enough guarantees to satisfy the refcounting use cases, but in
some rare cases it might matter.
Please double check that you don't have some undocumented
memory guarantees for this variable usage.
For the nlm_host.h_count it might make a difference
in following places:
- nlmsvc_release_host(): decrement in refcount_dec()
provides RELEASE ordering, while original atomic_dec()
was fully unordered. Since the change is for better, it
should not matter.
- nlmclnt_release_host(): decrement in refcount_dec_and_test() only
provides RELEASE ordering and control dependency on success
vs. fully ordered atomic counterpart. It doesn't seem to
matter in this case since object freeing happens under mutex
lock anyway.
Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
nlm_complain_hosts() walks through nlm_server_hosts hlist, which should
be protected by nlm_host_mutex.
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
lockd_inet[6]addr_event use nlmsvc_rqst without taken nlmsvc_mutex,
nlmsvc_rqst can be changed during execution of notifiers and crash the host.
Patch enables access to nlmsvc_rqst only when it was correctly initialized
and delays its cleanup until notifiers are no longer in use.
Note that nlmsvc_rqst can be temporally set to ERR_PTR, so the "if
(nlmsvc_rqst)" check in notifiers is insufficient on its own.
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Tested-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>