OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Trond Myklebust	078000d02d	pNFS: We want return-on-close to complete when evicting the inode If the inode is being evicted, it should be safe to run return-on-close, so we should do it to ensure we don't inadvertently leak layout segments. Fixes: `1c5bd76d17` ("pNFS: Enable layoutreturn operation for return-on-close") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-01-10 13:32:51 -05:00
Trond Myklebust	67bbceedc9	pNFS: Mark layout for return if return-on-close was not sent If the layout return-on-close failed because the layoutreturn was never sent, then we should mark the layout for return again. Fixes: `9c47b18cf7` ("pNFS: Ensure we do clear the return-on-close layout stateid on fatal errors") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-01-10 13:32:51 -05:00
Scott Mayhew	c98e9daa59	NFS: Adjust fs_context error logging Several existing dprink()/dfprintk() calls were converted to use the new mount API logging macros by commit `ce8866f091` ("NFS: Attach supplementary error information to fs_context"). If the fs_context was not created using fsopen() then it will not have had a log buffer allocated for it, and the new mount API logging macros will wind up calling printk(). This can result in syslog messages being logged where previously there were none... most notably "NFS4: Couldn't follow remote path", which can happen if the client is auto-negotiating a protocol version with an NFS server that doesn't support the higher v4.x versions. Convert the nfs_errorf(), nfs_invalf(), and nfs_warnf() macros to check for the existence of the fs_context's log buffer and call dprintk() if it doesn't exist. Add nfs_ferrorf(), nfs_finvalf(), and nfs_warnf(), which do the same thing but take an NFS debug flag as an argument and call dfprintk(). Finally, modify the "NFS4: Couldn't follow remote path" message to use nfs_ferrorf(). Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=207385 Signed-off-by: Scott Mayhew <smayhew@redhat.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Fixes: `ce8866f091` ("NFS: Attach supplementary error information to fs_context.") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-01-10 13:32:39 -05:00
Dave Wysochanski	3d1a90ab0e	NFS4: Fix use-after-free in trace_event_raw_event_nfs4_set_lock It is only safe to call the tracepoint before rpc_put_task() because 'data' is freed inside nfs4_lock_release (rpc_release). Fixes: `48c9579a1a` ("Adding stateid information to tracepoints") Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-01-06 12:34:57 -05:00
Linus Torvalds	74f602dc96	NFS client updates for Linux 5.11 Highlights include: Features: - NFSv3: Add emulation of lookupp() to improve open_by_filehandle() support. - A series of patches to improve readdir performance, particularly with large directories. - Basic support for using NFS/RDMA with the pNFS files and flexfiles drivers. - Micro-optimisations for RDMA. - RDMA tracing improvements. Bugfixes: - Fix a long standing bug with xs_read_xdr_buf() when receiving partial pages (Dan Aloni). - Various fixes for getxattr and listxattr, when used over non-TCP transports. - Fixes for containerised NFS from Sargun Dhillon. - switch nfsiod to be an UNBOUND workqueue (Neil Brown). - READDIR should not ask for security label information if there is no LSM policy. (Olga Kornievskaia) - Avoid using interval-based rebinding with TCP in lockd (Calum Mackay). - A series of RPC and NFS layer fixes to support the NFSv4.2 READ_PLUS code. - A couple of fixes for pnfs/flexfiles read failover Cleanups: - Various cleanups for the SUNRPC xdr code in conjunction with the READ_PLUS fixes. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAl/aiaIACgkQZwvnipYK APIOihAAvONscxrFSaGRh2ICNv9I/zXW/A5+R3qnkESPVLTqTPJVphoN7FlINAr1 B74pg6n4T4viycbvsogU2+kHrlJZO7B8lTkJL7ynm9Wgyw8+2Ga4QEn1bsAoqmuY b91p/+LfOLKrYeeojoH31PC73uOYYG1WHXJhjq0l9b5CTgThWpj6O3gDaFEbFvmz A7V3yqSp04sV70YxUhwelBHZ5BXdiXIKsPnIwvXXHuY7IcamrE4EA3wGCwtxkBnu 4dwbOtRXURNSev0r3n6FsH4wZl+/nvp9UpnGdPtVv94F1zm2JKLwkhoJejS/vpjq eyKc7ZXBQ0uHbTWI2Yj1YjA61VIUO0R0EDuyTAnRKDeaarID42n5kMG7J8cIglZR jQfyx99xm0eSrdwxC09tcRL/lBzYcOfc6pJo5P9BtaFtRvbp9iFIHuFKlrXbULd4 WrZzDMhiKVYGSTcTpfQyVoK2rCvn6W1Ida4iYeI0gkJ1v9X90UhbtJOyggn/bxyL DV/Qy40+l48n7CZfPU2eDv4WXqjKGRibpDoWMBLwUH20dDEX6kKYv3BfApFYGqyO /GTPAFUZarCy8BENvzZv/Jb9mt5pDQM5p9ZXpdUOhydLMMA+pauaT/Gr+pAHPIPx MPj546Gh2cEaT883xvRrJmQTG0nw/WscPNcHaJcgL5oYltmuwck= =IKWG -----END PGP SIGNATURE----- Merge tag 'nfs-for-5.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client updates from Trond Myklebust: "Highlights include: Features: - NFSv3: Add emulation of lookupp() to improve open_by_filehandle() support - A series of patches to improve readdir performance, particularly with large directories - Basic support for using NFS/RDMA with the pNFS files and flexfiles drivers - Micro-optimisations for RDMA - RDMA tracing improvements Bugfixes: - Fix a long standing bug with xs_read_xdr_buf() when receiving partial pages (Dan Aloni) - Various fixes for getxattr and listxattr, when used over non-TCP transports - Fixes for containerised NFS from Sargun Dhillon - switch nfsiod to be an UNBOUND workqueue (Neil Brown) - READDIR should not ask for security label information if there is no LSM policy (Olga Kornievskaia) - Avoid using interval-based rebinding with TCP in lockd (Calum Mackay) - A series of RPC and NFS layer fixes to support the NFSv4.2 READ_PLUS code - A couple of fixes for pnfs/flexfiles read failover Cleanups: - Various cleanups for the SUNRPC xdr code in conjunction with the READ_PLUS fixes" * tag 'nfs-for-5.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (90 commits) NFS/pNFS: Fix a typo in ff_layout_resend_pnfs_read() pNFS/flexfiles: Avoid spurious layout returns in ff_layout_choose_ds_for_read NFSv4/pnfs: Add tracing for the deviceid cache fs/lockd: convert comma to semicolon NFSv4.2: fix error return on memory allocation failure NFSv4.2/pnfs: Don't use READ_PLUS with pNFS yet NFSv4.2: Deal with potential READ_PLUS data extent buffer overflow NFSv4.2: Don't error when exiting early on a READ_PLUS buffer overflow NFSv4.2: Handle hole lengths that exceed the READ_PLUS read buffer NFSv4.2: decode_read_plus_hole() needs to check the extent offset NFSv4.2: decode_read_plus_data() must skip padding after data segment NFSv4.2: Ensure we always reset the result->count in decode_read_plus() SUNRPC: When expanding the buffer, we may need grow the sparse pages SUNRPC: Cleanup - constify a number of xdr_buf helpers SUNRPC: Clean up open coded setting of the xdr_stream 'nwords' field SUNRPC: _copy_to/from_pages() now check for zero length SUNRPC: Cleanup xdr_shrink_bufhead() SUNRPC: Fix xdr_expand_hole() SUNRPC: Fixes for xdr_align_data() SUNRPC: _shift_data_left/right_pages should check the shift length ...	2020-12-17 12:15:03 -08:00
Trond Myklebust	52104f274e	NFS/pNFS: Fix a typo in ff_layout_resend_pnfs_read() Don't bump the index twice. Fixes: `563c53e73b` ("NFS: Fix flexfiles read failover") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-16 17:25:24 -05:00
Trond Myklebust	9bfffea352	pNFS/flexfiles: Avoid spurious layout returns in ff_layout_choose_ds_for_read The callers of ff_layout_choose_ds_for_read() should decide whether or not they want to return the layout on error. Sometimes, we may just want to retry from the beginning. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-16 17:25:24 -05:00
Trond Myklebust	cac1d3a2b8	NFSv4/pnfs: Add tracing for the deviceid cache Add tracepoints to allow debugging of the deviceid cache. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-16 17:25:24 -05:00
Colin Ian King	7be9b38afa	NFSv4.2: fix error return on memory allocation failure Currently when an alloc_page fails the error return is not set in variable err and a garbage initialized value is returned. Fix this by setting err to -ENOMEM before taking the error return path. Addresses-Coverity: ("Uninitialized scalar variable") Fixes: `a1f26739cc` ("NFSv4.2: improve page handling for GETXATTR") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-16 07:54:42 -05:00
Linus Torvalds	f986e35083	Merge branch 'akpm' (patches from Andrew) Merge yet more updates from Andrew Morton: - lots of little subsystems - a few post-linux-next MM material. Most of the rest awaits more merging of other trees. Subsystems affected by this series: alpha, procfs, misc, core-kernel, bitmap, lib, lz4, checkpatch, nilfs, kdump, rapidio, gcov, bfs, relay, resource, ubsan, reboot, fault-injection, lzo, apparmor, and mm (swap, memory-hotplug, pagemap, cleanups, and gup). * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (86 commits) mm: fix some spelling mistakes in comments mm: simplify follow_pte{,pmd} mm: unexport follow_pte_pmd apparmor: remove duplicate macro list_entry_is_head() lib/lzo/lzo1x_compress.c: make lzogeneric1x_1_compress() static fault-injection: handle EI_ETYPE_TRUE reboot: hide from sysfs not applicable settings reboot: allow to override reboot type if quirks are found reboot: remove cf9_safe from allowed types and rename cf9_force reboot: allow to specify reboot mode via sysfs reboot: refactor and comment the cpu selection code lib/ubsan.c: mark type_check_kinds with static keyword kcov: don't instrument with UBSAN ubsan: expand tests and reporting ubsan: remove UBSAN_MISC in favor of individual options ubsan: enable for allconfig builds ubsan: disable UBSAN_TRAP for allconfig ubsan: disable object-size sanitizer under GCC ubsan: move cc-option tests into Kconfig ubsan: remove redundant -Wno-maybe-uninitialized ...	2020-12-15 23:26:37 -08:00
Andy Shevchenko	aa6159ab99	kernel.h: split out mathematical helpers kernel.h is being used as a dump for all kinds of stuff for a long time. Here is the attempt to start cleaning it up by splitting out mathematical helpers. At the same time convert users in header and lib folder to use new header. Though for time being include new header back to kernel.h to avoid twisted indirected includes for existing users. [sfr@canb.auug.org.au: fix powerpc build] Link: https://lkml.kernel.org/r/20201029150809.13059608@canb.auug.org.au Link: https://lkml.kernel.org/r/20201028173212.41768-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Jeff Layton <jlayton@kernel.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-12-15 22:46:15 -08:00
Linus Torvalds	1a50ede2b3	Highlights: - Improve support for re-exporting NFS mounts - Replace NFSv4 XDR decoding C macros with xdr_stream helpers - Support for multiple RPC/RDMA chunks per RPC transaction -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAl/Q4dIACgkQM2qzM29m f5fInw//eDrmXBEhxbzcgeqNilGU5Qkn4INJtAcOGwPcw5Kjp4UVNGFpZNPqIDSf FP0Yw0d/rW7UggwCviPcs/adLTasU9skq1jgAv8d0ig4DtPbeqFo6BvbY+G2JxVF EfTeHzr6w6er8HRqyuLN4hjm1rQIpQlDHaYU4QcMs4fjPVv88eYLiwnYGYf3X46i vBYstu1IRxHhg2x4O833xmiL6VbkZDQoWwDjGICylxUBcNUtAmq/sETjTa4JVEJj 4vgXdcJmAFjNgAOrmoR3DISsr9mvCvKN9g3C0+hHiRERTGEon//HzvscWH74wT48 o0LUW0ZWgpmunTcmiSNeeiHNsUXJyy3A/xyEdteqqnvSxulxlqkQzb15Eb+92+6n BHGT/sOz1zz+/l9NCpdeEl5AkSA9plV8Iqd/kzwFwe1KwHMjldeMw/mhMut8EM2j b54EMsp40ipITAwBHvcygCXiWAn/mPex6bCr17Dijo6MsNLsyd+cDsazntbNzwz3 RMGMf2TPOi8tWswrTUS9J5xKk5LAEWX/6Z/hTA1YlsB3PfrhXO97ztrytxvoO/bp M0NREA+NNMn/JyyL8FT3ID5peaLVHhA1GHw9CcUw3C7OVzmsEg29D4zNo02dF1TC LIyekp0kbSGGY1jLOeMLsa6Jr+2+40CcctsooVkRA+3rN0tJQvw= =1uP3 -----END PGP SIGNATURE----- Merge tag 'nfsd-5.11' of git://git.linux-nfs.org/projects/cel/cel-2.6 Pull nfsd updates from Chuck Lever: "Several substantial changes this time around: - Previously, exporting an NFS mount via NFSD was considered to be an unsupported feature. With v5.11, the community has attempted to make re-exporting a first-class feature of NFSD. This would enable the Linux in-kernel NFS server to be used as an intermediate cache for a remotely-located primary NFS server, for example, even with other NFS server implementations, like a NetApp filer, as the primary. - A short series of patches brings support for multiple RPC/RDMA data chunks per RPC transaction to the Linux NFS server's RPC/RDMA transport implementation. This is a part of the RPC/RDMA spec that the other premiere NFS/RDMA implementation (Solaris) has had for a very long time, and completes the implementation of RPC/RDMA version 1 in the Linux kernel's NFS server. - Long ago, NFSv4 support was introduced to NFSD using a series of C macros that hid dprintk's and goto's. Over time, the kernel's XDR implementation has been greatly improved, but these C macros have remained and become fallow. A series of patches in this pull request completely replaces those macros with the use of current kernel XDR infrastructure. Benefits include: - More robust input sanitization in NFSD's NFSv4 XDR decoders. - Make it easier to use common kernel library functions that use XDR stream APIs (for example, GSS-API). - Align the structure of the source code with the RFCs so it is easier to learn, verify, and maintain our XDR implementation. - Removal of more than a hundred hidden dprintk() call sites. - Removal of some explicit manipulation of pages to help make the eventual transition to xdr->bvec smoother. - On top of several related fixes in 5.10-rc, there are a few more fixes to get the Linux NFSD implementation of NFSv4.2 inter-server copy up to speed. And as usual, there is a pinch of seasoning in the form of a collection of unrelated minor bug fixes and clean-ups. Many thanks to all who contributed this time around!" * tag 'nfsd-5.11' of git://git.linux-nfs.org/projects/cel/cel-2.6: (131 commits) nfsd: Record NFSv4 pre/post-op attributes as non-atomic nfsd: Set PF_LOCAL_THROTTLE on local filesystems only nfsd: Fix up nfsd to ensure that timeout errors don't result in ESTALE exportfs: Add a function to return the raw output from fh_to_dentry() nfsd: close cached files prior to a REMOVE or RENAME that would replace target nfsd: allow filesystems to opt out of subtree checking nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations Revert "nfsd4: support change_attr_type attribute" nfsd4: don't query change attribute in v2/v3 case nfsd: minor nfsd4_change_attribute cleanup nfsd: simplify nfsd4_change_info nfsd: only call inode_query_iversion in the I_VERSION case nfs_common: need lock during iterate through the list NFSD: Fix 5 seconds delay when doing inter server copy NFSD: Fix sparse warning in nfs4proc.c SUNRPC: Remove XDRBUF_SPARSE_PAGES flag in gss_proxy upcall sunrpc: clean-up cache downcall nfsd: Fix message level for normal termination NFSD: Remove macros that are no longer used NFSD: Replace READ* macros in nfsd4_decode_compound() ...	2020-12-15 18:52:30 -08:00
Trond Myklebust	5c3485bb12	NFSv4.2/pnfs: Don't use READ_PLUS with pNFS yet We have no way of tracking server READ_PLUS support in pNFS for now, so just disable it. Reported-by: "Mkrtchyan, Tigran" <tigran.mkrtchyan@desy.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-14 06:51:08 -05:00
Trond Myklebust	7aedc687c9	NFSv4.2: Deal with potential READ_PLUS data extent buffer overflow If the server returns more data than we have buffer space for, then we need to truncate and exit early. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-14 06:51:08 -05:00
Trond Myklebust	503b934a75	NFSv4.2: Don't error when exiting early on a READ_PLUS buffer overflow Expanding the READ_PLUS extents can cause the read buffer to overflow. If it does, then don't error, but just exit early. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-14 06:51:08 -05:00
Trond Myklebust	dac3b1059b	NFSv4.2: Handle hole lengths that exceed the READ_PLUS read buffer If a hole extends beyond the READ_PLUS read buffer, then we want to fill just the remaining buffer with zeros. Also ignore eof... Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-14 06:51:08 -05:00
Trond Myklebust	82f98c8b11	NFSv4.2: decode_read_plus_hole() needs to check the extent offset The server is allowed to return a hole extent with an offset that starts before the offset supplied in the READ_PLUS argument. Ensure that we support that case too. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-14 06:51:07 -05:00
Trond Myklebust	5c4afe2ab6	NFSv4.2: decode_read_plus_data() must skip padding after data segment All XDR opaque object sizes are 32-bit aligned, and a data segment is no exception. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-14 06:51:07 -05:00
Trond Myklebust	1ee6310119	NFSv4.2: Ensure we always reset the result->count in decode_read_plus() Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-14 06:51:07 -05:00
Geliang Tang	1f70ea7009	NFSv4.1: use BITS_PER_LONG macro in nfs4session.h Use the existing BITS_PER_LONG macro instead of calculating the value. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-14 06:51:07 -05:00
Frank van der Linden	a1f26739cc	NFSv4.2: improve page handling for GETXATTR XDRBUF_SPARSE_PAGES can cause problems for the RDMA transport, and it's easy enough to allocate enough pages for the request up front, so do that. Also, since we've allocated the pages anyway, use the full page aligned length for the receive buffer. This will allow caching of valid replies that are too large for the caller, but that still fit in the allocated pages. Signed-off-by: Frank van der Linden <fllinden@amazon.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-14 06:51:07 -05:00
Anna Schumaker	21e31401fc	NFS: Disable READ_PLUS by default We've been seeing failures with xfstests generic/091 and generic/263 when using READ_PLUS. I've made some progress on these issues, and the tests fail later on but still don't pass. Let's disable READ_PLUS by default until we can work out what is going on. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-12-10 16:48:03 -05:00
Dai Ngo	fe8eb820e3	NFSv4.2: Fix 5 seconds delay when doing inter server copy Since commit `b4868b44c5` ("NFSv4: Wait for stateid updates after CLOSE/OPEN_DOWNGRADE"), every inter server copy operation suffers 5 seconds delay regardless of the size of the copy. The delay is from nfs_set_open_stateid_locked when the check by nfs_stateid_is_sequential fails because the seqid in both nfs4_state and nfs4_stateid are 0. Fix __nfs42_ssc_open to delay setting of NFS_OPEN_STATE in nfs4_state, until after the call to update_open_stateid, to indicate this is the 1st open. This fix is part of a 2 patches, the other patch is the fix in the source server to return the stateid for COPY_NOTIFY request with seqid 1 instead of 0. Fixes: `ce0887ac96` ("NFSD add nfs4 inter ssc to nfsd4_copy") Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-12-10 16:48:03 -05:00
Chuck Lever	1c87b85162	NFS: Fix rpcrdma_inline_fixup() crash with new LISTXATTRS operation By switching to an XFS-backed export, I am able to reproduce the ibcomp worker crash on my client with xfstests generic/013. For the failing LISTXATTRS operation, xdr_inline_pages() is called with page_len=12 and buflen=128. - When ->send_request() is called, rpcrdma_marshal_req() does not set up a Reply chunk because buflen is smaller than the inline threshold. Thus rpcrdma_convert_iovs() does not get invoked at all and the transport's XDRBUF_SPARSE_PAGES logic is not invoked on the receive buffer. - During reply processing, rpcrdma_inline_fixup() tries to copy received data into rq_rcv_buf->pages because page_len is positive. But there are no receive pages because rpcrdma_marshal_req() never allocated them. The result is that the ibcomp worker faults and dies. Sometimes that causes a visible crash, and sometimes it results in a transport hang without other symptoms. RPC/RDMA's XDRBUF_SPARSE_PAGES support is not entirely correct, and should eventually be fixed or replaced. However, my preference is that upper-layer operations should explicitly allocate their receive buffers (using GFP_KERNEL) when possible, rather than relying on XDRBUF_SPARSE_PAGES. Reported-by: Olga kornievskaia <kolga@netapp.com> Suggested-by: Olga kornievskaia <kolga@netapp.com> Fixes: `c10a75145f` ("NFSv4.2: add the extended attribute proc functions.") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Olga kornievskaia <kolga@netapp.com> Reviewed-by: Frank van der Linden <fllinden@amazon.com> Tested-by: Olga kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-12-10 16:48:03 -05:00
Trond Myklebust	fa94a951bf	NFSv4.2: Fix up the get/listxattr calls to rpc_prepare_reply_pages() Ensure that both getxattr and listxattr page array are correctly aligned, and that getxattr correctly accounts for the page padding word. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-10 09:01:53 -05:00
Trond Myklebust	716a8bc7f7	nfsd: Record NFSv4 pre/post-op attributes as non-atomic For the case of NFSv4, specify to the client that the pre/post-op attributes were not recorded atomically with the main operation. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-12-09 09:39:38 -05:00
Trond Myklebust	01cbf38539	nfsd: Set PF_LOCAL_THROTTLE on local filesystems only Don't set PF_LOCAL_THROTTLE on remote filesystems like NFS, since they aren't expected to ever be subject to double buffering. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-12-09 09:39:38 -05:00
Jeff Layton	7f84b488f9	nfsd: close cached files prior to a REMOVE or RENAME that would replace target It's not uncommon for some workloads to do a bunch of I/O to a file and delete it just afterward. If knfsd has a cached open file however, then the file may still be open when the dentry is unlinked. If the underlying filesystem is nfs, then that could trigger it to do a sillyrename. On a REMOVE or RENAME scan the nfsd_file cache for open files that correspond to the inode, and proactively unhash and put their references. This should prevent any delete-on-last-close activity from occurring, solely due to knfsd's open file cache. This must be done synchronously though so we use the variants that call flush_delayed_fput. There are deadlock possibilities if you call flush_delayed_fput while holding locks, however. In the case of nfsd_rename, we don't even do the lookups of the dentries to be renamed until we've locked for rename. Once we've figured out what the target dentry is for a rename, check to see whether there are cached open files associated with it. If there are, then unwind all of the locking, close them all, and then reattempt the rename. None of this is really necessary for "typical" filesystems though. It's mostly of use for NFS, so declare a new export op flag and use that to determine whether to close the files beforehand. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-12-09 09:39:38 -05:00
Jeff Layton	ba5e8187c5	nfsd: allow filesystems to opt out of subtree checking When we start allowing NFS to be reexported, then we have some problems when it comes to subtree checking. In principle, we could allow it, but it would mean encoding parent info in the filehandles and there may not be enough space for that in a NFSv3 filehandle. To enforce this at export upcall time, we add a new export_ops flag that declares the filesystem ineligible for subtree checking. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-12-09 09:39:38 -05:00
Jeff Layton	daab110e47	nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations With NFSv3 nfsd will always attempt to send along WCC data to the client. This generally involves saving off the in-core inode information prior to doing the operation on the given filehandle, and then issuing a vfs_getattr to it after the op. Some filesystems (particularly clustered or networked ones) have an expensive ->getattr inode operation. Atomicity is also often difficult or impossible to guarantee on such filesystems. For those, we're best off not trying to provide WCC information to the client at all, and to simply allow it to poll for that information as needed with a GETATTR RPC. This patch adds a new flags field to struct export_operations, and defines a new EXPORT_OP_NOWCC flag that filesystems can use to indicate that nfsd should not attempt to provide WCC info in NFSv3 replies. It also adds a blurb about the new flags field and flag to the exporting documentation. The server will also now skip collecting this information for NFSv2 as well, since that info is never used there anyway. Note that this patch does not add this flag to any filesystem export_operations structures. This was originally developed to allow reexporting nfs via nfsd. Other filesystems may want to consider enabling this flag too. It's hard to tell however which ones have export operations to enable export via knfsd and which ones mostly rely on them for open-by-filehandle support, so I'm leaving that up to the individual maintainers to decide. I am cc'ing the relevant lists for those filesystems that I think may want to consider adding this though. Cc: HPDD-discuss@lists.01.org Cc: ceph-devel@vger.kernel.org Cc: cluster-devel@redhat.com Cc: fuse-devel@lists.sourceforge.net Cc: ocfs2-devel@oss.oracle.com Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-12-09 09:39:38 -05:00
NeilBrown	bf701b765e	NFS: switch nfsiod to be an UNBOUND workqueue. nfsiod is currently a concurrency-managed workqueue (CMWQ). This means that workitems scheduled to nfsiod on a given CPU are queued behind all other work items queued on any CMWQ on the same CPU. This can introduce unexpected latency. Occaionally nfsiod can even cause excessive latency. If the work item to complete a CLOSE request calls the final iput() on an inode, the address_space of that inode will be dismantled. This takes time proportional to the number of in-memory pages, which on a large host working on large files (e.g.. 5TB), can be a large number of pages resulting in a noticable number of seconds. We can avoid these latency problems by switching nfsiod to WQ_UNBOUND. This causes each concurrent work item to gets a dedicated thread which can be scheduled to an idle CPU. There is precedent for this as several other filesystems use WQ_UNBOUND workqueue for handling various async events. Signed-off-by: NeilBrown <neilb@suse.de> Fixes: `ada609ee2a` ("workqueue: use WQ_MEM_RECLAIM instead of WQ_RESCUER") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:54 -05:00
Sargun Dhillon	d3ff46fe69	NFSv4: Refactor to use user namespaces for nfs4idmap In several patches work has been done to enable NFSv4 to use user namespaces: 58002399da65: NFSv4: Convert the NFS client idmapper to use the container user namespace 3b7eb5e35d0f: NFS: When mounting, don't share filesystems between different user namespaces Unfortunately, the userspace APIs were only such that the userspace facing side of the filesystem (superblock s_user_ns) could be set to a non init user namespace. This furthers the fs_context related refactoring, and piggybacks on top of that logic, so the superblock user namespace, and the NFS user namespace are the same. Users can still use rpc.idmapd if they choose to, but there are complexities with user namespaces and request-key that have yet to be addresssed. Eventually, we will need to at least: * Come up with an upcall mechanism that can be triggered inside of the container, or safely triggered outside, with the requisite context to do the right mapping. * Handle whatever refactoring needs to be done in net/sunrpc. Signed-off-by: Sargun Dhillon <sargun@sargun.me> Tested-by: Alban Crequy <alban.crequy@gmail.com> Fixes: `62a55d088c` ("NFS: Additional refactoring for fs_context conversion") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:54 -05:00
Sargun Dhillon	d18a9d3fa0	NFS: NFSv2/NFSv3: Use cred from fs_context during mount There was refactoring done to use the fs_context for mounting done in: 62a55d088cd87: NFS: Additional refactoring for fs_context conversion This made it so that the net_ns is fetched from the fs_context (the netns that fsopen is called in). This change also makes it so that the credential fetched during fsopen is used as well as the net_ns. NFS has already had a number of changes to prepare it for user namespaces: 1a58e8a0e5c1: NFS: Store the credential of the mount process in the nfs_server 264d948ce7d0: NFS: Convert NFSv3 to use the container user namespace c207db2f5da5: NFS: Convert NFSv2 to use the container user namespace Previously, different credentials could be used for creation of the fs_context versus creation of the nfs_server, as FSCONFIG_CMD_CREATE did the actual credential check, and that's where current_creds() were fetched. This meant that the user namespace which fsopen was called in could be a non-init user namespace. This still requires that the user that calls FSCONFIG_CMD_CREATE has CAP_SYS_ADMIN in the init user ns. This roughly allows a privileged user to mount on behalf of an unprivileged usernamespace, by forking off and calling fsopen in the unprivileged user namespace. It can then pass back that fsfd to the privileged process which can configure the NFS mount, and then it can call FSCONFIG_CMD_CREATE before switching back into the mount namespace of the container, and finish up the mounting process and call fsmount and move_mount. Signed-off-by: Sargun Dhillon <sargun@sargun.me> Tested-by: Alban Crequy <alban.crequy@gmail.com> Fixes: `62a55d088c` ("NFS: Additional refactoring for fs_context conversion") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:54 -05:00
Trond Myklebust	b6d49ecd10	NFSv4: Fix a pNFS layout related use-after-free race when freeing the inode When returning the layout in nfs4_evict_inode(), we need to ensure that the layout is actually done being freed before we can proceed to free the inode itself. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:54 -05:00
Trond Myklebust	17068466ad	NFSv4: Fix open coded xdr_stream_remaining() Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:54 -05:00
Trond Myklebust	9ed5af268e	SUNRPC: Clean up the handling of page padding in rpc_prepare_reply_pages() rpc_prepare_reply_pages() currently expects the 'hdrsize' argument to contain the length of the data that we expect to want placed in the head kvec plus a count of 1 word of padding that is placed after the page data. This is very confusing when trying to read the code, and sometimes leads to callers adding an arbitrary value of '1' just in order to satisfy the requirement (whether or not the page data actually needs such padding). This patch aims to clarify the code by changing the 'hdrsize' argument to remove that 1 word of padding. This means we need to subtract the padding from all the existing callers. Fixes: `02ef04e432` ("NFS: Account for XDR pad of buf->pages") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:53 -05:00
Trond Myklebust	046e5ccb41	NFSv4: Fix the alignment of page data in the getdeviceinfo reply We can fit the device_addr4 opaque data padding in the pages. Fixes: `cf500bac8f` ("SUNRPC: Introduce rpc_prepare_reply_pages()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:53 -05:00
Trond Myklebust	9889981349	pNFS: Clean up open coded xdr string decoding Use the existing xdr_stream_decode_string_dup() to safely decode into kmalloced strings. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:53 -05:00
Trond Myklebust	9a7016319e	pNFS/flexfiles: Fix up layoutstats reporting for non-TCP transports Ensure that we report the correct netid when using UDP or RDMA transports to the DSes. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:53 -05:00
Trond Myklebust	4be78d2681	NFSv4/pNFS: Store the transport type in struct nfs4_pnfs_ds_addr We want to enable RDMA and UDP as valid transport methods if a GETDEVICEINFO call specifies it. Do so by adding a parser for the netid that translates it to an appropriate argument for the RPC transport layer. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:53 -05:00
Trond Myklebust	190c75a31f	pNFS: Add helpers for allocation/free of struct nfs4_pnfs_ds_addr Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:53 -05:00
Trond Myklebust	a12f996d34	NFSv4/pNFS: Use connections to a DS that are all of the same protocol family If the pNFS metadata server advertises multiple addresses for the same data server, we should try to connect to just one protocol family and transport type on the assumption that homogeneity will improve performance. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:53 -05:00
Trond Myklebust	1c3695d0bb	NFS: Switch mount code to use xprt_find_transport_ident() Switch the mount code to use xprt_find_transport_ident() and to check the results before allowing the mount to proceed. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:53 -05:00
Trond Myklebust	794092c57f	NFS: Do uncached readdir when we're seeking a cookie in an empty page cache If the directory is changing, causing the page cache to get invalidated while we are listing the contents, then the NFS client is currently forced to read in the entire directory contents from scratch, because it needs to perform a linear search for the readdir cookie. While this is not an issue for small directories, it does not scale to directories with millions of entries. In order to be able to deal with large directories that are changing, add a heuristic to ensure that if the page cache is empty, and we are searching for a cookie that is not the zero cookie, we just default to performing uncached readdir. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com>	2020-12-02 14:05:52 -05:00
Trond Myklebust	35df59d3ef	NFS: Reduce number of RPC calls when doing uncached readdir If we're doing uncached readdir, allocate multiple pages in order to try to avoid duplicate RPC calls for the same getdents() call. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com>	2020-12-02 14:05:52 -05:00
Trond Myklebust	762567b7c7	NFS: Optimisations for monotonically increasing readdir cookies If the server is handing out monotonically increasing readdir cookie values, then we can optimise away searches through pages that contain cookies that lie outside our search range. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com>	2020-12-02 14:05:52 -05:00
Trond Myklebust	b593c09f83	NFS: Improve handling of directory verifiers If the server insists on using the readdir verifiers in order to allow cookies to expire, then we should ensure that we cache the verifier with the cookie, so that we can return an error if the application tries to use the expired cookie. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com>	2020-12-02 14:05:52 -05:00
Trond Myklebust	9fff59ed4c	NFS: Handle NFS4ERR_NOT_SAME and NFSERR_BADCOOKIE from readdir calls If the server returns NFS4ERR_NOT_SAME or tells us that the cookie is bad in response to a READDIR call, then we should empty the page cache so that we can fill it from scratch again. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com>	2020-12-02 14:05:52 -05:00
Trond Myklebust	82e22a5e62	NFS: Allow the NFS generic code to pass in a verifier to readdir If we're ever going to allow support for servers that use the readdir verifier, then that use needs to be managed by the middle layers as those need to be able to reject cookies from other verifiers. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com>	2020-12-02 14:05:52 -05:00
Trond Myklebust	6c981eff23	NFS: Cleanup to remove nfs_readdir_descriptor_t typedef Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com>	2020-12-02 14:05:52 -05:00
Trond Myklebust	6b75cf9e30	NFS: Reduce readdir stack usage The descriptor and the struct nfs_entry are both large structures, so don't allocate them from the stack. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com>	2020-12-02 14:05:52 -05:00
Trond Myklebust	dbeaf8c984	NFS: nfs_do_filldir() does not return a value Clean up nfs_do_filldir(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com>	2020-12-02 14:05:52 -05:00
Trond Myklebust	93b8959a0a	NFS: More readdir cleanups Remove the redundant caching of the credential in struct nfs_open_dir_context. Pass the buffer size as an argument to nfs_readdir_xdr_filler(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com>	2020-12-02 14:05:52 -05:00
Trond Myklebust	1a34c8c9a4	NFS: Support larger readdir buffers Support readdir buffers of up to 1MB in size so that we can read large directories using few RPC calls. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com>	2020-12-02 14:05:52 -05:00
Trond Myklebust	a52a8a6ada	NFS: Simplify struct nfs_cache_array_entry We don't need to store a hash, so replace struct qstr with a simple const char pointer and length. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com>	2020-12-02 14:05:51 -05:00
Trond Myklebust	ed09222d65	NFS: Replace kmap() with kmap_atomic() in nfs_readdir_search_array() Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com>	2020-12-02 14:05:51 -05:00
Trond Myklebust	e762a63981	NFS: Remove unnecessary kmap in nfs_readdir_xdr_to_array() The kmapped pointer is only used once per loop to check if we need to exit. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com>	2020-12-02 14:05:51 -05:00
Trond Myklebust	3b2a09f127	NFS: Don't discard readdir results If a readdir call returns more data than we can fit into one page cache page, then allocate a new one for that data rather than discarding the data. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com>	2020-12-02 14:05:51 -05:00
Trond Myklebust	1f1d4aa4e4	NFS: Clean up directory array handling Refactor to use pagecache_get_page() so that we can fill the page in multiple stages. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com>	2020-12-02 14:05:51 -05:00
Trond Myklebust	972bcdf233	NFS: Clean up nfs_readdir_page_filler() Clean up handling of the case where there are no entries in the readdir reply. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com>	2020-12-02 14:05:51 -05:00
Trond Myklebust	b1e21c9743	NFS: Clean up readdir struct nfs_cache_array Since the 'eof_index' is only ever used as a flag, make it so. Also add a flag to detect if the page has been completely filled. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com>	2020-12-02 14:05:51 -05:00
Trond Myklebust	2e7a464179	NFS: Ensure contents of struct nfs_open_dir_context are consistent Ensure that the contents of struct nfs_open_dir_context are consistent by setting them under the file->f_lock from a private copy (that is known to be consistent). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com>	2020-12-02 14:05:51 -05:00
Olga Kornievskaia	05ad917561	NFSv4.2: condition READDIR's mask for security label based on LSM state Currently, the client will always ask for security_labels if the server returns that it supports that feature regardless of any LSM modules (such as Selinux) enforcing security policy. This adds performance penalty to the READDIR operation. Client adjusts superblock's support of the security_label based on the server's support but also current client's configuration of the LSM modules. Thus, prior to using the default bitmask in READDIR, this patch checks the server's capabilities and then instructs READDIR to remove FATTR4_WORD2_SECURITY_LABEL from the bitmask. v5: fixing silly mistakes of the rushed v4 v4: simplifying logic v3: changing label's initialization per Ondrej's comment v2: dropping selinux hook and using the sb cap. Suggested-by: Ondrej Mosnacek <omosnace@redhat.com> Suggested-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Fixes: `2b0143b5c9` ("VFS: normal filesystems (and lustre): d_inode() annotations") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:51 -05:00
Trond Myklebust	76998ebb91	NFSv4: Observe the NFS_MOUNT_SOFTREVAL flag in _nfs4_proc_lookupp We need to respect the NFS_MOUNT_SOFTREVAL flag in _nfs4_proc_lookupp, by timing out if the server is unavailable. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:51 -05:00
Trond Myklebust	3c5e9a59fa	NFSv3: Add emulation of the lookupp() operation In order to use the open_by_filehandle() operations on NFSv3, we need to be able to emulate lookupp() so that nfs_get_parent() can be used to convert disconnected dentries into connected ones. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:51 -05:00
Trond Myklebust	5f447cb881	NFSv3: Refactor nfs3_proc_lookup() to split out the dentry We want to reuse the lookup code in NFSv3 in order to emulate the NFSv4 lookupp operation. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:51 -05:00
Dai Ngo	bd75475c2f	NFSv4.2: Fix 5 seconds delay when doing inter server copy Since commit `b4868b44c5` ("NFSv4: Wait for stateid updates after CLOSE/OPEN_DOWNGRADE"), every inter server copy operation suffers 5 seconds delay regardless of the size of the copy. The delay is from nfs_set_open_stateid_locked when the check by nfs_stateid_is_sequential fails because the seqid in both nfs4_state and nfs4_stateid are 0. Fix __nfs42_ssc_open to delay setting of NFS_OPEN_STATE in nfs4_state, until after the call to update_open_stateid, to indicate this is the 1st open. This fix is part of a 2 patches, the other patch is the fix in the source server to return the stateid for COPY_NOTIFY request with seqid 1 instead of 0. Fixes: `ce0887ac96` ("NFSD add nfs4 inter ssc to nfsd4_copy") Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-12-02 13:45:33 -05:00
Chuck Lever	5482e09a88	NFS: Fix rpcrdma_inline_fixup() crash with new LISTXATTRS operation By switching to an XFS-backed export, I am able to reproduce the ibcomp worker crash on my client with xfstests generic/013. For the failing LISTXATTRS operation, xdr_inline_pages() is called with page_len=12 and buflen=128. - When ->send_request() is called, rpcrdma_marshal_req() does not set up a Reply chunk because buflen is smaller than the inline threshold. Thus rpcrdma_convert_iovs() does not get invoked at all and the transport's XDRBUF_SPARSE_PAGES logic is not invoked on the receive buffer. - During reply processing, rpcrdma_inline_fixup() tries to copy received data into rq_rcv_buf->pages because page_len is positive. But there are no receive pages because rpcrdma_marshal_req() never allocated them. The result is that the ibcomp worker faults and dies. Sometimes that causes a visible crash, and sometimes it results in a transport hang without other symptoms. RPC/RDMA's XDRBUF_SPARSE_PAGES support is not entirely correct, and should eventually be fixed or replaced. However, my preference is that upper-layer operations should explicitly allocate their receive buffers (using GFP_KERNEL) when possible, rather than relying on XDRBUF_SPARSE_PAGES. Reported-by: Olga kornievskaia <kolga@netapp.com> Suggested-by: Olga kornievskaia <kolga@netapp.com> Fixes: `c10a75145f` ("NFSv4.2: add the extended attribute proc functions.") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Olga kornievskaia <kolga@netapp.com> Reviewed-by: Frank van der Linden <fllinden@amazon.com> Tested-by: Olga kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-12-02 13:19:20 -05:00
Chuck Lever	0ae4c3e8a6	SUNRPC: Add xdr_set_scratch_page() and xdr_reset_scratch_buffer() Clean up: De-duplicate some frequently-used code. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 14:46:35 -05:00
Trond Myklebust	63e2fffa59	pNFS/flexfiles: Fix array overflow when flexfiles mirroring is enabled If the flexfiles mirroring is enabled, then the read code expects to be able to set pgio->pg_mirror_idx to point to the data server that is being used for this particular read. However it does not change the pg_mirror_count because we only need to send a single read. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-11-30 10:52:22 -05:00
Trond Myklebust	11decaf812	NFS: Remove unnecessary inode lock in nfs_fsync_dir() nfs_inc_stats() is already thread-safe, and there are no other reasons to hold the inode lock here. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-11-12 10:41:26 -05:00
Trond Myklebust	83f2c45e63	NFS: Remove unnecessary inode locking in nfs_llseek_dir() Remove the contentious inode lock, and instead provide thread safety using the file->f_lock spinlock. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-11-12 10:41:26 -05:00
Chuck Lever	6c2190b3fc	NFS: Fix listxattr receive buffer size Certain NFSv4.2/RDMA tests fail with v5.9-rc1. rpcrdma_convert_kvec() runs off the end of the rl_segments array because rq_rcv_buf.tail[0].iov_len holds a very large positive value. The resultant kernel memory corruption is enough to crash the client system. Callers of rpc_prepare_reply_pages() must reserve an extra XDR_UNIT in the maximum decode size for a possible XDR pad of the contents of the xdr_buf's pages. That guarantees the allocated receive buffer will be large enough to accommodate the usual contents plus that XDR pad word. encode_op_hdr() cannot add that extra word. If it does, xdr_inline_pages() underruns the length of the tail iovec. Fixes: `3e1f02123f` ("NFSv4.2: add client side XDR handling for extended attributes") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-11-12 10:41:26 -05:00
J. Bruce Fields	70438afbf1	NFSv4.2: fix failure to unregister shrinker We forgot to unregister the nfs4_xattr_large_entry_shrinker. That leaves the global list of shrinkers corrupted after unload of the nfs module, after which possibly unrelated code that calls register_shrinker() or unregister_shrinker() gets a BUG() with "supervisor write access in kernel mode". And similarly for the nfs4_xattr_large_entry_lru. Reported-by: Kris Karas <bugs-a17@moonlit-rail.com> Tested-By: Kris Karas <bugs-a17@moonlit-rail.com> Fixes: `95ad37f90c` "NFSv4.2: add client side xattr caching." Signed-off-by: J. Bruce Fields <bfields@redhat.com> CC: stable@vger.kernel.org Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-11-12 10:40:02 -05:00
Helge Deller	3fc2bfa365	nfsroot: Default mount option should ask for built-in NFS version Change the nfsroot default mount option to ask for NFSv2 only if the kernel was built with NFSv2 support. If not, default to NFSv3 or as last choice to NFSv4, depending on actual kernel config. Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-11-02 10:29:03 -05:00
Linus Torvalds	24717cfbbb	The one new feature this time, from Anna Schumaker, is READ_PLUS, which has the same arguments as READ but allows the server to return an array of data and hole extents. Otherwise it's a lot of cleanup and bugfixes. -----BEGIN PGP SIGNATURE----- iQJJBAABCAAzFiEEYtFWavXG9hZotryuJ5vNeUKO4b4FAl+Q5vsVHGJmaWVsZHNA ZmllbGRzZXMub3JnAAoJECebzXlCjuG+DUAP/RlALnXbaoWi8YCcEcc9U1LoQKbD CJpDR+FqCOyGwRuzWung/5pvkOO50fGEeAroos+2rF/NgRkQq8EFr9AuBhNOYUFE IZhWEOfu/r2ukXyBmcu21HGcWLwPnyJehvjuzTQW2wOHlBi/sdoL5Ap1sVlwVLj5 EZ5kqJLD+ioG2sufW99Spi55l1Cy+3Y0IhLSWl4ZAE6s8hmFSYAJZFsOeI0Afx57 USPTDRaeqjyEULkb+f8IhD0eRApOUo4evDn9dwQx+of7HPa1CiygctTKYwA3hnlc gXp2KpVA1REaiYVgOPwYlnqBmJ2K9X0wCRzcWy2razqEcVAX/2j7QCe9M2mn4DC8 xZ2q4SxgXu9yf0qfUSVnDxWmP6ipqq7OmsG0JXTFseGKBdpjJY1qHhyqanVAGvEg I+xHnnWfGwNCftwyA3mt3RfSFPsbLlSBIMZxvN4kn8aVlqszGITOQvTdQcLYA6kT xWllBf4XKVXMqF0PzerxPDmfzBfhx6b1VPWOIVcu7VLBg3IXoEB2G5xG8MUJiSch OUTCt41LUQkerQlnzaZYqwmFdSBfXJefmcE/x/vps4VtQ/fPHX1jQyD7iTu3HfSP bRlkKHvNVeTodlBDe/HTPiTA99MShhBJyvtV5wfzIqwjc1cNreed+ePppxn8mxJi SmQ2uZk/MpUl7/V0 =rcOj -----END PGP SIGNATURE----- Merge tag 'nfsd-5.10' of git://linux-nfs.org/~bfields/linux Pull nfsd updates from Bruce Fields: "The one new feature this time, from Anna Schumaker, is READ_PLUS, which has the same arguments as READ but allows the server to return an array of data and hole extents. Otherwise it's a lot of cleanup and bugfixes" * tag 'nfsd-5.10' of git://linux-nfs.org/~bfields/linux: (43 commits) NFSv4.2: Fix NFS4ERR_STALE error when doing inter server copy SUNRPC: fix copying of multiple pages in gss_read_proxy_verf() sunrpc: raise kernel RPC channel buffer size svcrdma: fix bounce buffers for unaligned offsets and multiple pages nfsd: remove unneeded break net/sunrpc: Fix return value for sysctl sunrpc.transports NFSD: Encode a full READ_PLUS reply NFSD: Return both a hole and a data segment NFSD: Add READ_PLUS hole segment encoding NFSD: Add READ_PLUS data support NFSD: Hoist status code encoding into XDR encoder functions NFSD: Map nfserr_wrongsec outside of nfsd_dispatch NFSD: Remove the RETURN_STATUS() macro NFSD: Call NFSv2 encoders on error returns NFSD: Fix .pc_release method for NFSv2 NFSD: Remove vestigial typedefs NFSD: Refactor nfsd_dispatch() error paths NFSD: Clean up nfsd_dispatch() variables NFSD: Clean up stale comments in nfsd_dispatch() NFSD: Clean up switch statement in nfsd_dispatch() ...	2020-10-22 09:44:27 -07:00
Dai Ngo	0cfcd405e7	NFSv4.2: Fix NFS4ERR_STALE error when doing inter server copy NFS_FS=y as dependency of CONFIG_NFSD_V4_2_INTER_SSC still have build errors and some configs with NFSD=m to get NFS4ERR_STALE error when doing inter server copy. Added ops table in nfs_common for knfsd to access NFS client modules. Fixes: `3ac3711adb` ("NFSD: Fix NFS server build errors") Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2020-10-21 10:31:20 -04:00
Linus Torvalds	59f0e7eb2f	NFS Client Updates for Linux 5.10 - Stable Fixes: - Wait for stateid updates after CLOSE/OPEN_DOWNGRADE # v5.4+ - Fix nfs_path in case of a rename retry - Support EXCHID4_FLAG_SUPP_FENCE_OPS v4.2 EXCHANGE_ID flag - New features and improvements: - Replace dprintk() calls with tracepoints - Make cache consistency bitmap dynamic - Added support for the NFS v4.2 READ_PLUS operation - Improvements to net namespace uniquifier - Other bugfixes and cleanups - Remove redundant clnt pointer - Don't update timeout values on connection resets - Remove redundant tracepoints - Various cleanups to comments - Fix oops when trying to use copy_file_range with v4.0 source server - Improvements to flexfiles mirrors - Add missing "local_lock=posix" mount option -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAl+PJv0ACgkQ18tUv7Cl QOtB+hAAza4BwQSAZs0QbszPcoNV0f0qND99hWcJ2ad5KCr8S5eNYaMqB8G8lbS+ Ahuv6xRy69l2vjHvINXoSaSiXClYNAlAr5myiW0DqMLDpVV8VEMAhqgFJK97VpqQ AsbsdFwC4xAcQ+6nJ+IfK9f8AOYhvARkOP3171qkq0jX3Eq4j1IiQARn+JrGFZFL waC4UtVhCnx5z5pg7jyu7Ar4N0Xky72+Fm1EvRog4gndX2JbNNSkwavD33mWZ8iN 3q6DRq/AtEk9F4ttPwVeALvpNCBqjoGcNw5FCBil7x4V+xIbDI2FBhusKas3GzXm W2tyUuJMTGpA6ZjkyYDcg0jC8vKUiUpMWgT3oZoQ/6g7P4vPzB/XB/RQcAInMRjS /MhWZ3YNQmApnVCIZSwOa9+oNUc69dM0+vqmesLvvb/aNjzRnGwiJNiuWGyrwoGX 3SzpPUcJixa3+eKLi9uEHojKB+SuPIrB/HW4UYh/ZpOMyilVqUnX8RYbaK3qQ/gK Un++LjTmao9cXyjIDCiHJB630W01YSWXq1nwuSYkLuXCuALmmtgA4z8pm+4K+w+G SctETKVcD0qi3Yx5mFdU5A1dHXMB7nzYNaiRWYLvLO5qAR33nUPI4lV8rog2TUPe 2iW4n0j2YxnTlwR3QMPYJRndP7KeR5XvOoNuh3XpKRVbmbc7/A4= =6RM5 -----END PGP SIGNATURE----- Merge tag 'nfs-for-5.10-1' of git://git.linux-nfs.org/projects/anna/linux-nfs Pull NFS client updates from Anna Schumaker: "Stable Fixes: - Wait for stateid updates after CLOSE/OPEN_DOWNGRADE # v5.4+ - Fix nfs_path in case of a rename retry - Support EXCHID4_FLAG_SUPP_FENCE_OPS v4.2 EXCHANGE_ID flag New features and improvements: - Replace dprintk() calls with tracepoints - Make cache consistency bitmap dynamic - Added support for the NFS v4.2 READ_PLUS operation - Improvements to net namespace uniquifier Other bugfixes and cleanups: - Remove redundant clnt pointer - Don't update timeout values on connection resets - Remove redundant tracepoints - Various cleanups to comments - Fix oops when trying to use copy_file_range with v4.0 source server - Improvements to flexfiles mirrors - Add missing 'local_lock=posix' mount option" * tag 'nfs-for-5.10-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (55 commits) NFSv4.2: support EXCHGID4_FLAG_SUPP_FENCE_OPS 4.2 EXCHANGE_ID flag NFSv4: Fix up RCU annotations for struct nfs_netns_client NFS: Only reference user namespace from nfs4idmap struct instead of cred nfs: add missing "posix" local_lock constant table definition NFSv4: Use the net namespace uniquifier if it is set NFSv4: Clean up initialisation of uniquified client id strings NFS: Decode a full READ_PLUS reply SUNRPC: Add an xdr_align_data() function NFS: Add READ_PLUS hole segment decoding SUNRPC: Add the ability to expand holes in data pages SUNRPC: Split out _shift_data_right_tail() SUNRPC: Split out xdr_realign_pages() from xdr_align_pages() NFS: Add READ_PLUS data segment support NFS: Use xdr_page_pos() in NFSv4 decode_getacl() SUNRPC: Implement a xdr_page_pos() function SUNRPC: Split out a function for setting current page NFS: fix nfs_path in case of a rename retry fs: nfs: return per memcg count for xattr shrinkers NFSv4: Wait for stateid updates after CLOSE/OPEN_DOWNGRADE nfs: remove incorrect fallthrough label ...	2020-10-20 13:26:30 -07:00
Olga Kornievskaia	8c39076c27	NFSv4.2: support EXCHGID4_FLAG_SUPP_FENCE_OPS 4.2 EXCHANGE_ID flag RFC 7862 introduced a new flag that either client or server is allowed to set: EXCHGID4_FLAG_SUPP_FENCE_OPS. Client needs to update its bitmask to allow for this flag value. v2: changed minor version argument to unsigned int Signed-off-by: Olga Kornievskaia <kolga@netapp.com> CC: <stable@vger.kernel.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-10-16 09:28:43 -04:00
Trond Myklebust	094eca3719	NFSv4: Fix up RCU annotations for struct nfs_netns_client The identifier is read as an RCU protected string. Its value may be changed during the lifetime of the network namespace by writing a new string into the sysfs pseudofile (at which point, we free the old string only after a call to synchronize_rcu()). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-10-15 13:31:08 -04:00
Sargun Dhillon	61ca2c4afd	NFS: Only reference user namespace from nfs4idmap struct instead of cred The nfs4idmapper only needs access to the user namespace, and not the entire cred struct. This replaces the struct cred* member with struct user_namespace*. This is mostly hygiene, so we don't have to hold onto the cred object, which has extraneous references to things like user_struct. This also makes switching away from init_user_ns more straightforward in the future. Signed-off-by: Sargun Dhillon <sargun@sargun.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-10-13 15:56:54 -04:00
Linus Torvalds	3ad11d7ac8	block-5.10-2020-10-12 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl+EWUgQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpnoxEADCVSNBRkpV0OVkOEC3wf8EGhXhk01Jnjtl u5Mg2V55hcgJ0thQxBV/V28XyqmsEBrmAVi0Yf8Vr9Qbq4Ze08Wae4ChS4rEOyh1 jTcGYWx5aJB3ChLvV/HI0nWQ3bkj03mMrL3SW8rhhf5DTyKHsVeTenpx42Qu/FKf fRzi09FSr3Pjd0B+EX6gunwJnlyXQC5Fa4AA0GhnXJzAznANXxHkkcXu8a6Yw75x e28CfhIBliORsK8sRHLoUnPpeTe1vtxCBhBMsE+gJAj9ZUOWMzvNFIPP4FvfawDy 6cCQo2m1azJ/IdZZCDjFUWyjh+wxdKMp+NNryEcoV+VlqIoc3n98rFwrSL+GIq5Z WVwEwq+AcwoMCsD29Lu1ytL2PQ/RVqcJP5UheMrbL4vzefNfJFumQVZLIcX0k943 8dFL2QHL+H/hM9Dx5y5rjeiWkAlq75v4xPKVjh/DHb4nehddCqn/+DD5HDhNANHf c1kmmEuYhvLpIaC4DHjE6DwLh8TPKahJjwsGuBOTr7D93NUQD+OOWsIhX6mNISIl FFhP8cd0/ZZVV//9j+q+5B4BaJsT+ZtwmrelKFnPdwPSnh+3iu8zPRRWO+8P8fRC YvddxuJAmE6BLmsAYrdz6Xb/wqfyV44cEiyivF0oBQfnhbtnXwDnkDWSfJD1bvCm ZwfpDh2+Tg== =LzyE -----END PGP SIGNATURE----- Merge tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block Pull block updates from Jens Axboe: - Series of merge handling cleanups (Baolin, Christoph) - Series of blk-throttle fixes and cleanups (Baolin) - Series cleaning up BDI, seperating the block device from the backing_dev_info (Christoph) - Removal of bdget() as a generic API (Christoph) - Removal of blkdev_get() as a generic API (Christoph) - Cleanup of is-partition checks (Christoph) - Series reworking disk revalidation (Christoph) - Series cleaning up bio flags (Christoph) - bio crypt fixes (Eric) - IO stats inflight tweak (Gabriel) - blk-mq tags fixes (Hannes) - Buffer invalidation fixes (Jan) - Allow soft limits for zone append (Johannes) - Shared tag set improvements (John, Kashyap) - Allow IOPRIO_CLASS_RT for CAP_SYS_NICE (Khazhismel) - DM no-wait support (Mike, Konstantin) - Request allocation improvements (Ming) - Allow md/dm/bcache to use IO stat helpers (Song) - Series improving blk-iocost (Tejun) - Various cleanups (Geert, Damien, Danny, Julia, Tetsuo, Tian, Wang, Xianting, Yang, Yufen, yangerkun) * tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (191 commits) block: fix uapi blkzoned.h comments blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue blk-mq: get rid of the dead flush handle code path block: get rid of unnecessary local variable block: fix comment and add lockdep assert blk-mq: use helper function to test hw stopped block: use helper function to test queue register block: remove redundant mq check block: invoke blk_mq_exit_sched no matter whether have .exit_sched percpu_ref: don't refer to ref->data if it isn't allocated block: ratelimit handle_bad_sector() message blk-throttle: Re-use the throtl_set_slice_end() blk-throttle: Open code __throtl_de/enqueue_tg() blk-throttle: Move service tree validation out of the throtl_rb_first() blk-throttle: Move the list operation after list validation blk-throttle: Fix IO hang for a corner case blk-throttle: Avoid tracking latency if low limit is invalid blk-throttle: Avoid getting the current time if tg->last_finish_time is 0 blk-throttle: Remove a meaningless parameter for throtl_downgrade_state() block: Remove redundant 'return' statement ...	2020-10-13 12:12:44 -07:00
Linus Torvalds	22230cd2c5	Merge branch 'compat.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull compat mount cleanups from Al Viro: "The last remnants of mount(2) compat buried by Christoph. Buried into NFS, that is. Generally I'm less enthusiastic about "let's use in_compat_syscall() deep in call chain" kind of approach than Christoph seems to be, but in this case it's warranted - that had been an NFS-specific wart, hopefully not to be repeated in any other filesystems (read: any new filesystem introducing non-text mount options will get NAKed even if it doesn't mess the layout up). IOW, not worth trying to grow an infrastructure that would avoid that use of in_compat_syscall()..." * 'compat.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: remove compat_sys_mount fs,nfs: lift compat nfs4 mount data handling into the nfs code nfs: simplify nfs4_parse_monolithic	2020-10-12 16:44:57 -07:00
Scott Mayhew	a2d24bcb97	nfs: add missing "posix" local_lock constant table definition "mount -o local_lock=posix..." was broken by the mount API conversion due to the missing constant. Fixes: `e38bb238ed` ("NFS: Convert mount option parsing to use functionality from fs_parser.h") Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-10-12 13:23:14 -04:00
Trond Myklebust	39d43d1641	NFSv4: Use the net namespace uniquifier if it is set If a container sets a net namespace specific uniquifier, then use that in the setclientid/exchangeid process. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-10-09 10:05:06 -04:00
Trond Myklebust	1aee551334	NFSv4: Clean up initialisation of uniquified client id strings When the user sets a uniquifier, then ensure we copy the string so that calls to strlen() etc are atomic with calls to snprintf(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-10-09 10:04:36 -04:00
Anna Schumaker	bff049a3b5	NFS: Decode a full READ_PLUS reply Decode multiple hole and data segments sent by the server, placing everything directly where they need to go in the xdr pages. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-10-07 14:28:40 -04:00
Anna Schumaker	c05eafad6b	NFS: Add READ_PLUS hole segment decoding We keep things simple for now by only decoding a single hole or data segment returned by the server, even if they returned more to us. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-10-07 14:28:40 -04:00
Anna Schumaker	c567552612	NFS: Add READ_PLUS data segment support This patch adds client support for decoding a single NFS4_CONTENT_DATA segment returned by the server. This is the simplest implementation possible, since it does not account for any hole segments in the reply. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-10-07 14:28:39 -04:00
Anna Schumaker	a14a63594c	NFS: Use xdr_page_pos() in NFSv4 decode_getacl() Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-10-07 14:28:39 -04:00
Ashish Sangwan	247db73560	NFS: fix nfs_path in case of a rename retry We are generating incorrect path in case of rename retry because we are restarting from wrong dentry. We should restart from the dentry which was received in the call to nfs_path. CC: stable@vger.kernel.org Signed-off-by: Ashish Sangwan <ashishsangwan2@gmail.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-10-06 10:21:18 -04:00
Yang Shi	5904c16d22	fs: nfs: return per memcg count for xattr shrinkers The list_lru_count() returns the pre node count, but the new xattr shrinkers are memcg aware, so the shrinkers should return per memcg count by calling list_lru_shrink_count() instead. Otherwise over-shrink might be experienced. The problem was spotted by visual code inspection. Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Anna Schumaker <anna.schumaker@netapp.com> Cc: Frank van der Linden <fllinden@amazon.com> Signed-off-by: Yang Shi <shy828301@gmail.com> Reviewed-by: Frank van der Linden <fllinden@amazon.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-10-02 08:46:46 -04:00
Benjamin Coddington	b4868b44c5	NFSv4: Wait for stateid updates after CLOSE/OPEN_DOWNGRADE Since commit `0e0cb35b41` ("NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE") the following livelock may occur if a CLOSE races with the update of the nfs_state: Process 1 Process 2 Server ========= ========= ======== OPEN file OPEN file Reply OPEN (1) Reply OPEN (2) Update state (1) CLOSE file (1) Reply OLD_STATEID (1) CLOSE file (2) Reply CLOSE (-1) Update state (2) wait for state change OPEN file wake CLOSE file OPEN file wake CLOSE file ... ... We can avoid this situation by not issuing an immediate retry with a bumped seqid when CLOSE/OPEN_DOWNGRADE receives NFS4ERR_OLD_STATEID. Instead, take the same approach used by OPEN and wait at least 5 seconds for outstanding stateid updates to complete if we can detect that we're out of sequence. Note that after this change it is still possible (though unlikely) that CLOSE waits a full 5 seconds, bumps the seqid, and retries -- and that attempt races with another OPEN at the same time. In order to avoid this race (which would result in the livelock), update nfs_need_update_open_stateid() to handle the case where: - the state is NFS_OPEN_STATE, and - the stateid doesn't match the current open stateid Finally, nfs_need_update_open_stateid() is modified to be idempotent and renamed to better suit the purpose of signaling that the stateid passed is the next stateid in sequence. Fixes: `0e0cb35b41` ("NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE") Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-10-02 08:43:09 -04:00
Nick Desaulniers	fb08334bb3	nfs: remove incorrect fallthrough label There is no case after the default from which to fallthrough to. Clang will error in this case (unhelpfully without context, see link below) and GCC will with -Wswitch-unreachable. The previous commit should have just replaced the comment with a break statement. If we consider implicit fallthrough to be a design mistake of C, then all case statements should be terminated with one of the following statements: * break * continue * return * fallthrough * goto * (call of function with __attribute__(__noreturn__)) Fixes: 2a1390c95a69 ("nfs: Convert to use the preferred fallthrough macro") Link: https://bugs.llvm.org/show_bug.cgi?id=47539 Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-10-02 08:43:08 -04:00
Christoph Hellwig	55b2598e84	bdi: initialize ->ra_pages and ->io_pages in bdi_init Set up a readahead size by default, as very few users have a good reason to change it. This means code, ecryptfs, and orangefs now set up the values while they were previously missing it, while ubifs, mtd and vboxsf manually set it to 0 to avoid readahead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: David Sterba <dsterba@suse.com> [btrfs] Acked-by: Richard Weinberger <richard@nod.at> [ubifs, mtd] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-24 13:43:39 -06:00
Olga Kornievskaia	76bd5c016e	NFSv4: make cache consistency bitmask dynamic Client uses static bitmask for GETATTR on CLOSE/WRITE/DELEGRETURN and ignores the fact that it might have some attributes marked invalid in its cache. Compared to v3 where all attributes are retrieved in postop attributes, v4's cache is frequently out of sync and leads to standalone GETATTRs being sent to the server. Instead, in addition to the minimum cache consistency attributes also check cache_validity and adjust the GETATTR request accordingly. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-09-24 10:42:49 -04:00
Wang Qing	9f26645127	nfs: fix spellint typo in pnfs.c Change the comment typo: "manger" -> "manager". Signed-off-by: Wang Qing <wangqing@vivo.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-09-24 10:42:49 -04:00
Christoph Hellwig	67e306c690	fs,nfs: lift compat nfs4 mount data handling into the nfs code There is no reason the generic fs code should bother with NFS specific binary mount data - lift the conversion into nfs4_parse_monolithic instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-09-22 23:45:57 -04:00
Christoph Hellwig	a1c7dc5d15	nfs: simplify nfs4_parse_monolithic Remove a level of indentation for the version 1 mount data parsing, and simplify the NULL data case a little bit as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-09-22 23:45:56 -04:00
Trond Myklebust	c754e137f5	pNFS/flexfiles: Be consistent about mirror index types A mirror index is always of type u32. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-09-21 12:06:27 -04:00

1 2 3 4 5 ...

5953 Commits