OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Linus Torvalds	74f602dc96	NFS client updates for Linux 5.11 Highlights include: Features: - NFSv3: Add emulation of lookupp() to improve open_by_filehandle() support. - A series of patches to improve readdir performance, particularly with large directories. - Basic support for using NFS/RDMA with the pNFS files and flexfiles drivers. - Micro-optimisations for RDMA. - RDMA tracing improvements. Bugfixes: - Fix a long standing bug with xs_read_xdr_buf() when receiving partial pages (Dan Aloni). - Various fixes for getxattr and listxattr, when used over non-TCP transports. - Fixes for containerised NFS from Sargun Dhillon. - switch nfsiod to be an UNBOUND workqueue (Neil Brown). - READDIR should not ask for security label information if there is no LSM policy. (Olga Kornievskaia) - Avoid using interval-based rebinding with TCP in lockd (Calum Mackay). - A series of RPC and NFS layer fixes to support the NFSv4.2 READ_PLUS code. - A couple of fixes for pnfs/flexfiles read failover Cleanups: - Various cleanups for the SUNRPC xdr code in conjunction with the READ_PLUS fixes. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAl/aiaIACgkQZwvnipYK APIOihAAvONscxrFSaGRh2ICNv9I/zXW/A5+R3qnkESPVLTqTPJVphoN7FlINAr1 B74pg6n4T4viycbvsogU2+kHrlJZO7B8lTkJL7ynm9Wgyw8+2Ga4QEn1bsAoqmuY b91p/+LfOLKrYeeojoH31PC73uOYYG1WHXJhjq0l9b5CTgThWpj6O3gDaFEbFvmz A7V3yqSp04sV70YxUhwelBHZ5BXdiXIKsPnIwvXXHuY7IcamrE4EA3wGCwtxkBnu 4dwbOtRXURNSev0r3n6FsH4wZl+/nvp9UpnGdPtVv94F1zm2JKLwkhoJejS/vpjq eyKc7ZXBQ0uHbTWI2Yj1YjA61VIUO0R0EDuyTAnRKDeaarID42n5kMG7J8cIglZR jQfyx99xm0eSrdwxC09tcRL/lBzYcOfc6pJo5P9BtaFtRvbp9iFIHuFKlrXbULd4 WrZzDMhiKVYGSTcTpfQyVoK2rCvn6W1Ida4iYeI0gkJ1v9X90UhbtJOyggn/bxyL DV/Qy40+l48n7CZfPU2eDv4WXqjKGRibpDoWMBLwUH20dDEX6kKYv3BfApFYGqyO /GTPAFUZarCy8BENvzZv/Jb9mt5pDQM5p9ZXpdUOhydLMMA+pauaT/Gr+pAHPIPx MPj546Gh2cEaT883xvRrJmQTG0nw/WscPNcHaJcgL5oYltmuwck= =IKWG -----END PGP SIGNATURE----- Merge tag 'nfs-for-5.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client updates from Trond Myklebust: "Highlights include: Features: - NFSv3: Add emulation of lookupp() to improve open_by_filehandle() support - A series of patches to improve readdir performance, particularly with large directories - Basic support for using NFS/RDMA with the pNFS files and flexfiles drivers - Micro-optimisations for RDMA - RDMA tracing improvements Bugfixes: - Fix a long standing bug with xs_read_xdr_buf() when receiving partial pages (Dan Aloni) - Various fixes for getxattr and listxattr, when used over non-TCP transports - Fixes for containerised NFS from Sargun Dhillon - switch nfsiod to be an UNBOUND workqueue (Neil Brown) - READDIR should not ask for security label information if there is no LSM policy (Olga Kornievskaia) - Avoid using interval-based rebinding with TCP in lockd (Calum Mackay) - A series of RPC and NFS layer fixes to support the NFSv4.2 READ_PLUS code - A couple of fixes for pnfs/flexfiles read failover Cleanups: - Various cleanups for the SUNRPC xdr code in conjunction with the READ_PLUS fixes" * tag 'nfs-for-5.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (90 commits) NFS/pNFS: Fix a typo in ff_layout_resend_pnfs_read() pNFS/flexfiles: Avoid spurious layout returns in ff_layout_choose_ds_for_read NFSv4/pnfs: Add tracing for the deviceid cache fs/lockd: convert comma to semicolon NFSv4.2: fix error return on memory allocation failure NFSv4.2/pnfs: Don't use READ_PLUS with pNFS yet NFSv4.2: Deal with potential READ_PLUS data extent buffer overflow NFSv4.2: Don't error when exiting early on a READ_PLUS buffer overflow NFSv4.2: Handle hole lengths that exceed the READ_PLUS read buffer NFSv4.2: decode_read_plus_hole() needs to check the extent offset NFSv4.2: decode_read_plus_data() must skip padding after data segment NFSv4.2: Ensure we always reset the result->count in decode_read_plus() SUNRPC: When expanding the buffer, we may need grow the sparse pages SUNRPC: Cleanup - constify a number of xdr_buf helpers SUNRPC: Clean up open coded setting of the xdr_stream 'nwords' field SUNRPC: _copy_to/from_pages() now check for zero length SUNRPC: Cleanup xdr_shrink_bufhead() SUNRPC: Fix xdr_expand_hole() SUNRPC: Fixes for xdr_align_data() SUNRPC: _shift_data_left/right_pages should check the shift length ...	2020-12-17 12:15:03 -08:00
Linus Torvalds	1a50ede2b3	Highlights: - Improve support for re-exporting NFS mounts - Replace NFSv4 XDR decoding C macros with xdr_stream helpers - Support for multiple RPC/RDMA chunks per RPC transaction -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAl/Q4dIACgkQM2qzM29m f5fInw//eDrmXBEhxbzcgeqNilGU5Qkn4INJtAcOGwPcw5Kjp4UVNGFpZNPqIDSf FP0Yw0d/rW7UggwCviPcs/adLTasU9skq1jgAv8d0ig4DtPbeqFo6BvbY+G2JxVF EfTeHzr6w6er8HRqyuLN4hjm1rQIpQlDHaYU4QcMs4fjPVv88eYLiwnYGYf3X46i vBYstu1IRxHhg2x4O833xmiL6VbkZDQoWwDjGICylxUBcNUtAmq/sETjTa4JVEJj 4vgXdcJmAFjNgAOrmoR3DISsr9mvCvKN9g3C0+hHiRERTGEon//HzvscWH74wT48 o0LUW0ZWgpmunTcmiSNeeiHNsUXJyy3A/xyEdteqqnvSxulxlqkQzb15Eb+92+6n BHGT/sOz1zz+/l9NCpdeEl5AkSA9plV8Iqd/kzwFwe1KwHMjldeMw/mhMut8EM2j b54EMsp40ipITAwBHvcygCXiWAn/mPex6bCr17Dijo6MsNLsyd+cDsazntbNzwz3 RMGMf2TPOi8tWswrTUS9J5xKk5LAEWX/6Z/hTA1YlsB3PfrhXO97ztrytxvoO/bp M0NREA+NNMn/JyyL8FT3ID5peaLVHhA1GHw9CcUw3C7OVzmsEg29D4zNo02dF1TC LIyekp0kbSGGY1jLOeMLsa6Jr+2+40CcctsooVkRA+3rN0tJQvw= =1uP3 -----END PGP SIGNATURE----- Merge tag 'nfsd-5.11' of git://git.linux-nfs.org/projects/cel/cel-2.6 Pull nfsd updates from Chuck Lever: "Several substantial changes this time around: - Previously, exporting an NFS mount via NFSD was considered to be an unsupported feature. With v5.11, the community has attempted to make re-exporting a first-class feature of NFSD. This would enable the Linux in-kernel NFS server to be used as an intermediate cache for a remotely-located primary NFS server, for example, even with other NFS server implementations, like a NetApp filer, as the primary. - A short series of patches brings support for multiple RPC/RDMA data chunks per RPC transaction to the Linux NFS server's RPC/RDMA transport implementation. This is a part of the RPC/RDMA spec that the other premiere NFS/RDMA implementation (Solaris) has had for a very long time, and completes the implementation of RPC/RDMA version 1 in the Linux kernel's NFS server. - Long ago, NFSv4 support was introduced to NFSD using a series of C macros that hid dprintk's and goto's. Over time, the kernel's XDR implementation has been greatly improved, but these C macros have remained and become fallow. A series of patches in this pull request completely replaces those macros with the use of current kernel XDR infrastructure. Benefits include: - More robust input sanitization in NFSD's NFSv4 XDR decoders. - Make it easier to use common kernel library functions that use XDR stream APIs (for example, GSS-API). - Align the structure of the source code with the RFCs so it is easier to learn, verify, and maintain our XDR implementation. - Removal of more than a hundred hidden dprintk() call sites. - Removal of some explicit manipulation of pages to help make the eventual transition to xdr->bvec smoother. - On top of several related fixes in 5.10-rc, there are a few more fixes to get the Linux NFSD implementation of NFSv4.2 inter-server copy up to speed. And as usual, there is a pinch of seasoning in the form of a collection of unrelated minor bug fixes and clean-ups. Many thanks to all who contributed this time around!" * tag 'nfsd-5.11' of git://git.linux-nfs.org/projects/cel/cel-2.6: (131 commits) nfsd: Record NFSv4 pre/post-op attributes as non-atomic nfsd: Set PF_LOCAL_THROTTLE on local filesystems only nfsd: Fix up nfsd to ensure that timeout errors don't result in ESTALE exportfs: Add a function to return the raw output from fh_to_dentry() nfsd: close cached files prior to a REMOVE or RENAME that would replace target nfsd: allow filesystems to opt out of subtree checking nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations Revert "nfsd4: support change_attr_type attribute" nfsd4: don't query change attribute in v2/v3 case nfsd: minor nfsd4_change_attribute cleanup nfsd: simplify nfsd4_change_info nfsd: only call inode_query_iversion in the I_VERSION case nfs_common: need lock during iterate through the list NFSD: Fix 5 seconds delay when doing inter server copy NFSD: Fix sparse warning in nfs4proc.c SUNRPC: Remove XDRBUF_SPARSE_PAGES flag in gss_proxy upcall sunrpc: clean-up cache downcall nfsd: Fix message level for normal termination NFSD: Remove macros that are no longer used NFSD: Replace READ* macros in nfsd4_decode_compound() ...	2020-12-15 18:52:30 -08:00
Trond Myklebust	edffb84cc8	NFSoRDmA Client updates for Linux 5.11 Cleanups and improvements: - Remove use of raw kernel memory addresses in tracepoints - Replace dprintk() call sites in ERR_CHUNK path - Trace unmap sync calls - Optimize MR DMA-unmapping -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAl/SlTcACgkQ18tUv7Cl QOsc+xAA0qmDLbdZShFAiip/jgFvHkoIJzVcil5++xiRY77xOjR2rgkgBSv4hFYm XWBidP/eQm3n5r0S+LG0zudcHnaKBonZ0UV2j32PelMEnvn3H9qlSJYneEm9xv2O K1koNcbunJH8JRsLxbStdMnOnNwCmB+HzIjnWr87OgXkYucqDBBxt3RMyZVD46QU GypnItAtLns4Oacsw0TyPAPAxstjZ71YRlTMhtfyIYEJfvxUVRYRKvs7nPei9eCa 0uCzoKb28F7CWy+S9so7wSjfdDu4G+FmAfMvJbQF4irhgZBvzyzj4jq4hCnv377S XF6Eygm0Z4BSWNoAnRjgLx3Bo4ps4qZi7iAj8cUAksEKQ3QI2BWTWR3e07YwLfnm iwhWougP76zbtC3lNA+4D2yaYwA1LtN4IYp19KmS/0LyRe5EBry39ccE/eE508JA BGnCohawI9ya7WT3xTne9lyxNGhjm/jNDKt0ax0ze4hwIqSGekqhUzxQ0bl4suQz ZHP2gUEad07jZyy8gCcpEvvxA3fFW241vaG/hN34Dy47eHpAQDQIMk7611YRQXH/ jCBHoym1dHXc8eNa+gTmrjXdhVdgHo4zI0ppeaQvA2Ss0nPp8N8H2nJV/as536Dp on0b1zAF4o5uzGlEzHu+FuFGqUPE9My9/dNDqw3o0Cncu7adffk= =R45y -----END PGP SIGNATURE----- Merge tag 'nfs-rdma-for-5.11-1' of git://git.linux-nfs.org/projects/anna/linux-nfs into linux-next NFSoRDmA Client updates for Linux 5.11 Cleanups and improvements: - Remove use of raw kernel memory addresses in tracepoints - Replace dprintk() call sites in ERR_CHUNK path - Trace unmap sync calls - Optimize MR DMA-unmapping Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-15 20:08:41 -05:00
Trond Myklebust	5802f7c2a6	SUNRPC: When expanding the buffer, we may need grow the sparse pages If we're shifting the page data to the right, and this happens to be a sparse page array, then we may need to allocate new pages in order to receive the data. Reported-by: "Mkrtchyan, Tigran" <tigran.mkrtchyan@desy.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-14 06:51:07 -05:00
Trond Myklebust	f8d0e60f10	SUNRPC: Cleanup - constify a number of xdr_buf helpers There are a number of xdr helpers for struct xdr_buf that do not change the structure itself. Mark those as taking const pointers for documentation purposes. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-14 06:51:07 -05:00
Trond Myklebust	5a5f1c2c2c	SUNRPC: Clean up open coded setting of the xdr_stream 'nwords' field Move the setting of the xdr_stream 'nwords' field into the helpers that reset the xdr_stream cursor. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-14 06:51:07 -05:00
Trond Myklebust	e43ac22b83	SUNRPC: _copy_to/from_pages() now check for zero length Clean up callers of _copy_to/from_pages() that still check for a zero length. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-14 06:51:07 -05:00
Trond Myklebust	6707fbd7d3	SUNRPC: Cleanup xdr_shrink_bufhead() Clean up xdr_shrink_bufhead() to use the new helpers instead of doing its own thing. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-14 06:51:07 -05:00
Trond Myklebust	c4f2f591f0	SUNRPC: Fix xdr_expand_hole() We do want to try to grow the buffer if possible, but if that attempt fails, we still want to move the data and truncate the XDR message. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-14 06:51:07 -05:00
Trond Myklebust	9a20f6f4e6	SUNRPC: Fixes for xdr_align_data() The main use case right now for xdr_align_data() is to shift the page data to the left, and in practice shrink the total XDR data buffer. This patch ensures that we fix up the accounting for the buffer length as we shift that data around. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-14 06:51:07 -05:00
Trond Myklebust	c54e959b36	SUNRPC: _shift_data_left/right_pages should check the shift length Exit early if the shift is zero. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-14 06:51:07 -05:00
Chuck Lever	15261b9126	xprtrdma: Fix XDRBUF_SPARSE_PAGES support Olga K. observed that rpcrdma_marsh_req() allocates sparse pages only when it has determined that a Reply chunk is necessary. There are plenty of cases where no Reply chunk is needed, but the XDRBUF_SPARSE_PAGES flag is set. The result would be a crash in rpcrdma_inline_fixup() when it tries to copy parts of the received Reply into a missing page. To avoid crashing, handle sparse page allocation up front. Until XATTR support was added, this issue did not appear often because the only SPARSE_PAGES consumer always expected a reply large enough to always require a Reply chunk. Reported-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-14 06:51:07 -05:00
Dan Aloni	ac9645c873	sunrpc: fix xs_read_xdr_buf for partial pages receive When receiving pages data, return value 'ret' when positive includes `buf->page_base`, so we should subtract that before it is used for changing `offset` and comparing against `want`. This was discovered on the very rare cases where the server returned a chunk of bytes that when added to the already received amount of bytes for the pages happened to match the current `recv.len`, for example on this case: buf->page_base : 258356 actually received from socket: 1740 ret : 260096 want : 260096 In this case neither of the two 'if ... goto out' trigger, and we continue to tail parsing. Worth to mention that the ensuing EMSGSIZE from the continued execution of `xs_read_xdr_buf` may be observed by an application due to 4 superfluous bytes being added to the pages data. Fixes: `277e4ab7d5` ("SUNRPC: Simplify TCP receive code by switching to using iterators") Signed-off-by: Dan Aloni <dan@kernelim.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-14 06:46:55 -05:00
Chuck Lever	5e54dafbe0	SUNRPC: Remove XDRBUF_SPARSE_PAGES flag in gss_proxy upcall There's no need to defer allocation of pages for the receive buffer. - This upcall is quite infrequent - gssp_alloc_receive_pages() can allocate the pages with GFP_KERNEL, unlike the transport - gssp_alloc_receive_pages() knows exactly how many pages are needed Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Olga Kornievskaia <kolga@netapp.com>	2020-12-09 09:38:34 -05:00
Roberto Bergantinos Corpas	4b5cff7ed8	sunrpc: clean-up cache downcall We can simplify code around cache_downcall unifying memory allocations using kvmalloc. This has the benefit of getting rid of cache_slow_downcall (and queue_io_mutex), and also matches userland allocation size and limits. Signed-off-by: Roberto Bergantinos Corpas <rbergant@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-12-09 09:38:34 -05:00
Fedor Tokarev	35a6d39672	net: sunrpc: Fix 'snprintf' return value check in 'do_xprt_debugfs' 'snprintf' returns the number of characters which would have been written if enough space had been available, excluding the terminating null byte. Thus, the return value of 'sizeof(buf)' means that the last character has been dropped. Signed-off-by: Fedor Tokarev <ftokarev@gmail.com> Fixes: `2f34b8bfae` ("SUNRPC: add links for all client xprts to debugfs") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:54 -05:00
Trond Myklebust	eee1f54964	SUNRPC: Fix open coded xdr_stream_remaining() Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:54 -05:00
Trond Myklebust	0279024f22	SUNRPC: Fix up xdr_set_page() While we always want to align to the next page and/or the beginning of the tail buffer when we call xdr_set_next_page(), the functions xdr_align_data() and xdr_expand_hole() really want to align to the next object in that next page or tail. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:54 -05:00
Trond Myklebust	9ed5af268e	SUNRPC: Clean up the handling of page padding in rpc_prepare_reply_pages() rpc_prepare_reply_pages() currently expects the 'hdrsize' argument to contain the length of the data that we expect to want placed in the head kvec plus a count of 1 word of padding that is placed after the page data. This is very confusing when trying to read the code, and sometimes leads to callers adding an arbitrary value of '1' just in order to satisfy the requirement (whether or not the page data actually needs such padding). This patch aims to clarify the code by changing the 'hdrsize' argument to remove that 1 word of padding. This means we need to subtract the padding from all the existing callers. Fixes: `02ef04e432` ("NFS: Account for XDR pad of buf->pages") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:53 -05:00
Trond Myklebust	1d97316692	SUNRPC: Fix up xdr_read_pages() to take arbitrary object lengths Fix up xdr_read_pages() so that it can handle object lengths that are larger than the page length, by simply aligning to the next object in the buffer tail. The function will continue to return the length of the truncate object data that actually fit into the pages. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:53 -05:00
Trond Myklebust	8d86e373b0	SUNRPC: Clean up helpers xdr_set_iov() and xdr_set_page_base() Allow xdr_set_iov() to set a base so that we can use it to set the cursor to a specific position in the kvec buffer. If the new base overflows the kvec/pages buffer in either xdr_set_iov() or xdr_set_page_base(), then truncate it so that we point to the end of the buffer. Finally, change both function to return the number of bytes remaining to read in their buffers. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:53 -05:00
Trond Myklebust	2b1f83d108	SUNRPC: Fix up typo in xdr_init_decode() We already know that the head buffer and page are empty, so if there is any data, it is in the tail. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:53 -05:00
Trond Myklebust	4aceaaea5e	SUNRPC: Fix up open coded kmemdup_nul() Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:53 -05:00
Trond Myklebust	c87b056e58	SUNRPC: Remove unused function xprt_load_transport() Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:53 -05:00
Trond Myklebust	1fc5f13186	SUNRPC: Add a helper to return the transport identifier given a netid Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:52 -05:00
Trond Myklebust	9bccd26461	SUNRPC: Close a race with transport setup and module put After we've looked up the transport module, we need to ensure it can't go away until we've finished running the transport setup code. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:52 -05:00
Trond Myklebust	d5aa6b22e2	SUNRPC: xprt_load_transport() needs to support the netid "rdma6" According to RFC5666, the correct netid for an IPv6 addressed RDMA transport is "rdma6", which we've supported as a mount option since Linux-4.7. The problem is when we try to load the module "xprtrdma6", that will fail, since there is no modulealias of that name. Fixes: `181342c5eb` ("xprtrdma: Add rdma6 option to support NFS/RDMA IPv6") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:52 -05:00
Trond Myklebust	e4c72201b6	SUNRPC: rpc_wake_up() should wake up tasks in the correct order Currently, we wake up the tasks by priority queue ordering, which means that we ignore the batching that is supposed to help with QoS issues. Fixes: `c049f8ea9a` ("SUNRPC: Remove the bh-safe lock requirement on the rpc_wait_queue->lock") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:51 -05:00
Chuck Lever	0359af7ac3	SUNRPC: Remove XDRBUF_SPARSE_PAGES flag in gss_proxy upcall There's no need to defer allocation of pages for the receive buffer. - This upcall is quite infrequent - gssp_alloc_receive_pages() can allocate the pages with GFP_KERNEL, unlike the transport - gssp_alloc_receive_pages() knows exactly how many pages are needed Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Olga Kornievskaia <aglo@umich.edu> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-12-02 10:06:52 -05:00
Chuck Lever	c1346a1216	NFSD: Replace the internals of the READ_BUF() macro Convert the READ_BUF macro in nfs4xdr.c from open code to instead use the new xdr_stream-style decoders already in use by the encode side (and by the in-kernel NFS client implementation). Once this conversion is done, each individual NFSv4 argument decoder can be independently cleaned up to replace these macros with C code. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 14:46:36 -05:00
Chuck Lever	5191955d6f	SUNRPC: Prepare for xdr_stream-style decoding on the server-side A "permanent" struct xdr_stream is allocated in struct svc_rqst so that it is usable by all server-side decoders. A per-rqst scratch buffer is also allocated to handle decoding XDR data items that cross page boundaries. To demonstrate how it will be used, add the first call site for the new svcxdr_init_decode() API. As an additional part of the overall conversion, add symbolic constants for successful and failed XDR operations. Returning "0" is overloaded. Sometimes it means something failed, but sometimes it means success. To make it more clear when XDR decoding functions succeed or fail, introduce symbolic constants. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 14:46:35 -05:00
Chuck Lever	0ae4c3e8a6	SUNRPC: Add xdr_set_scratch_page() and xdr_reset_scratch_buffer() Clean up: De-duplicate some frequently-used code. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 14:46:35 -05:00
Chuck Lever	156708adf2	SUNRPC: Move the svc_xdr_recvfrom() tracepoint Commit `c509f15a58` ("SUNRPC: Split the xdr_buf event class") added display of the rqst's XID to the svc_xdr_buf_class. However, when the recvfrom tracepoint fires, rq_xid has yet to be filled in with the current XID. So it ends up recording the previous XID that was handled by that svc_rqst. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:24 -05:00
Chuck Lever	d7cc739726	svcrdma: support multiple Read chunks per RPC An efficient way to handle multiple Read chunks is to post them all together and then take a single completion. This is also how the code is already structured: when the Read completion fires, all portions of the incoming RPC message are available to be assembled. The difficult problem is setting up the Read sink buffers so that the server pulls the client's data into place, making subsequent pull-up unnecessary. There are several cases: * No Read chunks. No-op. * One data item Read chunk. This is the fast case, where the inline part of the RPC-over-RDMA message becomes the head and tail, and the data item chunk is placed in buf->pages. * A Position-zero Read chunk. Treated like TCP: the Read chunk is pulled into contiguous pages. + A Position-zero Read chunk with data item chunks. Treated like TCP: all of the Read chunks are pulled into contiguous pages. + Multiple data item chunks. Treated like TCP: the inline part is copied and the data item chunks are pulled into contiguous pages. The "*" cases are already supported. This patch adds support for the "+" cases. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:23 -05:00
Chuck Lever	d96962e6d0	svcrdma: Use the new parsed chunk list when pulling Read chunks As a pre-requisite for handling multiple Read chunks in each Read list, convert svc_rdma_recv_read_chunk() to use the new parsed Read chunk list. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:23 -05:00
Chuck Lever	bafe9c27d5	svcrdma: Rename info::ri_chunklen I'm about to change the purpose of ri_chunklen: Instead of tracking the number of bytes in one Read chunk, it will track the total number of bytes in the Read list. Rename it for clarity. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:23 -05:00
Chuck Lever	b704be09dc	svcrdma: Clean up chunk tracepoints We already have trace_svcrdma_decode_rseg(), which records each ingress Read segment. Instead of reporting those again when they are about to be posted as RDMA Reads, let's fire one tracepoint before posting each type of chunk. So we'll get: nfsd-1998 [002] 321.666615: svcrdma_decode_rseg: cq.id=4 cid=42 segno=0 position=0 192@0x013ca9ebfae14000:0xb0010b05 nfsd-1998 [002] 321.666615: svcrdma_decode_rseg: cq.id=4 cid=42 segno=1 position=0 7688@0x013ca9ebf914e000:0xb0010a05 nfsd-1998 [002] 321.666615: svcrdma_decode_rseg: cq.id=4 cid=42 segno=2 position=0 28@0x013ca9ebfae15000:0xb0010905 nfsd-1998 [002] 321.666622: svcrdma_decode_rqst: cq.id=4 cid=42 xid=0x013ca9eb vers=1 credits=128 proc=RDMA_NOMSG hdrlen=100 nfsd-1998 [002] 321.666642: svcrdma_post_read_chunk: cq.id=3 cid=112 sqecount=3 kworker/2:1H-221 [002] 321.673949: svcrdma_wc_read: cq.id=3 cid=112 status=SUCCESS (0/0x0) Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:23 -05:00
Chuck Lever	7954c8503b	svcrdma: Remove chunk list pointers Clean up: These pointers are no longer used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:23 -05:00
Chuck Lever	41bc163ffe	svcrdma: Support multiple Write chunks in svc_rdma_send_reply_chunk Refactor svc_rdma_send_reply_chunk() so that it Sends only the parts of rq_res that do not contain a result payload. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:23 -05:00
Chuck Lever	2371bcc056	svcrdma: Support multiple Write chunks in svc_rdma_map_reply_msg() Refactor: svc_rdma_map_reply_msg() is restructured to DMA map only the parts of rq_res that do not contain a result payload. This change has been tested to confirm that it does not cause a regression in the no Write chunk and single Write chunk cases. Multiple Write chunks have not been tested. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:23 -05:00
Chuck Lever	9d0b09d5ef	svcrdma: Support multiple write chunks when pulling up When counting the number of SGEs needed to construct a Send request, do not count result payloads. And, when copying the Reply message into the pull-up buffer, result payloads are not to be copied to the Send buffer. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:22 -05:00
Chuck Lever	6911f3e10c	svcrdma: Use parsed chunk lists to encode Reply transport headers Refactor: Instead of re-parsing the ingress RPC Call transport header when constructing the egress RPC Reply transport header, use the new parsed Write list and Reply chunk, which are version- agnostic and already XDR decoded. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:22 -05:00
Chuck Lever	7a1cbfa180	svcrdma: Use parsed chunk lists to construct RDMA Writes Refactor: Instead of re-parsing the ingress RPC Call transport header when constructing RDMA Writes, use the new parsed chunk lists for the Write list and Reply chunk, which are version-agnostic and already XDR-decoded. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:22 -05:00
Chuck Lever	58b2e0fefa	svcrdma: Use parsed chunk lists to detect reverse direction replies Refactor: Don't duplicate header decoding smarts here. Instead, use the new parsed chunk lists. Note that the XID sanity test is also removed. The XID is already looked up by the cb handler, and is rejected if it's not recognized. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:22 -05:00
Chuck Lever	eb3de6a49d	svcrdma: Use parsed chunk lists to derive the inv_rkey Refactor: Don't duplicate header decoding smarts here. Instead, use the new parsed chunk lists. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:22 -05:00
Chuck Lever	78147ca8b4	svcrdma: Add a "parsed chunk list" data structure This simple data structure binds the location of each data payload inside of an RPC message to the chunk that will be used to push it to or pull it from the client. There are several benefits to this small additional overhead: * It enables support for more than one chunk in incoming Read and Write lists. * It translates the version-specific on-the-wire format into a generic in-memory structure, enabling support for multiple versions of the RPC/RDMA transport protocol. * It enables the server to re-organize a chunk list if it needs to adjust where Read chunk data lands in server memory without altering the contents of the XDR-encoded Receive buffer. Construction of these lists is done while sanity checking each incoming RPC/RDMA header. Subsequent patches will make use of the generated data structures. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:22 -05:00
Chuck Lever	ded380f100	svcrdma: Clean up svc_rdma_encode_reply_chunk() Refactor: Match the control flow of svc_rdma_encode_write_list(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:22 -05:00
Chuck Lever	f6ad77590a	svcrdma: Post RDMA Writes while XDR encoding replies The only RPC/RDMA ordering requirement between RDMA Writes and RDMA Sends is that the responder must post the Writes on the Send queue before posting the Send that conveys the RPC Reply for that Write payload. The Linux NFS server implementation now has a transport method that can post result Payload Writes earlier than svc_rdma_sendto: ->xpo_result_payload() This gets RDMA Writes going earlier so they are more likely to be complete at the remote end before the Send completes. Some care must be taken with pulled-up Replies. We don't want to push the Write chunk and then send the same payload data via Send. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:22 -05:00
Chuck Lever	76e5492b16	NFSD: Invoke svc_encode_result_payload() in "read" NFSD encoders Have the NFSD encoders annotate the boundaries of every direct-data-placement eligible result data payload. Then change svcrdma to use that annotation instead of the xdr->page_len when handling Write chunks. For NFSv4 on RDMA, that enables the ability to recognize multiple result payloads per compound. This is a pre-requisite for supporting multiple Write chunks per RPC transaction. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:22 -05:00
Chuck Lever	03493bca08	SUNRPC: Rename svc_encode_read_payload() Clean up: "result payload" is a less confusing name for these payloads. "READ payload" reflects only the NFS usage. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:21 -05:00

1 2 3 4 5 ...

4064 Commits