Merge tag 'nfsd-4.6' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
"Various bugfixes, a RDMA update from Chuck Lever, and support for a
new pnfs layout type from Christoph Hellwig. The new layout type is a
variant of the block layout which uses SCSI features to offer improved
fencing and device identification.
(Also: note this pull request also includes the client side of SCSI
layout, with Trond's permission.)"
* tag 'nfsd-4.6' of git://linux-nfs.org/~bfields/linux:
sunrpc/cache: drop reference when sunrpc_cache_pipe_upcall() detects a race
nfsd: recover: fix memory leak
nfsd: fix deadlock secinfo+readdir compound
nfsd4: resfh unused in nfsd4_secinfo
svcrdma: Use new CQ API for RPC-over-RDMA server send CQs
svcrdma: Use new CQ API for RPC-over-RDMA server receive CQs
svcrdma: Remove close_out exit path
svcrdma: Hook up the logic to return ERR_CHUNK
svcrdma: Use correct XID in error replies
svcrdma: Make RDMA_ERROR messages work
rpcrdma: Add RPCRDMA_HDRLEN_ERR
svcrdma: svc_rdma_post_recv() should close connection on error
svcrdma: Close connection when a send error occurs
nfsd: Lower NFSv4.1 callback message size limit
svcrdma: Do not send Write chunk XDR pad with inline content
svcrdma: Do not write xdr_buf::tail in a Write chunk
svcrdma: Find client-provided write and reply chunks once per reply
nfsd: Update NFS server comments related to RDMA support
nfsd: Fix a memory leak when meeting unsupported state_protect_how4
nfsd4: fix bad bounds checking
Merge tag 'nfs-for-4.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Highlights include:
Features:
- Add support for multiple NFSv4.1 callbacks in flight
- Initial patchset for RPC multipath support
- Adapt RPC/RDMA to use the new completion queue API
Bugfixes and cleanups:
- nfs4: nfs4_ff_layout_prepare_ds should return NULL if connection failed
- Cleanups to remove nfs_inode_dio_wait and nfs4_file_fsync
- Fix RPC/RDMA credit accounting
- Properly handle RDMA_ERROR replies
- xprtrdma: Do not wait if ib_post_send() fails
- xprtrdma: Segment head and tail XDR buffers on page boundaries
- xprtrdma cleanups for dprintk, physical_op_map and unused macros"
* tag 'nfs-for-4.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (35 commits)
nfs/blocklayout: make sure making a aligned read request
nfs4: nfs4_ff_layout_prepare_ds should return NULL if connection failed
nfs: remove nfs_inode_dio_wait
nfs: remove nfs4_file_fsync
xprtrdma: Use new CQ API for RPC-over-RDMA client send CQs
xprtrdma: Use an anonymous union in struct rpcrdma_mw
xprtrdma: Use new CQ API for RPC-over-RDMA client receive CQs
xprtrdma: Serialize credit accounting again
xprtrdma: Properly handle RDMA_ERROR replies
rpcrdma: Add RPCRDMA_HDRLEN_ERR
xprtrdma: Do not wait if ib_post_send() fails
xprtrdma: Segment head and tail XDR buffers on page boundaries
xprtrdma: Clean up dprintk format string containing a newline
xprtrdma: Clean up physical_op_map()
xprtrdma: Clean up unused RPCRDMA_INLINE_PAD_THRESH macro
NFS add callback_ops to nfs4_proc_bind_conn_to_session_callback
pnfs/NFSv4.1: Add multipath capabilities to pNFS flexfiles servers over NFSv3
SUNRPC: Allow addition of new transports to a struct rpc_clnt
NFSv4.1: nfs4_proc_bind_conn_to_session must iterate over all connections
SUNRPC: Make NFS swap work with multipath
...
sunrpc_cache_pipe_upcall() can detect a race if CACHE_PENDING is no longer
set. In this case it aborts the queuing of the upcall.
However, it has already taken a new counted reference on "h" and
doesn't "put" it, even though it frees the data structure holding the reference.
So let's delay the "cache_get" until we know we need it.
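A minimal sketch of that change, with the queuing logic paraphrased and helper/lock names simplified; the point is only that the counted reference is taken after the CACHE_PENDING check instead of before it:

	spin_lock(&queue_lock);
	if (test_bit(CACHE_PENDING, &h->flags)) {
		crq->item = cache_get(h);	/* reference taken only when
						 * the upcall is really queued */
		list_add_tail(&crq->q.list, &detail->queue);
		queued = true;
	}
	spin_unlock(&queue_lock);

	if (!queued) {
		kfree(crq);			/* raced: nothing to "put" */
		return -EAGAIN;
	}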
Fixes: f9e1aedc6c ("sunrpc/cache: remove races with queuing an upcall.")
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pull crypto update from Herbert Xu:
"Here is the crypto update for 4.6:
API:
- Convert remaining crypto_hash users to shash or ahash, also convert
blkcipher/ablkcipher users to skcipher.
- Remove crypto_hash interface.
- Remove crypto_pcomp interface.
- Add crypto engine for async cipher drivers.
- Add akcipher documentation.
- Add skcipher documentation.
Algorithms:
- Rename crypto/crc32 to avoid name clash with lib/crc32.
- Fix bug in keywrap where we zero the wrong pointer.
Drivers:
- Support T5/M5, T7/M7 SPARC CPUs in n2 hwrng driver.
- Add PIC32 hwrng driver.
- Support BCM6368 in bcm63xx hwrng driver.
- Pack structs for 32-bit compat users in qat.
- Use crypto engine in omap-aes.
- Add support for sama5d2x SoCs in atmel-sha.
- Make atmel-sha available again.
- Make sahara hashing available again.
- Make ccp hashing available again.
- Make sha1-mb available again.
- Add support for multiple devices in ccp.
- Improve DMA performance in caam.
- Add hashing support to rockchip"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (116 commits)
crypto: qat - remove redundant arbiter configuration
crypto: ux500 - fix checks of error code returned by devm_ioremap_resource()
crypto: atmel - fix checks of error code returned by devm_ioremap_resource()
crypto: qat - Change the definition of icp_qat_uof_regtype
hwrng: exynos - use __maybe_unused to hide pm functions
crypto: ccp - Add abstraction for device-specific calls
crypto: ccp - CCP versioning support
crypto: ccp - Support for multiple CCPs
crypto: ccp - Remove check for x86 family and model
crypto: ccp - memset request context to zero during import
lib/mpi: use "static inline" instead of "extern inline"
lib/mpi: avoid assembler warning
hwrng: bcm63xx - fix non device tree compatibility
crypto: testmgr - allow rfc3686 aes-ctr variants in fips mode.
crypto: qat - The AE id should be less than the maximal AE number
lib/mpi: Endianness fix
crypto: rockchip - add hash support for crypto engine in rk3288
crypto: xts - fix compile errors
crypto: doc - add skcipher API documentation
crypto: doc - update AEAD AD handling
...
Merge tag 'nfs-rdma-4.6-1' of git://git.linux-nfs.org/projects/anna/nfs-rdma
NFS: NFSoRDMA Client Side Changes
These patches include several bugfixes and cleanups for the NFSoRDMA client.
This includes bugfixes for NFS v4.1, proper RDMA_ERROR handling, and fixes
from the recent workqueue switchover. These patches also switch xprtrdma to
use the new CQ API.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* tag 'nfs-rdma-4.6-1' of git://git.linux-nfs.org/projects/anna/nfs-rdma: (11 commits)
xprtrdma: Use new CQ API for RPC-over-RDMA client send CQs
xprtrdma: Use an anonymous union in struct rpcrdma_mw
xprtrdma: Use new CQ API for RPC-over-RDMA client receive CQs
xprtrdma: Serialize credit accounting again
xprtrdma: Properly handle RDMA_ERROR replies
rpcrdma: Add RPCRDMA_HDRLEN_ERR
xprtrdma: Do not wait if ib_post_send() fails
xprtrdma: Segment head and tail XDR buffers on page boundaries
xprtrdma: Clean up dprintk format string containing a newline
xprtrdma: Clean up physical_op_map()
xprtrdma: Clean up unused RPCRDMA_INLINE_PAD_THRESH macro
Calling ib_poll_cq() to sort through WCs during a completion is a
common pattern amongst RDMA consumers. Since commit 14d3a3b249
("IB: add a proper completion queue abstraction"), WC sorting can
be handled by the IB core.
By converting to this new API, xprtrdma is made a better neighbor to
other RDMA consumers, as it allows the core to schedule the delivery
of completions more fairly amongst all active consumers.
Because each ib_cqe carries a pointer to a completion method, the
core can now post its own operations on a consumer's QP, and handle
the completions itself, without changes to the consumer.
Send completions were previously handled entirely in the completion
upcall handler (ie, deferring to a process context is unneeded).
Thus IB_POLL_SOFTIRQ is a direct replacement for the current
xprtrdma send code path.
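As a rough illustration of the new API (ib_alloc_cq(), struct ib_cqe and IB_POLL_SOFTIRQ are the IB core pieces; the consumer-side context struct and handler names below are made up):

	#include <rdma/ib_verbs.h>

	struct my_send_ctxt {
		struct ib_cqe	cqe;	/* embedded completion entry */
		/* ... consumer-private state ... */
	};

	/* Invoked by the IB core, in softirq context, for each send WC. */
	static void my_wc_send(struct ib_cq *cq, struct ib_wc *wc)
	{
		struct my_send_ctxt *ctxt =
			container_of(wc->wr_cqe, struct my_send_ctxt, cqe);

		if (wc->status != IB_WC_SUCCESS)
			pr_err("send failed: %s\n",
			       ib_wc_status_msg(wc->status));
		/* release ctxt resources here */
	}

	/* Setup: one send CQ, polled by the core from softirq context. */
	sendcq = ib_alloc_cq(device, NULL, cq_depth, 0, IB_POLL_SOFTIRQ);

	/* Per WR: aim its completion at the handler above. */
	ctxt->cqe.done = my_wc_send;
	send_wr.wr_cqe = &ctxt->cqe;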
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Calling ib_poll_cq() to sort through WCs during a completion is a
common pattern amongst RDMA consumers. Since commit 14d3a3b249
("IB: add a proper completion queue abstraction"), WC sorting can
be handled by the IB core.
By converting to this new API, xprtrdma is made a better neighbor to
other RDMA consumers, as it allows the core to schedule the delivery
of completions more fairly amongst all active consumers.
Because each ib_cqe carries a pointer to a completion method, the
core can now post its own operations on a consumer's QP, and handle
the completions itself, without changes to the consumer.
xprtrdma's reply processing is already handled in a work queue, but
there is some initial order-dependent processing that is done in the
soft IRQ context before a work item is scheduled.
IB_POLL_SOFTIRQ is a direct replacement for the current xprtrdma
receive code path.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Commit fe97b47cd6 ("xprtrdma: Use workqueue to process RPC/RDMA
replies") replaced the reply tasklet with a workqueue that allows
RPC replies to be processed in parallel. Thus the credit values in
RPC-over-RDMA replies can be applied in a different order than in
which the server sent them.
To fix this, revert commit eba8ff660b ("xprtrdma: Move credit
update to RPC reply handler"). Reverting is done by hand to
accommodate code changes that have occurred since then.
Fixes: fe97b47cd6 ("xprtrdma: Use workqueue to process . . .")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
RPC-over-RDMA error replies are shorter than RPCRDMA_HDRLEN_MIN, but
they still need to be processed so they can complete the waiting RPC.
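For illustration, the new constant would cover just the five fixed XDR words of an error reply (xid, version, credits, proc, error code); the exact definition below is an assumption, not a quote from the patch:

	#define RPCRDMA_HDRLEN_ERR	(sizeof(__be32) * 5)	/* assumed */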
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
If ib_post_send() in ro_unmap_sync() fails, the WRs have not been
posted, no completions will fire, and wait_for_completion() will
wait forever. Skip the wait in that case.
To ensure the MRs are invalid, disconnect.
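Roughly, with field names paraphrased from the FRWR invalidate path; the essential change is skipping the wait and disconnecting when the post fails:

	rc = ib_post_send(ia->ri_id->qp, invalidate_wrs, &bad_wr);
	if (rc) {
		pr_warn("xprtrdma: ib_post_send failed %i, disconnecting\n",
			rc);
		rdma_disconnect(ia->ri_id);	/* guarantees the MRs are
						 * no longer valid */
		goto unmap;			/* skip the wait: no completion
						 * will ever be signaled */
	}
	wait_for_completion(&f->fr_linv_done);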
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
A single memory allocation is used for the pair of buffers wherein
the RPC client builds an RPC call message and decodes its matching
reply. These buffers are sized based on the maximum possible size
of the RPC call and reply messages for the operation in progress.
This means that as the call buffer increases in size, the start of
the reply buffer is pushed farther into the memory allocation.
RPC requests are growing in size. It used to be that both the call
and reply buffers fit inside a single page.
But these days, thanks to NFSv4 (and especially security labels in
NFSv4.2) the maximum call and reply sizes are large. NFSv4.0 OPEN,
for example, now requires a 6KB allocation for a pair of call and
reply buffers, and NFSv4 LOOKUP is not far behind.
As the maximum size of a call increases, the reply buffer is pushed
far enough into the buffer's memory allocation that a page boundary
can appear in the middle of it.
When the maximum possible reply size is larger than the client's
RDMA receive buffers (currently 1KB), the client has to register a
Reply chunk for the server to RDMA Write the reply into.
The logic in rpcrdma_convert_iovs() assumes that the xdr_buf head and
tail buffers are always contained within a single page. It supplies
just one segment for the head and one for the tail.
FMR, for example, registers up to a page boundary (only a portion of
the reply buffer in the OPEN case above). But without additional
segments, it doesn't register the rest of the buffer.
When the server tries to write the OPEN reply, the RDMA Write fails
with a remote access error since the client registered only part of
the Reply chunk.
rpcrdma_convert_iovs() must split the XDR buffer into multiple
segments, each of which is guaranteed not to contain a page
boundary. That way fmr_op_map is given the proper number of segments
to register the whole reply buffer.
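A sketch of that segmentation; the segment type and helper name are illustrative, not the actual xprtrdma structures:

	struct seg {
		void	*addr;
		u32	len;
	};

	/* Split one kvec into segments that never span a page boundary. */
	static int split_on_page_boundaries(struct kvec *vec,
					    struct seg *segs, int n)
	{
		char	*base = vec->iov_base;
		size_t	remaining = vec->iov_len;

		while (remaining) {
			size_t chunk = min_t(size_t, remaining,
					     PAGE_SIZE - offset_in_page(base));

			segs[n].addr = base;
			segs[n].len = chunk;
			n++;

			base += chunk;
			remaining -= chunk;
		}
		return n;	/* updated segment count */
	}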
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
physical_op_unmap{_sync} don't use mr_nsegs, so don't bother to set
it in physical_op_map.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Calling ib_poll_cq() to sort through WCs during a completion is a
common pattern amongst RDMA consumers. Since commit 14d3a3b249
("IB: add a proper completion queue abstraction"), WC sorting can
be handled by the IB core.
By converting to this new API, svcrdma is made a better neighbor to
other RDMA consumers, as it allows the core to schedule the delivery
of completions more fairly amongst all active consumers.
This new API also aims each completion at a function that is
specific to the WR's opcode. Thus the ctxt->wr_op field and the
switch in process_context are replaced by a set of methods that
handle each completion type.
Because each ib_cqe carries a pointer to a completion method, the
core can now post operations on a consumer's QP, and handle the
completions itself.
The server's rdma_stat_sq_poll and rdma_stat_sq_prod metrics are no
longer updated.
As a cleanup, the cq_event_handler, the dto_tasklet, and all
associated locking are removed, as they are no longer referenced or
used.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Calling ib_poll_cq() to sort through WCs during a completion is a
common pattern amongst RDMA consumers. Since commit 14d3a3b249
("IB: add a proper completion queue abstraction"), WC sorting can
be handled by the IB core.
By converting to this new API, svcrdma is made a better neighbor to
other RDMA consumers, as it allows the core to schedule the delivery
of completions more fairly amongst all active consumers.
Because each ib_cqe carries a pointer to a completion method, the
core can now post operations on a consumer's QP, and handle the
completions itself.
svcrdma receive completions no longer use the dto_tasklet. Each
polled Receive WC is now handled individually in soft IRQ context.
The server transport's rdma_stat_rq_poll and rdma_stat_rq_prod
metrics are no longer updated.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Clean up: close_out is reached only when ctxt == NULL and XPT_CLOSE
is already set.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com>
Tested-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
RFC 5666 Section 4.2 states:
> When the peer detects an RPC-over-RDMA header version that it does
> not support (currently this document defines only version 1), it
> replies with an error code of ERR_VERS, and provides the low and
> high inclusive version numbers it does, in fact, support.
And:
> When other decoding errors are detected in the header or chunks,
> either an RPC decode error MAY be returned or the RPC/RDMA error
> code ERR_CHUNK MUST be returned.
The Linux NFS server does throw ERR_VERS when a client sends it
a request whose rdma_version is not "one." But it does not return
ERR_CHUNK when a header decoding error occurs. It just drops the
request.
To improve protocol extensibility, it should reject invalid values
in the rdma_proc field instead of treating them all like RDMA_MSG.
Otherwise clients can't detect when the server doesn't support
new rdma_proc values.
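In outline, the decoder can switch on rdma_proc and reject anything it does not recognize (the protocol constants are from RFC 5666; the error-reply helper name below is illustrative):

	switch (be32_to_cpu(rmsgp->rm_type)) {
	case RDMA_MSG:
	case RDMA_NOMSG:
		break;				/* processed normally below */
	case RDMA_DONE:
	case RDMA_ERROR:
		return 0;			/* silently drop */
	default:
		/* Unknown or unsupported procedure: tell the client. */
		return send_error_reply(xprt, rmsgp, ERR_CHUNK);
	}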
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com>
Tested-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
When constructing an error reply, svc_rdma_xdr_encode_error()
needs to view the client's request message so it can get the
failing request's XID.
svc_rdma_xdr_decode_req() is supposed to return a pointer to the
client's request header. But if it fails to decode the client's
message (and thus an error reply is needed) it does not return the
pointer. The server then sends a bogus XID in the error reply.
Instead, unconditionally generate the pointer to the client's header
in svc_rdma_recvfrom(), and pass that pointer to both functions.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com>
Tested-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Fix several issues with svc_rdma_send_error():
- Post a receive buffer to replace the one that was consumed by
the incoming request
- Posting a send should use DMA_TO_DEVICE, not DMA_FROM_DEVICE
- No need to put_page _and_ free pages in svc_rdma_put_context
- Make sure the sge is set up completely in case the error
path goes through svc_rdma_unmap_dma()
- Replace the use of ENOSYS, which has a reserved meaning
Related fixes in svc_rdma_recvfrom():
- Don't leak the ctxt associated with the incoming request
- Don't close the connection after sending an error reply
- Let svc_rdma_send_error() figure out the right header error code
As a last clean up, move svc_rdma_send_error() to svc_rdma_sendto.c
with other similar functions. There is some common logic in these
functions that could someday be combined to reduce code duplication.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com>
Tested-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Clean up: Most svc_rdma_post_recv() call sites close the transport
connection when a receive cannot be posted. Wrap that in a common
helper.
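Something along these lines; the helper name and the exact svc_rdma_post_recv() signature are paraphrased:

	/* Post a receive; if that fails there is nothing sensible left to
	 * do with this connection, so flag it for close. */
	static void svc_rdma_repost_or_close(struct svcxprt_rdma *rdma)
	{
		int ret = svc_rdma_post_recv(rdma, GFP_KERNEL);

		if (ret) {
			pr_err("svcrdma: could not post a receive (%d)\n", ret);
			set_bit(XPT_CLOSE, &rdma->sc_xprt.xpt_flags);
		}
	}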
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com>
Tested-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The maximum size of a backchannel message on RPC-over-RDMA depends
on the connection's inline threshold. Today that threshold is
typically 1024 bytes, making the maximum message size 996 bytes.
The Linux server's CREATE_SESSION operation checks that the size
of callback Calls can be as large as 1044 bytes, to accommodate
RPCSEC_GSS. Thus CREATE_SESSION fails if a client advertises the
true message size maximum of 996 bytes.
But the server's backchannel currently does not support RPCSEC_GSS.
The actual maximum size it needs is much smaller. It is safe to
reduce the limit to enable NFSv4.1 on RDMA backchannel operation.
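For reference, the 996-byte figure is just the typical 1024-byte inline threshold minus the fixed RPC-over-RDMA header: assuming a 28-byte header (seven XDR words), 1024 - 28 = 996 bytes remain for the backchannel RPC message itself.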
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The NFS server's XDR encoders add an XDR pad for content in the
xdr_buf page list at the beginning of the xdr_buf's tail buffer.
On RDMA transports, Write chunks are sent separately and without an
XDR pad.
If a Write chunk is being sent, strip off the pad in the tail buffer
so that inline content following the Write chunk remains XDR-aligned
when it is sent to the client.
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=294
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
When the Linux NFS server writes an odd-length data item into a
Write chunk, it finishes with XDR pad bytes. If the data item is
smaller than the Write chunk, the pad bytes are written at the end
of the data item, but still inside the chunk (ie, in the
application's buffer). Since this is direct data placement, that
exposes the pad bytes.
XDR pad bytes are inserted in order to preserve the XDR alignment
of the next XDR data item in an XDR stream. But Write chunks do not
appear in the payload XDR stream, and only one data item is allowed
in each chunk. Thus XDR padding is not needed in a Write chunk.
With NFSv4, the Linux NFS server places the results of any
operations that follow an NFSv4 READ or READLINK in the xdr_buf's
tail. Those results also should never be sent as a part of a Write
chunk. The current logic in send_write_chunks() appears to assume
that the xdr_buf's tail contains only pad bytes (ie, NFSv3).
The server should write only the contents of the xdr_buf's page list
in a Write chunk. If there's more than an XDR pad in the tail, that
needs to go inline or in the Reply chunk.
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=294
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The client provides the location of Write chunks into which the
server writes bulk payload. The client provides these when the
Upper Layer Protocol wants direct data placement and the Binding
allows it. (For NFS, these are the READ and READLINK operations.)
The client also provides the location of a Reply chunk into which
the server writes the non-bulk part of an RPC reply. The client
provides this chunk whenever it believes the reply can be larger
than its receive buffers.
The server then uses the presence of these chunks to determine how
it will form its reply message.
svc_rdma_sendto() was looking for Write and Reply chunks multiple
times for every reply message. It would be more efficient to do it
just once.
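Schematically, the chunk lists can be located once up front and the results handed to the routines that build the reply (function names paraphrased from the svcrdma send path):

	/* Walk the client's RPC-over-RDMA header once ... */
	wr_ary = svc_rdma_get_write_array(rdma_argp);
	rp_ary = svc_rdma_get_reply_array(rdma_argp, wr_ary);

	/* ... then pass wr_ary and rp_ary down to the send routines
	 * instead of re-parsing the header in each of them. */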
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Merge tag 'nfsd-4.5-1' of git://linux-nfs.org/~bfields/linux
Pull nfsd bugfix from Bruce Fields:
"One fix for a bug that could cause a NULL write past the end of a
buffer in case of unusually long writes to some system interfaces used
by mountd and other nfs support utilities"
* tag 'nfsd-4.5-1' of git://linux-nfs.org/~bfields/linux:
sunrpc/cache: fix off-by-one in qword_get()
The qword_get() function NUL-terminates its output buffer. If the input
string is in hex format \xXXXX... and the same length as the output
buffer, there is an off-by-one:
int qword_get(char **bpp, char *dest, int bufsize)
{
	...
	while (len < bufsize) {
		...
		*dest++ = (h << 4) | l;
		len++;
	}
	...
	*dest = '\0';
	return len;
}
This patch ensures the NUL terminator doesn't fall outside the output
buffer.
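Against the snippet above, the fix amounts to reserving the final byte for the terminator while decoding the hex form, roughly:

	while (len < bufsize - 1) {	/* leave room for the '\0' */
		...
		*dest++ = (h << 4) | l;
		len++;
	}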
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* multipath:
NFS add callback_ops to nfs4_proc_bind_conn_to_session_callback
pnfs/NFSv4.1: Add multipath capabilities to pNFS flexfiles servers over NFSv3
SUNRPC: Allow addition of new transports to a struct rpc_clnt
NFSv4.1: nfs4_proc_bind_conn_to_session must iterate over all connections
SUNRPC: Make NFS swap work with multipath
SUNRPC: Add a helper to apply a function to all the rpc_clnt's transports
SUNRPC: Allow caller to specify the transport to use
SUNRPC: Use the multipath iterator to assign a transport to each task
SUNRPC: Make rpc_clnt store the multipath iterators
SUNRPC: Add a structure to track multiple transports
SUNRPC: Make freeing of struct xprt rcu-safe
SUNRPC: Uninline xprt_get(); It isn't performance critical.
SUNRPC: Reorder rpc_task to put waitqueue related info in same cachelines
SUNRPC: Remove unused function rpc_task_reset_client
Merge tag 'nfs-rdma-4.5-1' of git://git.linux-nfs.org/projects/anna/nfs-rdma
NFS: NFSoRDMA Client Bugfix
This patch fixes a bug where NFS v4.1 callbacks were returning
RPC_GARBAGE_ARGS to the server.
Signed-off-by: Anna Schumaker <Anna@OcarinaProject.net>
On Mon, 15 Feb 2016, Trond Myklebust wrote:
> Hi Scott,
>
> On Mon, Feb 15, 2016 at 2:28 PM, Scott Mayhew <smayhew@redhat.com> wrote:
> > md5 is disabled in fips mode, and attempting to import a gss context
> > using md5 while in fips mode will result in crypto_alg_mod_lookup()
> > returning -ENOENT, which will make its way back up to
> > gss_pipe_downcall(), where the BUG() is triggered. Handling the -ENOENT
> > allows for a more graceful failure.
> >
> > Signed-off-by: Scott Mayhew <smayhew@redhat.com>
> > ---
> > net/sunrpc/auth_gss/auth_gss.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
> > index 799e65b..c30fc3b 100644
> > --- a/net/sunrpc/auth_gss/auth_gss.c
> > +++ b/net/sunrpc/auth_gss/auth_gss.c
> > @@ -737,6 +737,9 @@ gss_pipe_downcall(struct file *filp, const char __user *src, size_t mlen)
> > case -ENOSYS:
> > gss_msg->msg.errno = -EAGAIN;
> > break;
> > + case -ENOENT:
> > + gss_msg->msg.errno = -EPROTONOSUPPORT;
> > + break;
> > default:
> > printk(KERN_CRIT "%s: bad return from "
> > "gss_fill_context: %zd\n", __func__, err);
> > --
> > 2.4.3
> >
>
> Well debugged, but I unfortunately do have to ask if this patch is
> sufficient? In addition to -ENOENT, and -ENOMEM, it looks to me as if
> crypto_alg_mod_lookup() can also fail with -EINTR, -ETIMEDOUT, and
> -EAGAIN. Don't we also want to handle those?
You're right, I was focusing on the panic that I could easily reproduce.
I'm still not sure how I could trigger those other conditions.
>
> In fact, peering into the rats nest that is
> gss_import_sec_context_kerberos(), it looks as if that is just a tiny
> subset of all the errors that we might run into. Perhaps the right
> thing to do here is to get rid of the BUG() (but keep the above
> printk) and just return a generic error?
That sounds fine to me -- updated patch attached.
-Scott
From d54c6b64a107a90a38cab97577de05f9a4625052 Mon Sep 17 00:00:00 2001
From: Scott Mayhew <smayhew@redhat.com>
Date: Mon, 15 Feb 2016 15:12:19 -0500
Subject: [PATCH] auth_gss: remove the BUG() from gss_pipe_downcall()
Instead return a generic error via gss_msg->msg.errno. None of the
errors returned by gss_fill_context() should necessarily trigger a
kernel panic.
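The resulting default case then looks roughly like this; the specific errno chosen below is an assumption, not quoted from the patch:

	default:
		printk(KERN_CRIT "%s: bad return from "
			"gss_fill_context: %zd\n", __func__, err);
		gss_msg->msg.errno = -EIO;	/* generic error instead of BUG() */
	}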
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Some NFSv4.1 OPEN requests were hanging waiting for the NFS server
to finish recalling delegations. Turns out that each NFSv4.1 CB
request on RDMA gets a GARBAGE_ARGS reply from the Linux client.
Commit 756b9b37cf added a line in bc_svc_process that
overwrites the incoming rq_rcv_buf's length with the value in
rq_private_buf.len. But rpcrdma_bc_receive_call() does not invoke
xprt_complete_bc_request(), thus rq_private_buf.len is not
initialized. svc_process_common() is invoked with a zero-length
RPC message, and fails.
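The repair is essentially a one-line initialization in the RDMA backchannel receive path, so that bc_svc_process() copies a real length; placement and variable names here are paraphrased:

	/* Let bc_svc_process() see the length of the received call. */
	rqst->rq_private_buf.len = size;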
Fixes: 756b9b37cf ('SUNRPC: Fix callback channel')
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Add a function to allow creation and addition of a new transport
to an existing rpc_clnt
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Add a helper for tasks that require us to apply a function to all the
transports in an rpc_clnt.
An example of a usecase would be BIND_CONN_TO_SESSION, where we want
to send one RPC call down each transport.
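A hedged usage sketch; the iterator's name and signature are assumed from the description above, and the callback body is omitted:

	/* Invoked once for each transport attached to the rpc_clnt. */
	static int nfs4_bind_one_conn_to_session(struct rpc_clnt *clnt,
						 struct rpc_xprt *xprt,
						 void *data)
	{
		/* issue BIND_CONN_TO_SESSION over @xprt here */
		return 0;
	}

	rpc_clnt_iterate_for_each_xprt(clnt, nfs4_bind_one_conn_to_session,
				       NULL);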
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
This is needed in order to allow the NFSv4.1 backchannel and
BIND_CONN_TO_SESSION function to work.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
This is a pre-patch for the RPC multipath code. It sets up the storage in
struct rpc_clnt for the multipath code.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
In order to support multipathing/trunking we will need the ability to
track multiple transports. This patch sets up a basic structure for
doing so.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
This patch replaces uses of blkcipher with skcipher and the long
obsolete hash interface with either shash (for non-SG users) and
ahash.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma updates from Doug Ledford:
"Initial roundup of 4.5 merge window patches
- Remove usage of ib_query_device and instead store attributes in
ib_device struct
- Move iopoll out of block and into lib, rename to irqpoll, and use
in several places in the rdma stack as our new completion queue
polling library mechanism. Update the other block drivers that
already used iopoll to use the new mechanism too.
- Replace the per-entry GID table locks with a single GID table lock
- IPoIB multicast cleanup
- Cleanups to the IB MR facility
- Add support for 64bit extended IB counters
- Fix for netlink oops while parsing RDMA nl messages
- RoCEv2 support for the core IB code
- mlx4 RoCEv2 support
- mlx5 RoCEv2 support
- Cross Channel support for mlx5
- Timestamp support for mlx5
- Atomic support for mlx5
- Raw QP support for mlx5
- MAINTAINERS update for mlx4/mlx5
- Misc ocrdma, qib, nes, usNIC, cxgb3, cxgb4, mlx4, mlx5 updates
- Add support for remote invalidate to the iSER driver (pushed
through the RDMA tree due to dependencies, acknowledged by nab)
- Update to NFSoRDMA (pushed through the RDMA tree due to
dependencies, acknowledged by Bruce)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (169 commits)
IB/mlx5: Unify CQ create flags check
IB/mlx5: Expose Raw Packet QP to user space consumers
{IB, net}/mlx5: Move the modify QP operation table to mlx5_ib
IB/mlx5: Support setting Ethernet priority for Raw Packet QPs
IB/mlx5: Add Raw Packet QP query functionality
IB/mlx5: Add create and destroy functionality for Raw Packet QP
IB/mlx5: Refactor mlx5_ib_qp to accommodate other QP types
IB/mlx5: Allocate a Transport Domain for each ucontext
net/mlx5_core: Warn on unsupported events of QP/RQ/SQ
net/mlx5_core: Add RQ and SQ event handling
net/mlx5_core: Export transport objects
IB/mlx5: Expose CQE version to user-space
IB/mlx5: Add CQE version 1 support to user QPs and SRQs
IB/mlx5: Fix data validation in mlx5_ib_alloc_ucontext
IB/sa: Fix netlink local service GFP crash
IB/srpt: Remove redundant wc array
IB/qib: Improve ipoib UD performance
IB/mlx4: Advertise RoCE v2 support
IB/mlx4: Create and use another QP1 for RoCEv2
IB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers
...
Add wrappers for ->i_mutex access, parallel to
mutex_{lock,unlock,trylock,is_locked,lock_nested}, with
inode_foo(inode) being mutex_foo(&inode->i_mutex).
Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
We now always have a per-PD local_dma_lkey available. Make use of that
fact in svc_rdma and stop registering our own MR.
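In practice this means the server's SGEs can reference the PD's lkey directly, e.g. (field names as in the svcxprt_rdma transport code):

	sge->lkey = xprt->sc_pd->local_dma_lkey;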
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
To support the server-side of an NFSv4.1 backchannel on RDMA
connections, add a transport class that enables backward
direction messages on an existing forward channel connection.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Extra resources for handling backchannel requests have to be
pre-allocated when a transport instance is created. Set up
additional fields in svcxprt_rdma to track these resources.
The max_requests fields are elements of the RPC-over-RDMA
protocol, so they should be u32. To ensure that unsigned
arithmetic is used everywhere, some other fields in the
svcxprt_rdma struct are updated.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Prerequisite for using map_xdr in the backchannel code.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Clean up.
These functions can otherwise fail, so check for page allocation
failures too.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>