An RPC can exit at any time. When it does so, xprt_rdma_free() is
called, and it calls ->op_unmap().
If ->ro_reset() is running due to a transport disconnect, the two
methods can race while processing the same rpcrdma_mw. The results
are unpredictable.
Because of this, in previous patches I've altered ->ro_map() to
handle MR reset. ->ro_reset() is no longer needed and can be
removed.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Reviewed-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Acquiring 64 MRs in rpcrdma_buffer_get() while holding the buffer
pool lock is expensive, and unnecessary because most modern adapters
can transfer 100s of KBs of payload using just a single MR.
Instead, acquire MRs one-at-a-time as chunks are registered, and
return them to rb_mws immediately during deregistration.
Note: commit 539431a437 ("xprtrdma: Don't invalidate FRMRs if
registration fails") is reverted: There is now a valid case where
registration can fail (with -ENOMEM) but the QP is still in RTS.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Reviewed-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
After a transport disconnect, FRMRs can be left in an undetermined
state. In particular, the MR's rkey is no good.
Currently, FRMRs are fixed up by the transport connect worker, but
that can race with ->ro_unmap if an RPC happens to exit while the
transport connect worker is running.
A better way of dealing with broken FRMRs is to detect them before
they are re-used by ->ro_map. Such FRMRs are either already invalid
or are owned by the sending RPC, and thus no race with ->ro_unmap
is possible.
Introduce a mechanism for handing broken FRMRs to a workqueue to be
reset in a context that is appropriate for allocating resources
(ie. an ib_alloc_fast_reg_mr() API call).
This mechanism is not yet used, but will be in subsequent patches.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Reviewed-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
The connect worker can replace ri_id, but prevents ri_id->device
from changing during the lifetime of a transport instance. The old
ID is kept around until a new ID is created and the ->device is
confirmed to be the same.
Cache a copy of ri_id->device in rpcrdma_ia and in rpcrdma_rep.
The cached copy can be used safely in code that does not serialize
with the connect worker.
Other code can use it to save an extra address generation (one
pointer dereference instead of two).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Reviewed-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Highlights include:
Stable patches:
- Fix a regression in /proc/self/mountstats
- Fix the pNFS flexfiles O_DIRECT support
- Fix high load average due to callback thread sleeping
Bugfixes:
- Various patches to fix the pNFS layoutcommit support
- Do not cache pNFS deviceids unless server notifications are enabled
- Fix a SUNRPC transport reconnection regression
- make debugfs file creation failure non-fatal in SUNRPC
- Another fix for circular directory warnings on NFSv4 "junctioned" mountpoints
- Fix locking around NFSv4.2 fallocate() support
- Truncating NFSv4 file opens should also sync O_DIRECT writes
- Prevent infinite loop in rpcrdma_ep_create()
Features:
- Various improvements to the RDMA transport code's handling of memory
registration
- Various code cleanups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJVOmT6AAoJEGcL54qWCgDyrhYQAMPKXB55jrdOR/7UVSF/xPML
7OjMGHvBnTn/y0pNIyLyS1PjTZZsD/WZjoW9EFGpTv727qQNVoFxFRLNUcgi3NoL
1YledCkLf7Q32aqod93SRRFPc9hzBoKhOZpOzBuWaAviyAB3KLi70DWAq9qRReYM
prXUQQjpW5FLU+B2ifaVc2RCnu/rZ2c02YdR2XdtkBaAJxuhB2vR8IY1evwjCv3R
5zyLDd9zSDDoArdpUzM97cxZPcYRSqbOwgTKvaaRnDDq/mKbKMZaqmEfjblwzNFt
b43FbveJzZ3hlPADIpmaiMHjRTbxWjIKc9K1sOF2FPfcuPe2yM3DMAxDegUkEveS
7fkbv/qRZ30NqfchGanX/pmBlLOcdI76qe/bwhN19wCnw48O1eeHi1HK8rWGhU+E
qcrRZ3ZS2ufP/MVBuhauy0qU9Q4wcEtm7NGGP1231ZtmfjHKyBa4pLirNfG1AlJt
dK7tBrknVx+WVm/UddJp/fEsxbP0+fki6TwzioHUSWcz8rDVYF6PFT/QPM54SX2h
0oqwvu6d/uShpkVRm+fbje8FHmUxKdgqDsCYX2fNjWskh1oXSPsItvjqmTmTlE0i
EBmBwVwI0uB1ZQ3PrJLadhRcO3ZJmLQ5gNj456dstvWy6UQds1xyIQ/DgvmlzxWO
E9t0l18xHGRwbndsDa8f
=j5dP
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-4.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Another set of mainly bugfixes and a couple of cleanups. No new
functionality in this round.
Highlights include:
Stable patches:
- Fix a regression in /proc/self/mountstats
- Fix the pNFS flexfiles O_DIRECT support
- Fix high load average due to callback thread sleeping
Bugfixes:
- Various patches to fix the pNFS layoutcommit support
- Do not cache pNFS deviceids unless server notifications are enabled
- Fix a SUNRPC transport reconnection regression
- make debugfs file creation failure non-fatal in SUNRPC
- Another fix for circular directory warnings on NFSv4 "junctioned"
mountpoints
- Fix locking around NFSv4.2 fallocate() support
- Truncating NFSv4 file opens should also sync O_DIRECT writes
- Prevent infinite loop in rpcrdma_ep_create()
Features:
- Various improvements to the RDMA transport code's handling of
memory registration
- Various code cleanups"
* tag 'nfs-for-4.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (55 commits)
fs/nfs: fix new compiler warning about boolean in switch
nfs: Remove unneeded casts in nfs
NFS: Don't attempt to decode missing directory entries
Revert "nfs: replace nfs_add_stats with nfs_inc_stats when add one"
NFS: Rename idmap.c to nfs4idmap.c
NFS: Move nfs_idmap.h into fs/nfs/
NFS: Remove CONFIG_NFS_V4 checks from nfs_idmap.h
NFS: Add a stub for GETDEVICELIST
nfs: remove WARN_ON_ONCE from nfs_direct_good_bytes
nfs: fix DIO good bytes calculation
nfs: Fetch MOUNTED_ON_FILEID when updating an inode
sunrpc: make debugfs file creation failure non-fatal
nfs: fix high load average due to callback thread sleeping
NFS: Reduce time spent holding the i_mutex during fallocate()
NFS: Don't zap caches on fallocate()
xprtrdma: Make rpcrdma_{un}map_one() into inline functions
xprtrdma: Handle non-SEND completions via a callout
xprtrdma: Add "open" memreg op
xprtrdma: Add "destroy MRs" memreg op
xprtrdma: Add "reset MRs" memreg op
...
These functions are called in a loop for each page transferred via
RDMA READ or WRITE. Extract loop invariants and inline them to
reduce CPU overhead.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Allow each memory registration mode to plug in a callout that handles
the completion of a memory registration operation.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
The open op determines the size of various transport data structures
based on device capabilities and memory registration mode.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Memory Region objects associated with a transport instance are
destroyed before the instance is shutdown and destroyed.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
This method is invoked when a transport instance is about to be
reconnected. Each Memory Region object is reset to its initial
state.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
This method is used when setting up a new transport instance to
create a pool of Memory Region objects that will be used to register
memory during operation.
Memory Regions are not needed for "physical" registration, since
->prepare and ->release are no-ops for that mode.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
There is very little common processing among the different external
memory deregistration functions.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
There is very little common processing among the different external
memory registration functions. Have rpcrdma_create_chunks() call
the registration method directly. This removes a stack frame and a
switch statement from the external registration path.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
The max_payload computation is generalized to ensure that the
payload maximum is the lesser of RPC_MAX_DATA_SEGS and the number of
data segments that can be transmitted in an inline buffer.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Instead of employing switch() statements, let's use the typical
Linux kernel idiom for handling behavioral variation: virtual
functions.
Start by defining a vector of operations for each supported memory
registration mode, and by adding a source file for each mode.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>