Commit Graph

877 Commits

Author SHA1 Message Date
J. Bruce Fields 39a9beab5a rpc: share one xps between all backchannels
The spec allows backchannels for multiple clients to share the same tcp
connection.  When that happens, we need to use the same xprt for all of
them.  Similarly, we need the same xps.

This fixes list corruption introduced by the multipath code.

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: Trond Myklebust <trondmy@primarydata.com>
2016-06-15 10:32:25 -04:00
J. Bruce Fields d50039ea5e nfsd4/rpc: move backchannel create logic into rpc code
Also simplify the logic a bit.

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: Trond Myklebust <trondmy@primarydata.com>
2016-06-15 10:32:25 -04:00
Linus Torvalds 7e0fb73c52 Merge branch 'hash' of git://ftp.sciencehorizons.net/linux
Pull string hash improvements from George Spelvin:
 "This series does several related things:

   - Makes the dcache hash (fs/namei.c) useful for general kernel use.

     (Thanks to Bruce for noticing the zero-length corner case)

   - Converts the string hashes in <linux/sunrpc/svcauth.h> to use the
     above.

   - Avoids 64-bit multiplies in hash_64() on 32-bit platforms.  Two
     32-bit multiplies will do well enough.

   - Rids the world of the bad hash multipliers in hash_32.

     This finishes the job started in commit 689de1d6ca ("Minimal
     fix-up of bad hashing behavior of hash_64()")

     The vast majority of Linux architectures have hardware support for
     32x32-bit multiply and so derive no benefit from "simplified"
     multipliers.

     The few processors that do not (68000, h8/300 and some models of
     Microblaze) have arch-specific implementations added.  Those
     patches are last in the series.

   - Overhauls the dcache hash mixing.

     The patch in commit 0fed3ac866 ("namei: Improve hash mixing if
     CONFIG_DCACHE_WORD_ACCESS") was an off-the-cuff suggestion.
     Replaced with a much more careful design that's simultaneously
     faster and better.  (My own invention, as there was noting suitable
     in the literature I could find.  Comments welcome!)

   - Modify the hash_name() loop to skip the initial HASH_MIX().  This
     would let us salt the hash if we ever wanted to.

   - Sort out partial_name_hash().

     The hash function is declared as using a long state, even though
     it's truncated to 32 bits at the end and the extra internal state
     contributes nothing to the result.  And some callers do odd things:

      - fs/hfs/string.c only allocates 32 bits of state
      - fs/hfsplus/unicode.c uses it to hash 16-bit unicode symbols not bytes

   - Modify bytemask_from_count to handle inputs of 1..sizeof(long)
     rather than 0..sizeof(long)-1.  This would simplify users other
     than full_name_hash"

  Special thanks to Bruce Fields for testing and finding bugs in v1.  (I
  learned some humbling lessons about "obviously correct" code.)

  On the arch-specific front, the m68k assembly has been tested in a
  standalone test harness, I've been in contact with the Microblaze
  maintainers who mostly don't care, as the hardware multiplier is never
  omitted in real-world applications, and I haven't heard anything from
  the H8/300 world"

* 'hash' of git://ftp.sciencehorizons.net/linux:
  h8300: Add <asm/hash.h>
  microblaze: Add <asm/hash.h>
  m68k: Add <asm/hash.h>
  <linux/hash.h>: Add support for architecture-specific functions
  fs/namei.c: Improve dcache hash function
  Eliminate bad hash multipliers from hash_32() and  hash_64()
  Change hash_64() return value to 32 bits
  <linux/sunrpc/svcauth.h>: Define hash_str() in terms of hashlen_string()
  fs/namei.c: Add hashlen_string() function
  Pull out string hash to <linux/stringhash.h>
2016-05-28 16:15:25 -07:00
George Spelvin 917ea166f4 <linux/sunrpc/svcauth.h>: Define hash_str() in terms of hashlen_string()
Finally, the first use of previous two patches: eliminate the
separate ad-hoc string hash functions in the sunrpc code.

Now hash_str() is a wrapper around hash_string(), and hash_mem() is
likewise a wrapper around full_name_hash().

Note that sunrpc code *does* call hash_mem() with a zero length, which
is why the previous patch needed to handle that in full_name_hash().
(Thanks, Bruce, for finding that!)

This also eliminates the only caller of hash_long which asks for
more than 32 bits of output.

The comment about the quality of hashlen_string() and full_name_hash()
is jumping the gun by a few patches; they aren't very impressive now,
but will be improved greatly later in the series.

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Cc: Jeff Layton <jlayton@poochiereds.net>
Cc: linux-nfs@vger.kernel.org
2016-05-28 15:42:50 -04:00
Linus Torvalds ea8ea737c4 NFS client updates for Linux 4.7
Highlights include:
 
 Features:
 - Add support for the NFS v4.2 COPY operation
 - Add support for NFS/RDMA over IPv6
 
 Bugfixes and cleanups:
 - Avoid race that crashes nfs_init_commit()
 - Fix oops in callback path
 - Fix LOCK/OPEN race when unlinking an open file
 - Choose correct stateids when using delegations in setattr, read and write
 - Don't send empty SETATTR after OPEN_CREATE
 - xprtrdma: Prevent server from writing a reply into memory client has released
 - xprtrdma: Support using Read list and Reply chunk in one RPC call
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJXRu76AAoJENfLVL+wpUDrDVoQAKPKv1tEVJMRUQA3UVoKoixd
 KjmmZMjl6GfpISwTZl+a8W549jyGuYH7Gl8vSbMaE9/FI+kJW6XZQniTYfFqY8/a
 LbMSdNx1+yURisbkyO0vPqqwKw9r6UmsfGeUT8SpS3ff61yp4Oj436ra2qcPJsZ3
 cWl/lHItzX7oKFAWmr0Nmq2X8ac/8+NFyK29+V/QGfwtp3qAPbpA8XM5HrHw3rA2
 uk5uNSr3hwqz7P3+Hi7ZoO2m4nQTAbQnEunfYpxlOwz4IaM7qcGnntT6Jhwq1pGE
 /1YasG7bHeiWjhynmZZ4CWuMkogau2UJ/G68Cz7ehLhPNr8rH/ZFCJZ+XX0e0CgI
 1d+AwxZvgszIQVBY3S7sg8ezVSCPBXRFJ8rtzggGscqC53aP7L+rLfUFH+OKrhMg
 6n7RQiq4EmGDJGviB/R2HixI9CpdOf2puNhDKSJmPOqiSS7UuHMw8QCq++vdru+1
 GLGunGyO7D70yTV92KtsdzJlFlnfa/g+FIJrmaMpL3HH1h0stTctWX5xlTYmqEL3
 z3aUuT8RySk2t1FTabSj6KRWqE/krK5BMZbX91kpF27WL4c/olXFaZPqBDsj0q4u
 2rm1fIrc8RxLXctJan9ro092s/e9dup/1JxV5XWMq/EGS1ezvf+0XkCOtURaAWp3
 2aPHlx7M8iuq2SouL6f7
 =QMmY
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.7-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
 "Highlights include:

  Features:
   - Add support for the NFS v4.2 COPY operation
   - Add support for NFS/RDMA over IPv6

  Bugfixes and cleanups:
   - Avoid race that crashes nfs_init_commit()
   - Fix oops in callback path
   - Fix LOCK/OPEN race when unlinking an open file
   - Choose correct stateids when using delegations in setattr, read and
     write
   - Don't send empty SETATTR after OPEN_CREATE
   - xprtrdma: Prevent server from writing a reply into memory client
     has released
   - xprtrdma: Support using Read list and Reply chunk in one RPC call"

* tag 'nfs-for-4.7-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (61 commits)
  pnfs: pnfs_update_layout needs to consider if strict iomode checking is on
  nfs/flexfiles: Use the layout segment for reading unless it a IOMODE_RW and reading is disabled
  nfs/flexfiles: Helper function to detect FF_FLAGS_NO_READ_IO
  nfs: avoid race that crashes nfs_init_commit
  NFS: checking for NULL instead of IS_ERR() in nfs_commit_file()
  pnfs: make pnfs_layout_process more robust
  pnfs: rework LAYOUTGET retry handling
  pnfs: lift retry logic from send_layoutget to pnfs_update_layout
  pnfs: fix bad error handling in send_layoutget
  flexfiles: add kerneldoc header to nfs4_ff_layout_prepare_ds
  flexfiles: remove pointless setting of NFS_LAYOUT_RETURN_REQUESTED
  pnfs: only tear down lsegs that precede seqid in LAYOUTRETURN args
  pnfs: keep track of the return sequence number in pnfs_layout_hdr
  pnfs: record sequence in pnfs_layout_segment when it's created
  pnfs: don't merge new ff lsegs with ones that have LAYOUTRETURN bit set
  pNFS/flexfiles: When initing reads or writes, we might have to retry connecting to DSes
  pNFS/flexfiles: When checking for available DSes, conditionally check for MDS io
  pNFS/flexfile: Fix erroneous fall back to read/write through the MDS
  NFS: Reclaim writes via writepage are opportunistic
  NFSv4: Use the right stateid for delegations in setattr, read and write
  ...
2016-05-26 10:33:33 -07:00
Trond Myklebust 9a8f6b5ea2 SUNRPC: Ensure get_rpccred() and put_rpccred() can take NULL arguments
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17 15:48:06 -04:00
Chuck Lever 29c554227a xprtrdma: Bound the inline threshold values
Currently the sysctls that allow setting the inline threshold allow
any value to be set.

Small values only make the transport run slower. The default 1KB
setting is as low as is reasonable. And the logic that decides how
to divide a Send buffer between RPC-over-RDMA header and RPC message
assumes (but does not check) that the lower bound is not crazy (say,
57 bytes).

Send and receive buffers share a page with some control information.
Values larger than about 3KB can't be supported, currently.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17 15:47:57 -04:00
Chuck Lever 6b26cc8c8e sunrpc: Advertise maximum backchannel payload size
RPC-over-RDMA transports have a limit on how large a backward
direction (backchannel) RPC message can be. Ensure that the NFSv4.x
CREATE_SESSION operation advertises this limit to servers.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17 15:47:57 -04:00
Chuck Lever 4b9c7f9db9 sunrpc: Update RPCBIND_MAXNETIDLEN
Commit 176e21ee2e ("SUNRPC: Support for RPC over AF_LOCAL
transports") added a 5-character netid, but did not bump
RPCBIND_MAXNETIDLEN from 4 to 5.

Fixes: 176e21ee2e ("SUNRPC: Support for RPC over AF_LOCAL ...")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17 15:47:56 -04:00
Chuck Lever d9e4084f6c svcrdma: Generalize svc_rdma_xdr_decode_req()
Clean up: Pass in just the piece of the svc_rqst that is needed
here.

While we're in the area, add an informative documenting comment.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-05-13 15:53:06 -04:00
Jeff Layton 62dbef2ae4 sunrpc: add a get_rpccred_rcu inline
Sometimes we might have a RCU managed credential pointer and don't want
to use locking to handle it. Add a function that will take a reference
to the cred iff the refcount is not already zero. Callers can dereference
the pointer under the rcu_read_lock and use that function to take a
reference only if the cred is not on its way to destruction.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-09 09:05:40 -04:00
Weston Andros Adamson c065d229e3 sunrpc: add rpc_lookup_generic_cred
Add function rpc_lookup_generic_cred, which allows lookups of a generic
credential that's not current_cred().

[jlayton: add gfp_t parm]

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-09 09:05:40 -04:00
Jeff Layton 3c6e0bc8a1 sunrpc: plumb gfp_t parm into crcreate operation
We need to be able to call the generic_cred creator from different
contexts. Add a gfp_t parm to the crcreate operation and to
rpcauth_lookup_credcache. For now, we just push the gfp_t parms up
one level to the *_lookup_cred functions.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-09 09:05:40 -04:00
Kirill A. Shutemov ea1754a084 mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage
Mostly direct substitution with occasional adjustment or removing
outdated comments.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-04 10:41:08 -07:00
Linus Torvalds 5b1e167d8d Various bugfixes, a RDMA update from Chuck Lever, and support for a new
pnfs layout type from Christoph Hellwig.  The new layout type is a
 variant of the block layout which uses SCSI features to offer improved
 fencing and device identification.
 
 (Also: note this pull request also includes the client side of SCSI
 layout, with Trond's permission.)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW8+uhAAoJECebzXlCjuG+26YP/35DP4MPfszEJ5G0dYq5HMwl
 dJUni8ajSHRswZ/2FqiBsRwmg3Djfc+uoXdOneD1f6ogkDe7S16yp+FRyh8/VwUs
 Ym6LcxSjT28uqkxO0MblcnUl0G9nNSuOwqIsZ0HG7/UC7E6RmCF4o3r5fFUfOsA+
 B3koB5UcHNAFythAk+GDwOQ46Fr96VkZ7Y+OhdNAwmeXZIdKXIufweueI/o2uipB
 RoJFJ7lqrzAjFe+CqAUBr2l2k6lEKzdxbEH6HXQ5+cvVNwfVIgnrONpF78uF/p9T
 NNDnZ+fn3YdRhd+W9RxUHZq7ZL5YOEA8kHsAlloeBH74GqCy7IcS+DrKt1ReM3px
 bhgsXM3dqqJ9xiDGqmeE4VQwRF30SxgYZbO386E+cLHnCYV+vfY6RUaWPrk6On/r
 FL9g3iyVvhyC4HO06Xm+uvvERw8R+fTZY9KZQKH2RL0Tr5DkWRRNJfasMO+PwGOv
 Fdku01vyoA4Y6mbqUgQ9DmrbLO4gK3UyMiOTanQV9shrIDxI0MOuLK03zL25vZCM
 s1A4YBpXmg4gx3XsOFM+tygv6EVujDu6scICeb+hj0vi0oG82Lx7T9e3MJEiYC+T
 jbi8bu+x+0bX2obMprvDNVUzi/PgSUVpGCnRlbRTaXBa0lB6nV7uUiQ1HC9gGesm
 ZWWiOv7du+7WlFP5c6r5
 =mY8w
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-4.6' of git://linux-nfs.org/~bfields/linux

Pull nfsd updates from Bruce Fields:
 "Various bugfixes, a RDMA update from Chuck Lever, and support for a
  new pnfs layout type from Christoph Hellwig.  The new layout type is a
  variant of the block layout which uses SCSI features to offer improved
  fencing and device identification.

  (Also: note this pull request also includes the client side of SCSI
  layout, with Trond's permission.)"

* tag 'nfsd-4.6' of git://linux-nfs.org/~bfields/linux:
  sunrpc/cache: drop reference when sunrpc_cache_pipe_upcall() detects a race
  nfsd: recover: fix memory leak
  nfsd: fix deadlock secinfo+readdir compound
  nfsd4: resfh unused in nfsd4_secinfo
  svcrdma: Use new CQ API for RPC-over-RDMA server send CQs
  svcrdma: Use new CQ API for RPC-over-RDMA server receive CQs
  svcrdma: Remove close_out exit path
  svcrdma: Hook up the logic to return ERR_CHUNK
  svcrdma: Use correct XID in error replies
  svcrdma: Make RDMA_ERROR messages work
  rpcrdma: Add RPCRDMA_HDRLEN_ERR
  svcrdma: svc_rdma_post_recv() should close connection on error
  svcrdma: Close connection when a send error occurs
  nfsd: Lower NFSv4.1 callback message size limit
  svcrdma: Do not send Write chunk XDR pad with inline content
  svcrdma: Do not write xdr_buf::tail in a Write chunk
  svcrdma: Find client-provided write and reply chunks once per reply
  nfsd: Update NFS server comments related to RDMA support
  nfsd: Fix a memory leak when meeting unsupported state_protect_how4
  nfsd4: fix bad bounds checking
2016-03-24 10:41:00 -07:00
Linus Torvalds 01cde1538e NFS client updates for Linux 4.6
Highlights include:
 
 Features:
 - Add support for multiple NFSv4.1 callbacks in flight
 - Initial patchset for RPC multipath support
 - Adapt RPC/RDMA to use the new completion queue API
 
 Bugfixes and cleanups:
 - nfs4: nfs4_ff_layout_prepare_ds should return NULL if connection failed
 - Cleanups to remove nfs_inode_dio_wait and nfs4_file_fsync
 - Fix RPC/RDMA credit accounting
 - Properly handle RDMA_ERROR replies
 - xprtrdma: Do not wait if ib_post_send() fails
 - xprtrdma: Segment head and tail XDR buffers on page boundaries
 - xprtrdma cleanups for dprintk, physical_op_map and unused macros
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW8Y7MAAoJEGcL54qWCgDyMsMP+we8JSgfVqI5X1lKpU9aPWkI
 D912ybtV58Kv0elKwYvQMqm+mRvdMNz1hZNJa4sAEaPVBOGfFjyZLy3xlDlr0HTf
 M+Juh0FNLTcUh1obxJamjsbpNxfg4b6f/Z29KWRzahv/MlpMJVS3hLjpAEzCcTYr
 WfOOovV6mragtsBINegGl/6jk/x2D22JDnKcTU+8ltVZGJtZe+HoqTFhUOrLO5qm
 wR3YO22fbOuiZxCPoST06kMNiksYnYXnOju8RwlKwFYq3bWke0jWstQtIC4vKs6K
 4u5o74aTBL5zMkJPnJuIfN2if4LJPptSr1n7pItbv3MLmgY1mWjE6N2BATpijfhQ
 p+Gt/GHTAvswuWrmwySZKLj/Q8EkBuw4ohPFwLQ9eFHl2USoV3G9KQw7H0odR4d1
 IyQPCag+suN2lWBreFkPIV48dZyeCVk6JmJmy3SN+d0L1t3jd6gwSO2UBgG7S2Gd
 qVbdxYRiU/zYP6E5wFouLhIc1beSfe4vnJqvnuWZrIId+haTE2+OLi7772WGIkSe
 xoZVTg7AX4Wu79ZyWoH+e9FnDvEsRkRVv7HQfpsMq2gynBWj70/KemEoeZnjqWaB
 UOWcH8/vNLrnwlXTh0VHG6I8t3s0EXgqQB4//tYRLI42oIj35W2pIMnjYt52DeVB
 Mo5mbYghtR9bgeoRQ6V4
 =kC3t
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Features:
   - Add support for multiple NFSv4.1 callbacks in flight
   - Initial patchset for RPC multipath support
   - Adapt RPC/RDMA to use the new completion queue API

  Bugfixes and cleanups:
   - nfs4: nfs4_ff_layout_prepare_ds should return NULL if connection failed
   - Cleanups to remove nfs_inode_dio_wait and nfs4_file_fsync
   - Fix RPC/RDMA credit accounting
   - Properly handle RDMA_ERROR replies
   - xprtrdma: Do not wait if ib_post_send() fails
   - xprtrdma: Segment head and tail XDR buffers on page boundaries
   - xprtrdma cleanups for dprintk, physical_op_map and unused macros"

* tag 'nfs-for-4.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (35 commits)
  nfs/blocklayout: make sure making a aligned read request
  nfs4: nfs4_ff_layout_prepare_ds should return NULL if connection failed
  nfs: remove nfs_inode_dio_wait
  nfs: remove nfs4_file_fsync
  xprtrdma: Use new CQ API for RPC-over-RDMA client send CQs
  xprtrdma: Use an anonymous union in struct rpcrdma_mw
  xprtrdma: Use new CQ API for RPC-over-RDMA client receive CQs
  xprtrdma: Serialize credit accounting again
  xprtrdma: Properly handle RDMA_ERROR replies
  rpcrdma: Add RPCRDMA_HDRLEN_ERR
  xprtrdma: Do not wait if ib_post_send() fails
  xprtrdma: Segment head and tail XDR buffers on page boundaries
  xprtrdma: Clean up dprintk format string containing a newline
  xprtrdma: Clean up physical_op_map()
  xprtrdma: Clean up unused RPCRDMA_INLINE_PAD_THRESH macro
  NFS add callback_ops to nfs4_proc_bind_conn_to_session_callback
  pnfs/NFSv4.1: Add multipath capabilities to pNFS flexfiles servers over NFSv3
  SUNRPC: Allow addition of new transports to a struct rpc_clnt
  NFSv4.1: nfs4_proc_bind_conn_to_session must iterate over all connections
  SUNRPC: Make NFS swap work with multipath
  ...
2016-03-22 13:16:21 -07:00
Trond Myklebust 1425075e72 NFS: NFSoRDMA Client Side Changes
These patches include several bugfixes and cleanups for the NFSoRDMA client.
 This includes bugfixes for NFS v4.1, proper RDMA_ERROR handling, and fixes
 from the recent workqueue swicchover.  These patches also switch xprtrdma to
 use the new CQ API
 
 Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJW5wl+AAoJENfLVL+wpUDrH7sP/i3lZ7n6Pr+Flrb/Z+ywjmEA
 mvZ0u3O3eviFxDqr82J/WL7fgDUGbBd9urtUu5xZGscc0HNs3LQ8izm6Dy3gmrVF
 MFh69jNGL5djmfymYMWRbdoKLuOw4V70EBtGnYqH7Bh7gwpiIl3EiVcBBJup/vKV
 rgvx6NnSGYpYPYpBFC4Ql4qZx7m5j4cxsThRScbu7wMMjDknls+7ZDM1B5mGO00+
 1vYObpYGXOXovGZyHAHspQityWp6jvUcEMnJzWbMUFDqxkOmmGK3t54MfvRXZiFE
 vUkgg5nGxhMejEIfMywuf6czKGfXc4HZT2yF9eSZeA4IW+7QkeeXeCcIANBDY6Ga
 oKXu4fgM6T7SnCpefwkXRhinwtEM04tGAlxo1X3UcKrMz7Q3di3/NtgaWijfL8gy
 9Nd1lt2kqI375h7+OZccURl33lnQBSjO4F2pt/pFk6wYwGh0F8co9bIp6QIEVQ0f
 N2l8KU9vgLofhrD9JzZeu1l3+TCdDU9YaJLSjbkZ71BTjNtkNdUcd9Tk+bMSxept
 mq7mNKe1oBHAGgR8+7zYBUKEt85rdpNovPoU+Hz/QKbV3zikGSPmL3e9gpEvhmH4
 MKD7Vs61Fi8volfv7wHmzZF8Tk68qc1MAZXSbgzVQxwF/uBjqEuSO+n4Kii/gfkG
 MJjlzqmqlO01O1fn2XHu
 =aHja
 -----END PGP SIGNATURE-----

Merge tag 'nfs-rdma-4.6-1' of git://git.linux-nfs.org/projects/anna/nfs-rdma

NFS: NFSoRDMA Client Side Changes

These patches include several bugfixes and cleanups for the NFSoRDMA client.
This includes bugfixes for NFS v4.1, proper RDMA_ERROR handling, and fixes
from the recent workqueue swicchover.  These patches also switch xprtrdma to
use the new CQ API

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

* tag 'nfs-rdma-4.6-1' of git://git.linux-nfs.org/projects/anna/nfs-rdma: (787 commits)
  xprtrdma: Use new CQ API for RPC-over-RDMA client send CQs
  xprtrdma: Use an anonymous union in struct rpcrdma_mw
  xprtrdma: Use new CQ API for RPC-over-RDMA client receive CQs
  xprtrdma: Serialize credit accounting again
  xprtrdma: Properly handle RDMA_ERROR replies
  rpcrdma: Add RPCRDMA_HDRLEN_ERR
  xprtrdma: Do not wait if ib_post_send() fails
  xprtrdma: Segment head and tail XDR buffers on page boundaries
  xprtrdma: Clean up dprintk format string containing a newline
  xprtrdma: Clean up physical_op_map()
  xprtrdma: Clean up unused RPCRDMA_INLINE_PAD_THRESH macro
2016-03-16 16:25:09 -04:00
Chuck Lever 59aa1f9a3c xprtrdma: Properly handle RDMA_ERROR replies
These are shorter than RPCRDMA_HDRLEN_MIN, and they need to
complete the waiting RPC.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-03-14 14:55:59 -04:00
Chuck Lever d8647f2d3c rpcrdma: Add RPCRDMA_HDRLEN_ERR
Error headers are shorter than either RDMA_MSG or RDMA_NOMSG.

Since HDRLEN_MIN is already used in several other places that would
be annoying to change, add RPCRDMA_HDRLEN_ERR for the one or two
spots where the shorter length is needed.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-03-14 14:55:57 -04:00
Chuck Lever 66cede5279 xprtrdma: Clean up unused RPCRDMA_INLINE_PAD_THRESH macro
Fixes: b3221d6a53 ('xprtrdma: Remove logic that constructs...')
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-03-14 14:55:44 -04:00
Chuck Lever be99bb1140 svcrdma: Use new CQ API for RPC-over-RDMA server send CQs
Calling ib_poll_cq() to sort through WCs during a completion is a
common pattern amongst RDMA consumers. Since commit 14d3a3b249
("IB: add a proper completion queue abstraction"), WC sorting can
be handled by the IB core.

By converting to this new API, svcrdma is made a better neighbor to
other RDMA consumers, as it allows the core to schedule the delivery
of completions more fairly amongst all active consumers.

This new API also aims each completion at a function that is
specific to the WR's opcode. Thus the ctxt->wr_op field and the
switch in process_context is replaced by a set of methods that
handle each completion type.

Because each ib_cqe carries a pointer to a completion method, the
core can now post operations on a consumer's QP, and handle the
completions itself.

The server's rdma_stat_sq_poll and rdma_stat_sq_prod metrics are no
longer updated.

As a clean up, the cq_event_handler, the dto_tasklet, and all
associated locking is removed, as they are no longer referenced or
used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-01 13:06:43 -08:00
Chuck Lever 8bd5ba86d9 svcrdma: Use new CQ API for RPC-over-RDMA server receive CQs
Calling ib_poll_cq() to sort through WCs during a completion is a
common pattern amongst RDMA consumers. Since commit 14d3a3b249
("IB: add a proper completion queue abstraction"), WC sorting can
be handled by the IB core.

By converting to this new API, svcrdma is made a better neighbor to
other RDMA consumers, as it allows the core to schedule the delivery
of completions more fairly amongst all active consumers.

Because each ib_cqe carries a pointer to a completion method, the
core can now post operations on a consumer's QP, and handle the
completions itself.

svcrdma receive completions no longer use the dto_tasklet. Each
polled Receive WC is now handled individually in soft IRQ context.

The server transport's rdma_stat_rq_poll and rdma_stat_rq_prod
metrics are no longer updated.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-01 13:06:42 -08:00
Chuck Lever f3ea53fb3b svcrdma: Use correct XID in error replies
When constructing an error reply, svc_rdma_xdr_encode_error()
needs to view the client's request message so it can get the
failing request's XID.

svc_rdma_xdr_decode_req() is supposed to return a pointer to the
client's request header. But if it fails to decode the client's
message (and thus an error reply is needed) it does not return the
pointer. The server then sends a bogus XID in the error reply.

Instead, unconditionally generate the pointer to the client's header
in svc_rdma_recvfrom(), and pass that pointer to both functions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com>
Tested-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-01 13:06:39 -08:00
Chuck Lever a6081b82c5 svcrdma: Make RDMA_ERROR messages work
Fix several issues with svc_rdma_send_error():

 - Post a receive buffer to replace the one that was consumed by
   the incoming request
 - Posting a send should use DMA_TO_DEVICE, not DMA_FROM_DEVICE
 - No need to put_page _and_ free pages in svc_rdma_put_context
 - Make sure the sge is set up completely in case the error
   path goes through svc_rdma_unmap_dma()
 - Replace the use of ENOSYS, which has a reserved meaning

Related fixes in svc_rdma_recvfrom():

 - Don't leak the ctxt associated with the incoming request
 - Don't close the connection after sending an error reply
 - Let svc_rdma_send_error() figure out the right header error code

As a last clean up, move svc_rdma_send_error() to svc_rdma_sendto.c
with other similar functions. There is some common logic in these
functions that could someday be combined to reduce code duplication.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com>
Tested-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-01 13:06:38 -08:00
Chuck Lever c6db03ea57 rpcrdma: Add RPCRDMA_HDRLEN_ERR
Error headers are shorter than either RDMA_MSG or RDMA_NOMSG.

Since HDRLEN_MIN is already used in several other places that would
be annoying to change, add RPCRDMA_HDRLEN_ERR for the one or two
spots where the shorter length is needed.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-01 13:06:38 -08:00
Chuck Lever bf36387ad3 svcrdma: svc_rdma_post_recv() should close connection on error
Clean up: Most svc_rdma_post_recv() call sites close the transport
connection when a receive cannot be posted. Wrap that in a common
helper.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com>
Tested-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-01 13:06:37 -08:00
Chuck Lever 4500632f60 nfsd: Lower NFSv4.1 callback message size limit
The maximum size of a backchannel message on RPC-over-RDMA depends
on the connection's inline threshold. Today that threshold is
typically 1024 bytes, making the maximum message size 996 bytes.

The Linux server's CREATE_SESSION operation checks that the size
of callback Calls can be as large as 1044 bytes, to accommodate
RPCSEC_GSS. Thus CREATE_SESSION fails if a client advertises the
true message size maximum of 996 bytes.

But the server's backchannel currently does not support RPCSEC_GSS.
The actual maximum size it needs is much smaller. It is safe to
reduce the limit to enable NFSv4.1 on RDMA backchannel operation.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-01 13:06:35 -08:00
Chuck Lever f6763c29ab svcrdma: Do not send Write chunk XDR pad with inline content
The NFS server's XDR encoders adds an XDR pad for content in the
xdr_buf page list at the beginning of the xdr_buf's tail buffer.

On RDMA transports, Write chunks are sent separately and without an
XDR pad.

If a Write chunk is being sent, strip off the pad in the tail buffer
so that inline content following the Write chunk remains XDR-aligned
when it is sent to the client.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=294
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-01 13:06:34 -08:00
Trond Myklebust 7f55489058 SUNRPC: Allow addition of new transports to a struct rpc_clnt
Add a function to allow creation and addition of a new transport
to an existing rpc_clnt

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-02-05 18:48:57 -05:00
Trond Myklebust 3227886c65 SUNRPC: Add a helper to apply a function to all the rpc_clnt's transports
Add a helper for tasks that require us to apply a function to all the
transports in an rpc_clnt.
An example of a usecase would be BIND_CONN_TO_SESSION, where we want
to send one RPC call down each transport.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-02-05 18:48:55 -05:00
Trond Myklebust 9d61498d5f SUNRPC: Allow caller to specify the transport to use
This is needed in order to allow the NFSv4.1 backchannel and
BIND_CONN_TO_SESSION function to work.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-02-05 18:48:55 -05:00
Trond Myklebust fb43d17210 SUNRPC: Use the multipath iterator to assign a transport to each task
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-02-05 18:48:55 -05:00
Trond Myklebust ad01b2c68d SUNRPC: Make rpc_clnt store the multipath iterators
This is a pre-patch for the RPC multipath code. It sets up the storage in
struct rpc_clnt for the multipath code.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-02-05 18:48:54 -05:00
Trond Myklebust 80b14d5e61 SUNRPC: Add a structure to track multiple transports
In order to support multipathing/trunking we will need the ability to
track multiple transports. This patch sets up a basic structure for
doing so.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-02-05 18:48:54 -05:00
Trond Myklebust fda1bfef9e SUNRPC: Make freeing of struct xprt rcu-safe
Have it call kfree_rcu() to ensure that we can use it on rcu-protected
lists.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-01-31 21:37:23 -05:00
Trond Myklebust 30c5116b11 SUNRPC: Uninline xprt_get(); It isn't performance critical.
Also allow callers to pass NULL arguments to xprt_get() and xprt_put().

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-01-31 21:37:22 -05:00
Trond Myklebust 5edd1051f4 SUNRPC: Reorder rpc_task to put waitqueue related info in same cachelines
Try to group all the data required by the waitqueues, their timers and timer
callbacks into the same cachelines for performance. With this reordering,
"pahole" reports the following structure on x86_64:

struct rpc_task {
        atomic_t                   tk_count;             /*     0     4 */
        int                        tk_status;            /*     4     4 */
        struct list_head           tk_task;              /*     8    16 */
        void                       (*tk_callback)(struct rpc_task *); /*    24
        void                       (*tk_action)(struct rpc_task *); /*    32
        long unsigned int          tk_timeout;           /*    40     8 */
        long unsigned int          tk_runstate;          /*    48     8 */
        struct rpc_wait_queue *    tk_waitqueue;         /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        union {
                struct work_struct tk_work;              /*          64 */
                struct rpc_wait    tk_wait;              /*          56 */
        } u;                                             /*    64    64 */
        /* --- cacheline 2 boundary (128 bytes) --- */
        struct rpc_message         tk_msg;               /*   128    32 */
        void *                     tk_calldata;          /*   160     8 */
        const struct rpc_call_ops  * tk_ops;             /*   168     8 */
        struct rpc_clnt *          tk_client;            /*   176     8 */
        struct rpc_rqst *          tk_rqstp;             /*   184     8 */
        /* --- cacheline 3 boundary (192 bytes) --- */
        struct workqueue_struct *  tk_workqueue;         /*   192     8 */
        ktime_t                    tk_start;             /*   200     8 */
        pid_t                      tk_owner;             /*   208     4 */
        short unsigned int         tk_flags;             /*   212     2 */
        short unsigned int         tk_timeouts;          /*   214     2 */
        short unsigned int         tk_pid;               /*   216     2 */
        unsigned char              tk_priority:2;        /*   218: 6  1 */
        unsigned char              tk_garb_retry:2;      /*   218: 4  1 */
        unsigned char              tk_cred_retry:2;      /*   218: 2  1 */
        unsigned char              tk_rebind_retry:2;    /*   218: 0  1 */

        /* size: 224, cachelines: 4, members: 24 */
        /* padding: 5 */
        /* last cacheline: 32 bytes */
};

whereas on i386, it reports everything fitting into the 1st cacheline.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-01-31 21:37:22 -05:00
Trond Myklebust 58f1369216 SUNRPC: Remove unused function rpc_task_reset_client
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-01-31 21:37:22 -05:00
Herbert Xu 3b5cf20cf4 sunrpc: Use skcipher and ahash/shash
This patch replaces uses of blkcipher with skcipher and the long
obsolete hash interface with either shash (for non-SG users) and
ahash.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-01-27 20:36:01 +08:00
Linus Torvalds 048ccca8c1 Initial roundup of 4.5 merge window patches
- Remove usage of ib_query_device and instead store attributes in
   ib_device struct
 - Move iopoll out of block and into lib, rename to irqpoll, and use
   in several places in the rdma stack as our new completion queue
   polling library mechanism.  Update the other block drivers that
   already used iopoll to use the new mechanism too.
 - Replace the per-entry GID table locks with a single GID table lock
 - IPoIB multicast cleanup
 - Cleanups to the IB MR facility
 - Add support for 64bit extended IB counters
 - Fix for netlink oops while parsing RDMA nl messages
 - RoCEv2 support for the core IB code
 - mlx4 RoCEv2 support
 - mlx5 RoCEv2 support
 - Cross Channel support for mlx5
 - Timestamp support for mlx5
 - Atomic support for mlx5
 - Raw QP support for mlx5
 - MAINTAINERS update for mlx4/mlx5
 - Misc ocrdma, qib, nes, usNIC, cxgb3, cxgb4, mlx4, mlx5 updates
 - Add support for remote invalidate to the iSER driver (pushed through the
   RDMA tree due to dependencies, acknowledged by nab)
 - Update to NFSoRDMA (pushed through the RDMA tree due to dependencies,
   acknowledged by Bruce)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWoSygAAoJELgmozMOVy/dDjsP/2vbTda2MvQfkfkGEZBQdJSg
 095RN0gQgCJdg78lAl8yuaK8r4VN/7uefpDtFdudH1I/Pei7X0wxN9R1UzFNG4KR
 AD53lz92IVPs15328SbPR2kvNWISR9aBFQo3rlElq3Grqlp0EMn2Ou1vtu87rekF
 aMllxr8Nl0uZhP+eWusOsYpJUUtwirLgRnrAyfqo2UxZh/TMIroT0TCx1KXjVcAg
 dhDARiZAdu3OgSc6OsWqmH+DELEq6dFVA5F+DDBGAb8bFZqlJc7cuMHWInwNsNXT
 so4bnEQ835alTbsdYtqs5DUNS8heJTAJP4Uz0ehkTh/uNCcvnKeUTw1c2P/lXI1k
 7s33gMM+0FXj0swMBw0kKwAF2d9Hhus9UAN7NwjBuOyHcjGRd5q7SAnfWkvKx000
 s9jVW19slb2I38gB58nhjOh8s+vXUArgxnV1+kTia1+bJSR5swvVoWRicRXdF0vh
 TvLX/BjbSIU73g1TnnLNYoBTV3ybFKQ6bVdQW7fzSTDs54dsI1vvdHXi3bYZCpnL
 HVwQTZRfEzkvb0AdKbcvf8p/TlaAHem3ODqtO1eHvO4if1QJBSn+SptTEeJVYYdK
 n4B3l/dMoBH4JXJUmEHB9jwAvYOpv/YLAFIvdL7NFwbqGNsC3nfXFcmkVORB1W3B
 KEMcM2we4bz+uyKMjEAD
 =5oO7
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma updates from Doug Ledford:
 "Initial roundup of 4.5 merge window patches

   - Remove usage of ib_query_device and instead store attributes in
     ib_device struct

   - Move iopoll out of block and into lib, rename to irqpoll, and use
     in several places in the rdma stack as our new completion queue
     polling library mechanism.  Update the other block drivers that
     already used iopoll to use the new mechanism too.

   - Replace the per-entry GID table locks with a single GID table lock

   - IPoIB multicast cleanup

   - Cleanups to the IB MR facility

   - Add support for 64bit extended IB counters

   - Fix for netlink oops while parsing RDMA nl messages

   - RoCEv2 support for the core IB code

   - mlx4 RoCEv2 support

   - mlx5 RoCEv2 support

   - Cross Channel support for mlx5

   - Timestamp support for mlx5

   - Atomic support for mlx5

   - Raw QP support for mlx5

   - MAINTAINERS update for mlx4/mlx5

   - Misc ocrdma, qib, nes, usNIC, cxgb3, cxgb4, mlx4, mlx5 updates

   - Add support for remote invalidate to the iSER driver (pushed
     through the RDMA tree due to dependencies, acknowledged by nab)

   - Update to NFSoRDMA (pushed through the RDMA tree due to
     dependencies, acknowledged by Bruce)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (169 commits)
  IB/mlx5: Unify CQ create flags check
  IB/mlx5: Expose Raw Packet QP to user space consumers
  {IB, net}/mlx5: Move the modify QP operation table to mlx5_ib
  IB/mlx5: Support setting Ethernet priority for Raw Packet QPs
  IB/mlx5: Add Raw Packet QP query functionality
  IB/mlx5: Add create and destroy functionality for Raw Packet QP
  IB/mlx5: Refactor mlx5_ib_qp to accommodate other QP types
  IB/mlx5: Allocate a Transport Domain for each ucontext
  net/mlx5_core: Warn on unsupported events of QP/RQ/SQ
  net/mlx5_core: Add RQ and SQ event handling
  net/mlx5_core: Export transport objects
  IB/mlx5: Expose CQE version to user-space
  IB/mlx5: Add CQE version 1 support to user QPs and SRQs
  IB/mlx5: Fix data validation in mlx5_ib_alloc_ucontext
  IB/sa: Fix netlink local service GFP crash
  IB/srpt: Remove redundant wc array
  IB/qib: Improve ipoib UD performance
  IB/mlx4: Advertise RoCE v2 support
  IB/mlx4: Create and use another QP1 for RoCEv2
  IB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers
  ...
2016-01-23 18:45:06 -08:00
Christoph Hellwig 5fe1043da8 svc_rdma: use local_dma_lkey
We now alwasy have a per-PD local_dma_lkey available.  Make use of that
fact in svc_rdma and stop registering our own MR.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 15:30:48 -05:00
Chuck Lever 5d252f90a8 svcrdma: Add class for RDMA backwards direction transport
To support the server-side of an NFSv4.1 backchannel on RDMA
connections, add a transport class that enables backward
direction messages on an existing forward channel connection.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 15:30:48 -05:00
Chuck Lever 03fe993153 svcrdma: Define maximum number of backchannel requests
Extra resources for handling backchannel requests have to be
pre-allocated when a transport instance is created. Set up
additional fields in svcxprt_rdma to track these resources.

The max_requests fields are elements of the RPC-over-RDMA
protocol, so they should be u32. To ensure that unsigned
arithmetic is used everywhere, some other fields in the
svcxprt_rdma struct are updated.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 15:30:48 -05:00
Chuck Lever ba986c96f9 svcrdma: Make map_xdr non-static
Pre-requisite to use map_xdr in the backchannel code.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 15:30:48 -05:00
Chuck Lever 39b09a1a12 svcrdma: Add gfp flags to svc_rdma_post_recv()
svc_rdma_post_recv() allocates pages for receive buffers on-demand.
It uses GFP_KERNEL so the allocator tries hard, and may sleep. But
I'm about to add a call to svc_rdma_post_recv() from a function
that may not sleep.

Since all svc_rdma_post_recv() call sites can tolerate its failure,
allow it to fail if the page allocator returns nothing. Longer term,
receive buffers, being a finite resource per-connection, should be
pre-allocated and re-used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 15:30:48 -05:00
Chuck Lever 71810ef327 svcrdma: Remove unused req_map and ctxt kmem_caches
Clean up.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 15:30:48 -05:00
Chuck Lever 2fe81b239d svcrdma: Improve allocation of struct svc_rdma_req_map
To ensure this allocation cannot fail and will not sleep,
pre-allocate the req_map structures per-connection.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 15:30:48 -05:00
Chuck Lever cc886c9ff1 svcrdma: Improve allocation of struct svc_rdma_op_ctxt
When the maximum payload size of NFS READ and WRITE was increased
by commit cc9a903d91 ("svcrdma: Change maximum server payload back
to RPCSVC_MAXPAYLOAD"), the size of struct svc_rdma_op_ctxt
increased to over 6KB (on x86_64). That makes allocating one of
these from a kmem_cache more likely to fail in situations when
system memory is exhausted.

Since I'm about to add a caller where this allocation must always
work _and_ it cannot sleep, pre-allocate ctxts for each connection.

Another motivation for this change is that NFSv4.x servers are
required by specification not to drop NFS requests. Pre-allocating
memory resources reduces the likelihood of a drop.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 15:30:48 -05:00
Scott Mayhew c3d4879e01 sunrpc: Add a function to close temporary transports immediately
Add a function svc_age_temp_xprts_now() to close temporary transports
whose xpt_local matches the address passed in server_addr immediately
instead of waiting for them to be closed by the timer function.

The function is intended to be used by notifier_blocks that will be
added to nfsd and lockd that will run when an ip address is deleted.

This will eliminate the ACK storms and client hangs that occur in
HA-NFS configurations where nfsd & lockd is left running on the cluster
nodes all the time and the NFS 'service' is migrated back and forth
within a short timeframe.

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-12-23 10:08:15 -05:00
J. Bruce Fields 414ca017a5 nfsd4: fix gss-proxy 4.1 mounts for some AD principals
The principal name on a gss cred is used to setup the NFSv4.0 callback,
which has to have a client principal name to authenticate to.

That code wants the name to be in the form servicetype@hostname.
rpc.svcgssd passes down such names (and passes down no principal name at
all in the case the principal isn't a service principal).

gss-proxy always passes down the principal name, and passes it down in
the form servicetype/hostname@REALM.  So we've been munging the name
gss-proxy passes down into the format the NFSv4.0 callback code expects,
or throwing away the name if we can't.

Since the introduction of the MACH_CRED enforcement in NFSv4.1, we've
also been using the principal name to verify that certain operations are
done as the same principal as was used on the original EXCHANGE_ID call.

For that application, the original name passed down by gss-proxy is also
useful.

Lack of that name in some cases was causing some kerberized NFSv4.1
mount failures in an Active Directory environment.

This fix only works in the gss-proxy case.  The fix for legacy
rpc.svcgssd would be more involved, and rpc.svcgssd already has other
problems in the AD case.

Reported-and-tested-by: James Ralston <ralston@pobox.com>
Acked-by: Simo Sorce <simo@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-11-24 11:36:31 -07:00