Commit Graph

3314 Commits

Author SHA1 Message Date
Trond Myklebust 4f54614975 SUNRPC: Clean up xs_udp_data_receive()
Simplify the retry logic.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:16 -04:00
Trond Myklebust 550aebfe1c SUNRPC: Allow AF_LOCAL sockets to use the generic stream receive
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:16 -04:00
Trond Myklebust c50b8ee02f SUNRPC: Clean up - rename xs_tcp_data_receive() to xs_stream_data_receive()
In preparation for sharing with AF_LOCAL.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:16 -04:00
Trond Myklebust 277e4ab7d5 SUNRPC: Simplify TCP receive code by switching to using iterators
Most of this code should also be reusable with other socket types.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:16 -04:00
Trond Myklebust 9d96acbc7f SUNRPC: Add a bvec array to struct xdr_buf for use with iovec_iter()
Add a bvec array to struct xdr_buf, and have the client allocate it
when we need to receive data into pages.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:16 -04:00
Trond Myklebust 431f6eb357 SUNRPC: Add a label for RPC calls that require allocation on receive
If the RPC call relies on the receive call allocating pages as buffers,
then let's label it so that we
a) Don't leak memory by allocating pages for requests that do not expect
   this behaviour
b) Can optimise for the common case where calls do not require allocation.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:16 -04:00
Trond Myklebust 79c99152a3 SUNRPC: Convert the xprt->sending queue back to an ordinary wait queue
We no longer need priority semantics on the xprt->sending queue, because
the order in which tasks are sent is now dictated by their position in
the send queue.
Note that the backlog queue remains a priority queue, meaning that
slot resources are still managed in order of task priority.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:16 -04:00
Trond Myklebust f42f7c2830 SUNRPC: Fix priority queue fairness
Fix up the priority queue to not batch by owner, but by queue, so that
we allow '1 << priority' elements to be dequeued before switching to
the next priority queue.
The owner field is still used to wake up requests in round robin order
by owner to avoid single processes hogging the RPC layer by loading the
queues.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:16 -04:00
Trond Myklebust 95f7691daa SUNRPC: Convert xprt receive queue to use an rbtree
If the server is slow, we can find ourselves with quite a lot of entries
on the receive queue. Converting the search from an O(n) to O(log(n))
can make a significant difference, particularly since we have to hold
a number of locks while searching.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:16 -04:00
Trond Myklebust bd79bc579c SUNRPC: Don't take transport->lock unnecessarily when taking XPRT_LOCK
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:16 -04:00
Trond Myklebust adfa71446d SUNRPC: Cleanup: remove the unused 'task' argument from the request_send()
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:16 -04:00
Trond Myklebust c544577dad SUNRPC: Clean up transport write space handling
Treat socket write space handling in the same way we now treat transport
congestion: by denying the XPRT_LOCK until the transport signals that it
has free buffer space.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:15 -04:00
Trond Myklebust 36bd7de949 SUNRPC: Turn off throttling of RPC slots for TCP sockets
The theory was that we would need to grab the socket lock anyway, so we
might as well use it to gate the allocation of RPC slots for a TCP
socket.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:15 -04:00
Trond Myklebust f05d54ecf6 SUNRPC: Allow soft RPC calls to time out when waiting for the XPRT_LOCK
This no longer causes them to lose their place in the transmission queue.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:15 -04:00
Trond Myklebust 89f90fe1ad SUNRPC: Allow calls to xprt_transmit() to drain the entire transmit queue
Rather than forcing each and every RPC task to grab the socket write
lock in order to send itself, we allow whichever task is holding the
write lock to attempt to drain the entire transmit queue.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:15 -04:00
Trond Myklebust 86aeee0eb6 SUNRPC: Enqueue swapper tagged RPCs at the head of the transmit queue
Avoid memory starvation by giving RPCs that are tagged with the
RPC_TASK_SWAPPER flag the highest priority.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:15 -04:00
Trond Myklebust 75891f502f SUNRPC: Support for congestion control when queuing is enabled
Both RDMA and UDP transports require the request to get a "congestion control"
credit before they can be transmitted. Right now, this is done when
the request locks the socket. We'd like it to happen when a request attempts
to be transmitted for the first time.
In order to support retransmission of requests that already hold such
credits, we also want to ensure that they get queued first, so that we
don't deadlock with requests that have yet to obtain a credit.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:15 -04:00
Trond Myklebust 918f3c1fe8 SUNRPC: Improve latency for interactive tasks
One of the intentions with the priority queues was to ensure that no
single process can hog the transport. The field task->tk_owner therefore
identifies the RPC call's origin, and is intended to allow the RPC layer
to organise queues for fairness.
This commit therefore modifies the transmit queue to group requests
by task->tk_owner, and ensures that we round robin among those groups.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:15 -04:00
Trond Myklebust dcbbeda836 SUNRPC: Move RPC retransmission stat counter to xprt_transmit()
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:15 -04:00
Trond Myklebust 5f2f6bd987 SUNRPC: Simplify xprt_prepare_transmit()
Remove the checks for whether or not we need to transmit, and whether
or not a reply has been received. Those are already handled in
call_transmit() itself.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:15 -04:00
Trond Myklebust 04b3b88fbf SUNRPC: Don't reset the request 'bytes_sent' counter when releasing XPRT_LOCK
If the request is still on the queue, this will be incorrect behaviour.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:15 -04:00
Trond Myklebust 50f484e298 SUNRPC: Treat the task and request as separate in the xprt_ops->send_request()
When we shift to using the transmit queue, then the task that holds the
write lock will not necessarily be the same as the one being transmitted.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:15 -04:00
Trond Myklebust 902c58872e SUNRPC: Fix up the back channel transmit
Fix up the back channel code to recognise that it has already been
transmitted, so does not need to be called again.
Also ensure that we set req->rq_task.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:15 -04:00
Trond Myklebust 762e4e67b3 SUNRPC: Refactor RPC call encoding
Move the call encoding so that it occurs before the transport connection
etc.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:15 -04:00
Trond Myklebust 944b042921 SUNRPC: Add a transmission queue for RPC requests
Add the queue that will enforce the ordering of RPC task transmission.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:15 -04:00
Trond Myklebust ef3f54347f SUNRPC: Distinguish between the slot allocation list and receive queue
When storing a struct rpc_rqst on the slot allocation list, we currently
use the same field 'rq_list' as we use to store the request on the
receive queue. Since the structure is never on both lists at the same
time, this is OK.
However, for clarity, let's make that a union with different names for
the different lists so that we can more easily distinguish between
the two states.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:14 -04:00
Trond Myklebust 78b576ced2 SUNRPC: Minor cleanup for call_transmit()
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:14 -04:00
Trond Myklebust 7f3a1d1e18 SUNRPC: Refactor xprt_transmit() to remove wait for reply code
Allow the caller in clnt.c to call into the code to wait for a reply
after calling xprt_transmit(). Again, the reason is that the backchannel
code does not need this functionality.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:14 -04:00
Trond Myklebust edc81dcd5b SUNRPC: Refactor xprt_transmit() to remove the reply queue code
Separate out the action of adding a request to the reply queue so that the
backchannel code can simply skip calling it altogether.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:14 -04:00
Trond Myklebust 75c84151a9 SUNRPC: Rename xprt->recv_lock to xprt->queue_lock
We will use the same lock to protect both the transmit and receive queues.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:14 -04:00
Trond Myklebust ec37a58fba SUNRPC: Don't wake queued RPC calls multiple times in xprt_transmit
Rather than waking up the entire queue of RPC messages a second time,
just wake up the task that was put to sleep.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:14 -04:00
Trond Myklebust 5ce970393b SUNRPC: Test whether the task is queued before grabbing the queue spinlocks
When asked to wake up an RPC task, it makes sense to test whether or not
the task is still queued.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:14 -04:00
Trond Myklebust 359c48c04a SUNRPC: Add a helper to wake up a sleeping rpc_task and set its status
Add a helper that will wake up a task that is sleeping on a specific
queue, and will set the value of task->tk_status. This is mainly
intended for use by the transport layer to notify the task of an
error condition.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:14 -04:00
Trond Myklebust cf9946cd61 SUNRPC: Refactor the transport request pinning
We are going to need to pin for both send and receive.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:14 -04:00
Trond Myklebust 4cd34e7c2e SUNRPC: Simplify dealing with aborted partially transmitted messages
If the previous message was only partially transmitted, we need to close
the socket in order to avoid corruption of the message stream. To do so,
we currently hijack the unlocking of the socket in order to schedule
the close.
Now that we track the message offset in the socket state, we can move
that kind of checking out of the socket lock code, which is needed to
allow messages to remain queued after dropping the socket lock.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:14 -04:00
Trond Myklebust 6c7a64e5a4 SUNRPC: Add socket transmit queue offset tracking
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:14 -04:00
Trond Myklebust e1806c7bfb SUNRPC: Move reset of TCP state variables into the reconnect code
Rather than resetting state variables in socket state_change() callback,
do it in the sunrpc TCP connect function itself.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:14 -04:00
Trond Myklebust d1109aa56c SUNRPC: Rename TCP receive-specific state variables
Since we will want to introduce similar TCP state variables for the
transmission of requests, let's rename the existing ones to label
that they are for the receive side.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:14 -04:00
Trond Myklebust 3a03818fbe SUNRPC: Avoid holding locks across the XDR encoding of the RPC message
Currently, we grab the socket bit lock before we allow the message
to be XDR encoded. That significantly slows down the transmission
rate, since we serialise on a potentially blocking operation.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:14 -04:00
Trond Myklebust 7ebbbc6e7b SUNRPC: Simplify identification of when the message send/receive is complete
Add states to indicate that the message send and receive are not yet
complete.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:14 -04:00
Trond Myklebust 3021a5bbbf SUNRPC: The transmitted message must lie in the RPCSEC window of validity
If a message has been encoded using RPCSEC_GSS, the server is
maintaining a window of sequence numbers that it considers valid.
The client should normally be tracking that window, and needs to
verify that the sequence number used by the message being transmitted
still lies inside the window of validity.

So far, we've been able to assume this condition would be realised
automatically, since the client has been encoding the message only
after taking the socket lock. Once we change that condition, we
will need the explicit check.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:13 -04:00
Trond Myklebust 9ee94d3ed6 SUNRPC: If there is no reply expected, bail early from call_decode
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:13 -04:00
Trond Myklebust 9dc6edcf67 SUNRPC: Clean up initialisation of the struct rpc_rqst
Move the initialisation back into xprt.c.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:13 -04:00
Linus Torvalds 53a01c9a5f NFS client updates for Linux 4.19
Stable bufixes:
 - v3.17+: Fix an off-by-one in bl_map_stripe()
 - v4.9+: NFSv4 client live hangs after live data migration recovery
 - v4.18+: xprtrdma: Fix disconnect regression
 - v4.14+: Fix locking in pnfs_generic_recover_commit_reqs
 - v4.9+: Fix a sleep in atomic context in nfs4_callback_sequence()
 
 Features:
 - Add support for asynchronous server-side COPY operations
 
 Other bugfixes and cleanups:
 - Optitmizations and fixes involving NFS v4.1 / pNFS layout handling
 - Optimize lseek(fd, SEEK_CUR, 0) on directories to avoid locking
 - Immediately reschedule writeback when the server replies with an error
 - Fix excessive attribute revalidation in nfs_execute_ok()
 - Add error checking to nfs_idmap_prepare_message()
 - Use new vm_fault_t return type
 - Return a delegation when reclaiming one that the server has recalled
 - Referrals should inherit proto setting from parents
 - Make rpc_auth_create_args a const
 - Improvements to rpc_iostats tracking
 - Fix a potential reference leak when there is an error processing a callback
 - Fix rmdir / mkdir / rename nlink accounting
 - Fix updating inode change attribute
 - Fix error handling in nfsn4_sp4_select_mode()
 - Use an appropriate work queue for direct-write completion
 - Don't busy wait if NFSv4 session draining is interrupted
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAlt/CYIACgkQ18tUv7Cl
 QOu8gBAA0xQWmgRoG6oIdYUxvgYqhuJmMqC4SU1E6mCJ93xEuUSvEFw51X+84KCt
 r6UPkp/bKiVe3EIinKTplIzuxgggXNG0EQmO46FYNTl7nqpN85ffLsQoWsiD23fp
 j8afqKPFR2zfhHXLKQC7k1oiOpwGqJ+EJWgIW4llE80pSNaErEoEaDqSPds5thMN
 dHEjjLr8ef6cbBux6sSPjwWGNbE82uoSu3MDuV2+e62hpGkgvuEYo1vyE6ujeZW5
 MUsmw+AHZkwro0msTtNBOHcPZAS0q/2UMPzl1tsDeCWNl2mugqZ6szQLSS2AThKq
 Zr6iK9Q5dWjJfrQHcjRMnYJB+SCX1SfPA7ASuU34opwcWPjecbS9Q92BNTByQYwN
 o9ngs2K0mZfqpYESMAmf7Il134cCBrtEp3skGko2KopJcYcE5YUFhdKihi1yQQjU
 UbOOubMpQk8vY9DpDCAwGbICKwUZwGvq27uuUWL20kFVDb1+jvfHwcV4KjRAJo/E
 J9aFtU+qOh4rMPMnYlEVZcAZBGfenlv/DmBl1upRpjzBkteUpUJsAbCmGyAk4616
 3RECasehgsjNCQpFIhv3FpUkWzP5jt0T3gRr1NeY6WKJZwYnHEJr9PtapS+EIsCT
 tB5DvvaJqFtuHFOxzn+KlGaxdSodHF7klOq7NM3AC0cX8AkWqaU=
 =8+9t
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.19-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
 "These patches include adding async support for the v4.2 COPY
  operation. I think Bruce is planning to send the server patches for
  the next release, but I figured we could get the client side out of
  the way now since it's been in my tree for a while. This shouldn't
  cause any problems, since the server will still respond with
  synchronous copies even if the client requests async.

  Features:
   - Add support for asynchronous server-side COPY operations

  Stable bufixes:
   - Fix an off-by-one in bl_map_stripe() (v3.17+)
   - NFSv4 client live hangs after live data migration recovery (v4.9+)
   - xprtrdma: Fix disconnect regression (v4.18+)
   - Fix locking in pnfs_generic_recover_commit_reqs (v4.14+)
   - Fix a sleep in atomic context in nfs4_callback_sequence() (v4.9+)

  Other bugfixes and cleanups:
   - Optimizations and fixes involving NFS v4.1 / pNFS layout handling
   - Optimize lseek(fd, SEEK_CUR, 0) on directories to avoid locking
   - Immediately reschedule writeback when the server replies with an
     error
   - Fix excessive attribute revalidation in nfs_execute_ok()
   - Add error checking to nfs_idmap_prepare_message()
   - Use new vm_fault_t return type
   - Return a delegation when reclaiming one that the server has
     recalled
   - Referrals should inherit proto setting from parents
   - Make rpc_auth_create_args a const
   - Improvements to rpc_iostats tracking
   - Fix a potential reference leak when there is an error processing a
     callback
   - Fix rmdir / mkdir / rename nlink accounting
   - Fix updating inode change attribute
   - Fix error handling in nfsn4_sp4_select_mode()
   - Use an appropriate work queue for direct-write completion
   - Don't busy wait if NFSv4 session draining is interrupted"

* tag 'nfs-for-4.19-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (54 commits)
  pNFS: Remove unwanted optimisation of layoutget
  pNFS/flexfiles: ff_layout_pg_init_read should exit on error
  pNFS: Treat RECALLCONFLICT like DELAY...
  pNFS: When updating the stateid in layoutreturn, also update the recall range
  NFSv4: Fix a sleep in atomic context in nfs4_callback_sequence()
  NFSv4: Fix locking in pnfs_generic_recover_commit_reqs
  NFSv4: Fix a typo in nfs4_init_channel_attrs()
  NFSv4: Don't busy wait if NFSv4 session draining is interrupted
  NFS recover from destination server reboot for copies
  NFS add a simple sync nfs4_proc_commit after async COPY
  NFS handle COPY ERR_OFFLOAD_NO_REQS
  NFS send OFFLOAD_CANCEL when COPY killed
  NFS export nfs4_async_handle_error
  NFS handle COPY reply CB_OFFLOAD call race
  NFS add support for asynchronous COPY
  NFS COPY xdr handle async reply
  NFS OFFLOAD_CANCEL xdr
  NFS CB_OFFLOAD xdr
  NFS: Use an appropriate work queue for direct-write completion
  NFSv4: Fix error handling in nfs4_sp4_select_mode()
  ...
2018-08-23 16:03:58 -07:00
Linus Torvalds 9157141c95 A mistake on my part caused me to tag my branch 6 commits too early,
missing Chuck's fixes for the problem with callbacks over GSS from
 multi-homed servers, and a smaller fix from Laura Abbott.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJbftA8AAoJECebzXlCjuG+QPMQALieEKkX0YoqRhPz5G+RrWFy
 KgOBFAoiRcjFQD6wMt9FzD6qYEZqSJ+I2b+K5N3BkdyDDQu845iD0wK0zBGhMgLm
 7ith85nphIMbe18+5jPorqAsI9RlfBQjiSGw1MEx5dicLQQzTObHL5q+l5jcWna4
 jWS3yUKv1URpOsR1hIryw74ktSnhuH8n//zmntw8aWrCkq3hnXOZK/agtYxZ7Viv
 V3kiQsiNpL2FPRcHN7ejhLUTnRkkuD2iYKrzP/SpTT/JfdNEUXlMhKkAySogNpus
 nvR9X7hwta8Lgrt7PSB9ibFTXtCupmuICg5mbDWy6nXea2NvpB01QhnTzrlX17Eh
 Yfk/18z95b6Qs1v4m3SI8ESmyc6l5dMZozLudtHzifyCqooWZriEhCR1PlQfQ/FJ
 4cYQ8U/qiMiZIJXL7N2wpSoSaWR5bqU1rXen29Np1WEDkiv4Nf5u2fsCXzv0ZH2C
 ReWpNkbnNxsNiKpp4geBZtlcSEU1pk+1PqE0MagTdBV3iptiUHRSP4jR7qLnc0zT
 J1lCvU7Fodnt9vNSxMpt2Jd6XxQ6xtx7n6aMQAiYFnXDs+hP2hPnJVCScnYW3L6R
 2r1sHRKKeoOzCJ2thw+zu4lOwMm7WPkJPWAYfv90reWkiKoy2vG0S9P7wsNGoJuW
 fuEjB2b9pow1Ffynat6q
 =JnLK
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-4.19-1' of git://linux-nfs.org/~bfields/linux

Pull nfsd updates from Bruce Fields:
 "Chuck Lever fixed a problem with NFSv4.0 callbacks over GSS from
  multi-homed servers.

  The only new feature is a minor bit of protocol (change_attr_type)
  which the client doesn't even use yet.

  Other than that, various bugfixes and cleanup"

* tag 'nfsd-4.19-1' of git://linux-nfs.org/~bfields/linux: (27 commits)
  sunrpc: Add comment defining gssd upcall API keywords
  nfsd: Remove callback_cred
  nfsd: Use correct credential for NFSv4.0 callback with GSS
  sunrpc: Extract target name into svc_cred
  sunrpc: Enable the kernel to specify the hostname part of service principals
  sunrpc: Don't use stack buffer with scatterlist
  rpc: remove unneeded variable 'ret' in rdma_listen_handler
  nfsd: use true and false for boolean values
  nfsd: constify write_op[]
  fs/nfsd: Delete invalid assignment statements in nfsd4_decode_exchange_id
  NFSD: Handle full-length symlinks
  NFSD: Refactor the generic write vector fill helper
  svcrdma: Clean up Read chunk path
  svcrdma: Avoid releasing a page in svc_xprt_release()
  nfsd: Mark expected switch fall-through
  sunrpc: remove redundant variables 'checksumlen','blocksize' and 'data'
  nfsd: fix leaked file lock with nfs exported overlayfs
  nfsd: don't advertise a SCSI layout for an unsupported request_queue
  nfsd: fix corrupted reply to badly ordered compound
  nfsd: clarify check_op_ordering
  ...
2018-08-23 16:00:10 -07:00
Chuck Lever 108b833cde sunrpc: Add comment defining gssd upcall API keywords
During review, it was found that the target, service, and srchost
keywords are easily conflated. Add an explainer.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-08-22 18:32:07 -04:00
Chuck Lever 9abdda5dda sunrpc: Extract target name into svc_cred
NFSv4.0 callback needs to know the GSS target name the client used
when it established its lease. That information is available from
the GSS context created by gssproxy. Make it available in each
svc_cred.

Note this will also give us access to the real target service
principal name (which is typically "nfs", but spec does not require
that).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-08-22 18:32:07 -04:00
Chuck Lever a1a237775e sunrpc: Enable the kernel to specify the hostname part of service principals
A multi-homed NFS server may have more than one "nfs" key in its
keytab. Enable the kernel to pick the key it wants as a machine
credential when establishing a GSS context.

This is useful for GSS-protected NFSv4.0 callbacks, which are
required by RFC 7530 S3.3.3 to use the same principal as the service
principal the client used when establishing its lease.

A complementary modification to rpc.gssd is required to fully enable
this feature.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-08-22 18:32:07 -04:00
Laura Abbott 44090cc876 sunrpc: Don't use stack buffer with scatterlist
Fedora got a bug report from NFS:

kernel BUG at include/linux/scatterlist.h:143!
...
RIP: 0010:sg_init_one+0x7d/0x90
..
  make_checksum+0x4e7/0x760 [rpcsec_gss_krb5]
  gss_get_mic_kerberos+0x26e/0x310 [rpcsec_gss_krb5]
  gss_marshal+0x126/0x1a0 [auth_rpcgss]
  ? __local_bh_enable_ip+0x80/0xe0
  ? call_transmit_status+0x1d0/0x1d0 [sunrpc]
  call_transmit+0x137/0x230 [sunrpc]
  __rpc_execute+0x9b/0x490 [sunrpc]
  rpc_run_task+0x119/0x150 [sunrpc]
  nfs4_run_exchange_id+0x1bd/0x250 [nfsv4]
  _nfs4_proc_exchange_id+0x2d/0x490 [nfsv4]
  nfs41_discover_server_trunking+0x1c/0xa0 [nfsv4]
  nfs4_discover_server_trunking+0x80/0x270 [nfsv4]
  nfs4_init_client+0x16e/0x240 [nfsv4]
  ? nfs_get_client+0x4c9/0x5d0 [nfs]
  ? _raw_spin_unlock+0x24/0x30
  ? nfs_get_client+0x4c9/0x5d0 [nfs]
  nfs4_set_client+0xb2/0x100 [nfsv4]
  nfs4_create_server+0xff/0x290 [nfsv4]
  nfs4_remote_mount+0x28/0x50 [nfsv4]
  mount_fs+0x3b/0x16a
  vfs_kern_mount.part.35+0x54/0x160
  nfs_do_root_mount+0x7f/0xc0 [nfsv4]
  nfs4_try_mount+0x43/0x70 [nfsv4]
  ? get_nfs_version+0x21/0x80 [nfs]
  nfs_fs_mount+0x789/0xbf0 [nfs]
  ? pcpu_alloc+0x6ca/0x7e0
  ? nfs_clone_super+0x70/0x70 [nfs]
  ? nfs_parse_mount_options+0xb40/0xb40 [nfs]
  mount_fs+0x3b/0x16a
  vfs_kern_mount.part.35+0x54/0x160
  do_mount+0x1fd/0xd50
  ksys_mount+0xba/0xd0
  __x64_sys_mount+0x21/0x30
  do_syscall_64+0x60/0x1f0
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

This is BUG_ON(!virt_addr_valid(buf)) triggered by using a stack
allocated buffer with a scatterlist. Convert the buffer for
rc4salt to be dynamically allocated instead.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1615258
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-08-22 18:32:07 -04:00
Jason Gunthorpe 0a3173a5f0 Merge branch 'linus/master' into rdma.git for-next
rdma.git merge resolution for the 4.19 merge window

Conflicts:
 drivers/infiniband/core/rdma_core.c
   - Use the rdma code and revise with the new spelling for
     atomic_fetch_add_unless
 drivers/nvme/host/rdma.c
   - Replace max_sge with max_send_sge in new blk code
 drivers/nvme/target/rdma.c
   - Use the blk code and revise to use NULL for ib_post_recv when
     appropriate
   - Replace max_sge with max_recv_sge in new blk code
 net/rds/ib_send.c
   - Use the net code and revise to use NULL for ib_post_recv when
     appropriate

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-16 14:21:29 -06:00