Currently we fail to return an error if the NFSv3 module failed to load
when we're trying to connect to a pNFS data server.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
After we grab the lock in nfs4_pnfs_ds_connect(), there is no check for
whether or not ds->ds_clp has already been initialised, so we can end up
adding the same transports multiple times.
Fixes: fc821d5920 ("pnfs/NFSv4.1: Add multipath capabilities to pNFS flexfiles servers over NFSv3")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cache the layout in the arguments so we don't have to keep looking it up
from the inode.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If the layout gets invalidated, we should wait for any outstanding
layoutget requests for that layout to complete, and we should resend
them only after re-establishing the layout stateid.
Fixes: d29b468da4 ("pNFS/NFSv4: Improve rejection of out-of-order layouts")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If we have multiple outstanding layoutget requests, the current code to
update the layout barrier assumes that the outstanding layout stateids
are updated in order. That's not necessarily the case.
Instead of using the value of lo->plh_outstanding as a guesstimate for
the window of values we need to accept, just wait to update the window
until we're processing the last one. The intention here is just to
ensure that we don't process 2^31 seqid updates without also updating
the barrier.
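The 2^31 limit above follows from comparing 32-bit sequence ids with wraparound (serial number) arithmetic. What follows is a minimal, self-contained sketch of such a comparison, not the kernel code itself; the helper name seqid_is_newer() is made up for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustration only): returns true if seqid s1 is
 * "newer" than s2 when both are 32-bit counters that may wrap around.
 * The unsigned subtraction wraps modulo 2^32; reinterpreting the result
 * as a signed value (two's complement assumed) gives the direction.
 * The answer is only meaningful while the two values are less than
 * 2^31 apart, which is why the barrier must not be left behind.
 */
static bool seqid_is_newer(uint32_t s1, uint32_t s2)
{
	return (int32_t)(s1 - s2) > 0;
}
```

With this comparison, a seqid of 0 still counts as newer than 0xffffffff, but two values exactly 2^31 apart can no longer be ordered reliably.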
Fixes: 1bcf34fdac ("pNFS/NFSv4: Update the layout barrier when we schedule a layoutreturn")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Earlier commits refactored some NFS read code and removed
nfs_readpage_async(), but neglected to properly fix up
nfs_readpage_from_fscache_complete(). The code path is
only hit when something unusual occurs with the cachefiles
backing filesystem, such as an IO error or while a cookie
is being invalidated.
Mark the page with PG_checked if fscache IO completes with an error,
unlock the page, and let the VM decide whether to re-issue the read
based on PG_uptodate. When the VM reissues the readpage, PG_checked
allows us to skip over fscache and read from the server.
Link: https://marc.info/?l=linux-nfs&m=162498209518739
Fixes: 1e83b173b2 ("NFS: Add nfs_pageio_complete_read() and remove nfs_readpage_async()")
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
After a previous refactoring, nfs_readpage() might end up calling
wait_on_page_locked_killable() even if readpage_async_filler() failed
with an internal error and pg_error was non-zero (for example, if
nfs_create_request() failed). In the case of an internal error,
skip over wait_on_page_locked_killable(), as it is only needed
when the read is sent and an error occurs during completion handling.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Once a transport has been put offline, it can also be removed from
the list of transports. Any tasks that were stuck on this transport
will find the next available active transport and be retried there.
The transport is then removed from the xprt_switch list and freed.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
In preparation for being able to change the xprt's state, add a way
to show the current state of the transport.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Once a task grabs a transport, this is reflected in the queuelen of
the rpc_xprt structure. Add display of that value in the xprt's
info file in sysfs.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Allow querying xprt_switch attributes. Currently this shows the following
fields of the rpc_xprt_switch structure: xps_nxprts, xps_nactive,
xps_queuelen.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
In preparation for when we can retry a task on a different transport,
identify and mark such RPC tasks as moveable. Only 4.1+ operations can
be retried on a different transport.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Allow querying a transport's attributes. Currently this shows the following
fields of the rpc_xprt structure: state, last_used, cong, cwnd,
max_reqs, min_reqs, num_reqs, and the sizes of the binding, sending,
pending and backlog queues.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Using the sysfs xprt_state attribute, mark a particular transport offline.
An offline transport will not be picked during round-robin selection. The
main transport (the first transport created for the rpc_client) cannot be
taken offline. A transport can be brought back online by writing "online"
to the same attribute, which makes it eligible for round-robin selection
again.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Allow querying and setting the destination address of a transport.
Setting the destination address is allowed only for TCP or RDMA
based connections.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
The source port of a TCP connection is useful for matching connections
seen in network traces to the xprts used by the Linux NFS client.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Provide the ability to query a transport's source port.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Display in the xprt's sysfs information whether or not this is a
main transport.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
When an RPC client is created, its first transport is special
and should be marked as the main transport.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Add individual transport directories under each transport switch
group. For instance, with nconnect=X there will be one transport
directory for each of the X connections. The naming convention also
identifies the transport type: xprt-<id>-<type>, where type is udp,
tcp, rdma, local or bc.
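To make the naming convention concrete, here is a small user-space sketch that builds such a directory name. It is an illustration only, not the kernel implementation; the function name xprt_dir_name() and its error convention (0 on success, -1 on failure) are assumptions made for this example.

```c
#include <stdio.h>
#include <string.h>

/* Transport types listed in the commit message. */
static const char *const xprt_types[] = { "udp", "tcp", "rdma", "local", "bc" };

/*
 * Hypothetical helper (illustration only): format "xprt-<id>-<type>"
 * into buf. Returns 0 on success, or -1 if the type is not one of the
 * known types or the name does not fit in the buffer.
 */
static int xprt_dir_name(char *buf, size_t len, unsigned int id, const char *type)
{
	size_t i;
	int known = 0;
	int n;

	for (i = 0; i < sizeof(xprt_types) / sizeof(xprt_types[0]); i++) {
		if (strcmp(type, xprt_types[i]) == 0) {
			known = 1;
			break;
		}
	}
	if (!known)
		return -1;

	n = snprintf(buf, len, "xprt-%u-%s", id, type);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

For example, calling xprt_dir_name() with id 0 and type "tcp" fills the buffer with "xprt-0-tcp".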
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
An rpc client uses a transport switch and one or more transports
associated with that switch. Since transports are shared among
rpc clients, create a symlink into the xprt_switch directory
instead of duplicating entries under each rpc client.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Add an xprt_switch directory to sysfs and create individual
xprt_switch subdirectories for each multipath transport group.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
We need to keep track of the type for a given transport.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
This is used to uniquely identify sunrpc multipath objects in /sys.
Signed-off-by: Dan Aloni <dan@kernelim.com>
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
This adds a unique identifier for a sunrpc transport in sysfs, which is
managed similarly to the unique IDs of clients.
Signed-off-by: Dan Aloni <dan@kernelim.com>
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
These will eventually have files placed under them for sysfs operations.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
This is where we'll put per-rpc_client related files.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Instead of returning ENOLCK when we can't hand out a lease, we should be
returning EAGAIN.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If a file has already been closed, then it should not be selected to
support further I/O.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
[Trond: Fix an invalid pointer deref reported by Colin Ian King]
When looking for a task on the wait queue to wake up, a non-privileged
task may be picked rather than a privileged one. This can lead to a deadlock
similar to the one in commit dfe1fe75e0 ("NFSv4: Fix deadlock between nfs4_evict_inode()
and nfs4_opendata_get_inode()"):
The privileged delegreturn task is queued on the privileged list because all
the slots are assigned. If there are not enough slots to wake up the
non-privileged batch tasks (the session has fewer than 8 slots), then the
privileged delegreturn task may never be woken up, because the task that was
picked cannot get a slot while the session is draining.
So we should treat the privileged task as an emergency task, and
execute it as far as we can.
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 5fcdfacc01 ("NFSv4: Return delegations synchronously in evict_inode")
Cc: stable@vger.kernel.org
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
The 'queue->nr' counter will wrap around from 0 to 255 when only the
current priority queue has tasks. This can lead to a deadlock similar to
the one in commit dfe1fe75e0 ("NFSv4: Fix deadlock between nfs4_evict_inode()
and nfs4_opendata_get_inode()"):
The privileged delegreturn task is queued on the privileged list because all
the slots are assigned. When a non-privileged task completes and releases
its slot, a non-privileged task may be picked. It may then fail to allocate
a slot because the session is draining.
If 'queue->nr' has wrapped around to 255, and there are not enough slots to
service it, then the privileged delegreturn task will never be woken up.
So we should avoid the wraparound of 'queue->nr'.
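The wraparound from 0 to 255 is what happens when an 8-bit unsigned counter is decremented at zero. The following user-space sketch shows the effect and one way to guard against it; the struct and function names are hypothetical stand-ins, not the sunrpc code, and the sketch assumes an 8-bit unsigned char.

```c
/* Hypothetical stand-in for a queue with an 8-bit counter. */
struct demo_queue {
	unsigned char nr;
};

/* Decrements unconditionally: at 0 the counter wraps around to 255. */
static unsigned char demo_dec_unchecked(struct demo_queue *q)
{
	return --q->nr;
}

/* Checks the counter first, so it stays at 0 instead of wrapping. */
static unsigned char demo_dec_checked(struct demo_queue *q)
{
	if (q->nr)
		q->nr--;
	return q->nr;
}
```

Testing the counter before decrementing it, rather than testing the result of the decrement, is the general shape of the guard sketched here.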
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 5fcdfacc01 ("NFSv4: Return delegations synchronously in evict_inode")
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Simplify nfs_pageio_complete_read() by using the inode pointer saved
inside nfs_pageio_descriptor.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
After calling security_sb_clone_mnt_opts() in nfs_get_root(), it's
necessary to copy the value of has_sec_mnt_opts from the cloned
super_block's nfs_server. Otherwise, calls to nfs_compare_super()
using this super_block may not return the correct result, leading to
mount failures.
For example, consider an NFS server with the following in /etc/exports:
/export *(rw,insecure,crossmnt,no_root_squash,security_label)
and with /export/scratch on a separate block device, mounted as follows:
mount -o v4.2,context=system_u:object_r:root_t:s0 server:/export/test /mnt/test
mount -o v4.2,context=system_u:object_r:swapfile_t:s0 server:/export/scratch /mnt/scratch
The second mount would fail with "mount.nfs: /mnt/scratch is busy or
already mounted or sharecache fail" and "SELinux: mount invalid. Same
superblock, different security settings for..." would appear in the
syslog.
Also, while we're in there, replace several instances of "NFS_SB(s)"
with "server", which was already declared at the top of
nfs_get_root().
Fixes: ec1ade6a04 ("nfs: account for selinux security context when deciding to share superblock")
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
We know that the attributes changed on the server if and only if the
change attribute is different. Otherwise, we're just refreshing our
cache with values that were already known to be stale.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If the change attribute update is declared to be non-atomic by the
server, or our cached value does not match the server's value before the
operation was performed, then we should declare the inode cache invalid.
On the other hand, if the change to the directory raced with a lookup or
getattr which already updated the change attribute, then optimise away
the revalidation.
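The decision described above can be modelled as a small classification function. This is a simplified sketch under stated assumptions, not the NFS client code: the type and function names are invented for the example, and checking for the "already updated" case first is an assumption about ordering.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the change information returned by the server. */
struct demo_change_info {
	bool atomic;      /* server declared the update atomic */
	uint64_t before;  /* change attribute before the operation */
	uint64_t after;   /* change attribute after the operation */
};

enum demo_action {
	DEMO_NOTHING_TO_DO,   /* cache already reflects the change */
	DEMO_UPDATE_ONLY,     /* record the new change attribute */
	DEMO_INVALIDATE_ALL   /* cached inode data cannot be trusted */
};

/*
 * Classify an update given our cached change attribute.
 * Assumption for this sketch: the "raced with lookup/getattr" case is
 * checked first, so an already up-to-date cache is never invalidated.
 */
static enum demo_action demo_classify(uint64_t cached,
				      const struct demo_change_info *ci)
{
	if (cached == ci->after)
		return DEMO_NOTHING_TO_DO;
	if (!ci->atomic || cached != ci->before)
		return DEMO_INVALIDATE_ALL;
	return DEMO_UPDATE_ONLY;
}
```

In this model, a cached value equal to the "after" value means another operation already refreshed the cache, so no revalidation is needed.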
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
The inode is considered revalidated when we've checked the value of the
change attribute against our cached value since that suffices to
establish whether or not the other cached values are valid.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
While looking into another NFS xfstests report, I found that acl and
default_acl may be leaked in the error paths of nfs3_proc_create()
and nfs3_proc_mknod(). Fix them in advance.
Fixes: 013cdf1088 ("nfs: use generic posix ACL infrastructure for v3 Posix ACLs")
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If an RPC client is created without RPC_CLNT_CREATE_REUSEPORT, it should
not reuse the source port when a TCP connection is re-established.
This is currently implemented by preventing the source port being
recorded after a successful connection (the call to xs_set_srcport()).
However the source port is also recorded after a successful bind in xs_bind().
This may not be needed at all and certainly is not wanted when
RPC_CLNT_CREATE_REUSEPORT wasn't requested.
So avoid that assignment when xprt.reuseport is not set.
With this change, NFSv4.1 and later mounts use a different port number on
each connection. This is helpful with some firewalls which don't cope
well with port reuse.
Signed-off-by: NeilBrown <neilb@suse.de>
Fixes: e6237b6feb ("NFSv4.1: Don't rebind to the same source port when reconnecting to the server")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
The variable status is being initialized with a value that is never
read; the assignment is redundant and can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
This seems to happen fairly easily during READ_PLUS testing on NFS v4.2.
I found that we could end up accessing xdr->buf->pages[pgnr] with a pgnr
greater than the number of pages in the array. So let's just return
early if we're setting base to a point at the end of the page data and
let xdr_set_tail_base() handle setting up the buffer pointers instead.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Fixes: 8d86e373b0 ("SUNRPC: Clean up helpers xdr_set_iov() and xdr_set_page_base()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Fix an Oopsable condition in pnfs_mark_request_commit() when we're
putting a set of writes on the commit list to reschedule them after a
failed pNFS attempt.
Fixes: 9c455a8c1e ("NFS/pNFS: Clean up pNFS commit operations")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Set up the connection to the NFSv4 server in nfs4_alloc_client(), before
we've added the struct nfs_client to the net-namespace's nfs_client_list
so that a downed server won't cause other mounts to hang in the trunking
detection code.
Reported-by: Michael Wakabayashi <mwakabayashi@vmware.com>
Fixes: 5c6e5b60aa ("NFS: Fix an Oops in the pNFS files and flexfiles connection setup to the DS")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If the NFSv4 client already holds a delegation for a file, then we can
support application leases (i.e. fcntl(fd, F_SETLEASE,...)) because the
underlying delegation guarantees that the file is not being modified on
the server by another client in a way that might conflict with the lease
guarantees.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
When we add support for application level leases and knfsd delegations
to the NFS client, we want to have them safely underpinned by a
"real" delegation to provide the caching guarantees. If that real
delegation is recalled, then we need to ensure that the application
leases/delegations are recalled too.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If we're unable to immediately recover all locks because the server is
unable to immediately service our reclaim calls, then we want to retry
after we've finished servicing all the other asynchronous delegation
returns on our queue.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>