Commit Graph

243 Commits

Author SHA1 Message Date
J. Bruce Fields 8c7424cff6 nfsd4: don't try to encode conflicting owner if low on space
I ran into this corner case in testing: in theory clients can provide
state owners up to 1024 bytes long.  In the sessions case there might be
a risk of this pushing us over the DRC slot size.

The conflicting owner isn't really that important, so let's humor a
client that provides a small maxresponsize_cached by allowing ourselves
to return without the conflicting owner instead of outright failing the
operation.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:31:55 -04:00
J. Bruce Fields 2825a7f907 nfsd4: allow encoding across page boundaries
After this we can handle for example getattr of very large ACLs.

Read, readdir, readlink are still special cases with their own limits.

Also we can't handle a new operation starting close to the end of a
page.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:31:54 -04:00
J. Bruce Fields a8095f7e80 nfsd4: size-checking cleanup
Better variable name, some comments, etc.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:31:53 -04:00
J. Bruce Fields ea8d7720b2 nfsd4: remove redundant encode buffer size checking
Now that all op encoders can handle running out of space, we no longer
need to check the remaining size for every operation; only nonidempotent
operations need that check, and that can be done by
nfsd4_check_resp_size.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:31:52 -04:00
J. Bruce Fields d0a381dd0e nfsd4: teach encoders to handle reserve_space failures
We've tried to prevent running out of space with COMPOUND_SLACK_SPACE
and special checking in those operations (getattr) whose result can vary
enormously.

However:
	- COMPOUND_SLACK_SPACE may be difficult to maintain as we add
	  more protocol.
	- BUG_ON or page faulting on failure seems overly fragile.
	- Especially in the 4.1 case, we prefer not to fail compounds
	  just because the returned result came *close* to session
	  limits.  (Though perfect enforcement here may be difficult.)
	- I'd prefer encoding to be uniform for all encoders instead of
	  having special exceptions for encoders containing, for
	  example, attributes.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:31:49 -04:00
J. Bruce Fields 6ac90391c6 nfsd4: keep xdr buf length updated
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-28 14:52:38 -04:00
J. Bruce Fields d3f627c815 nfsd4: use xdr_stream throughout compound encoding
Note this makes ADJUST_ARGS useless; we'll remove it in the following
patch.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-28 14:52:35 -04:00
J. Bruce Fields ddd1ea5636 nfsd4: use xdr_reserve_space in attribute encoding
This is a cosmetic change for now; no change in behavior.

Note we're just depending on xdr_reserve_space to do the bounds checking
for us, we're not really depending on its adjustment of iovec or xdr_buf
lengths yet, as those are fixed up by as necessary after the fact by
read-link operations and by nfs4svc_encode_compoundres.  However we do
have to update xdr->iov on read-like operations to prevent
xdr_reserve_space from messing with the already-fixed-up length of the
the head.

When the attribute encoding fails partway through we have to undo the
length adjustments made so far.  We do it manually for now, but later
patches will add an xdr_truncate_encode() helper to handle cases like
this.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-28 14:52:34 -04:00
J. Bruce Fields 07d1f80207 nfsd4: fix encoding of out-of-space replies
If nfsd4_check_resp_size() returns an error then we should really be
truncating the reply here, otherwise we may leave extra garbage at the
end of the rpc reply.

Also add a warning to catch any cases where our reply-size estimates may
be wrong in the case of a non-idempotent operation.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-27 11:09:08 -04:00
J. Bruce Fields 1802a67894 nfsd4: reserve head space for krb5 integ/priv info
Currently if the nfs-level part of a reply would be too large, we'll
return an error to the client.  But if the nfs-level part fits and
leaves no room for krb5p or krb5i stuff, then we just drop the request
entirely.

That's no good.  Instead, reserve some slack space at the end of the
buffer and make sure we fail outright if we'd come close.

The slack space here is a massive overstimate of what's required, we
should probably try for a tighter limit at some point.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-23 09:03:47 -04:00
J. Bruce Fields 2d124dfaad nfsd4: move proc_compound xdr encode init to helper
Mechanical transformation with no change of behavior.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-23 09:03:46 -04:00
J. Bruce Fields d518465866 nfsd4: tweak nfsd4_encode_getattr to take xdr_stream
Just change the nfsd4_encode_getattr api.  Not changing any code or
adding any new functionality yet.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-23 09:03:46 -04:00
J. Bruce Fields 4aea24b2ff nfsd4: embed xdr_stream in nfsd4_compoundres
This is a mechanical transformation with no change in behavior.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-23 09:03:45 -04:00
J. Bruce Fields e372ba60de nfsd4: decoding errors can still be cached and require space
Currently a non-idempotent op reply may be cached if it fails in the
proc code but not if it fails at xdr decoding.  I doubt there are any
xdr-decoding-time errors that would make this a problem in practice, so
this probably isn't a serious bug.

The space estimates should also take into account space required for
encoding of error returns.  Again, not a practical problem, though it
would become one after future patches which will tighten the space
estimates.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-23 09:03:44 -04:00
J. Bruce Fields f34e432b67 nfsd4: fix write reply size estimate
The write reply also includes count and stable_how.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-23 09:03:43 -04:00
J. Bruce Fields 622f560e6a nfsd4: read size estimate should include padding
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-23 09:03:42 -04:00
J. Bruce Fields 5b648699af nfsd4: READ, READDIR, etc., are idempotent
OP_MODIFIES_SOMETHING flags operations that we should be careful not to
initiate without being sure we have the buffer space to encode a reply.

None of these ops fall into that category.

We could probably remove a few more, but this isn't a very important
problem at least for ops whose reply size is easy to estimate.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-23 09:03:41 -04:00
Trond Myklebust 14bcab1a39 NFSd: Clean up nfs4_preprocess_stateid_op
Move the state locking and file descriptor reference out from the
callers and into nfs4_preprocess_stateid_op() itself.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-07 11:05:48 -04:00
Kinglong Mee 2336745e87 NFSD: Clear wcc data between compound ops
Testing NFS4.0 by pynfs, I got some messeages as,
"nfsd: inode locked twice during operation."

When one compound RPC contains two or more ops that locks
the filehandle,the second op will cause the message.

As two SETATTR ops, after the first SETATTR, nfsd will not call
fh_put() to release current filehandle, it means filehandle have
unlocked with fh_post_saved = 1.
The second SETATTR find fh_post_saved = 1, and printk the message.

v2: introduce helper fh_clear_wcc().

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-30 10:47:34 -04:00
J. Bruce Fields 480efaee08 nfsd4: fix setclientid encode size
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-28 21:24:55 -04:00
Kinglong Mee 4daeed25ad NFSD: simplify saved/current fh uses in nfsd4_proc_compound
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-28 18:01:40 -04:00
J. Bruce Fields 4c69d5855a nfsd4: session needs room for following op to error out
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-27 16:30:59 -04:00
Linus Torvalds d9894c228b Merge branch 'for-3.14' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
 - Handle some loose ends from the vfs read delegation support.
   (For example nfsd can stop breaking leases on its own in a
    fewer places where it can now depend on the vfs to.)
 - Make life a little easier for NFSv4-only configurations
   (thanks to Kinglong Mee).
 - Fix some gss-proxy problems (thanks Jeff Layton).
 - miscellaneous bug fixes and cleanup

* 'for-3.14' of git://linux-nfs.org/~bfields/linux: (38 commits)
  nfsd: consider CLAIM_FH when handing out delegation
  nfsd4: fix delegation-unlink/rename race
  nfsd4: delay setting current_fh in open
  nfsd4: minor nfs4_setlease cleanup
  gss_krb5: use lcm from kernel lib
  nfsd4: decrease nfsd4_encode_fattr stack usage
  nfsd: fix encode_entryplus_baggage stack usage
  nfsd4: simplify xdr encoding of nfsv4 names
  nfsd4: encode_rdattr_error cleanup
  nfsd4: nfsd4_encode_fattr cleanup
  minor svcauth_gss.c cleanup
  nfsd4: better VERIFY comment
  nfsd4: break only delegations when appropriate
  NFSD: Fix a memory leak in nfsd4_create_session
  sunrpc: get rid of use_gssp_lock
  sunrpc: fix potential race between setting use_gss_proxy and the upcall rpc_clnt
  sunrpc: don't wait for write before allowing reads from use-gss-proxy file
  nfsd: get rid of unused function definition
  Define op_iattr for nfsd4_open instead using macro
  NFSD: fix compile warning without CONFIG_NFSD_V3
  ...
2014-01-30 10:18:43 -08:00
J. Bruce Fields 4335723e8e nfsd4: fix delegation-unlink/rename race
If a file is unlinked or renamed between the time when we do the local
open and the time when we get the delegation, then we will return to the
client indicating that it holds a delegation even though the file no
longer exists under the name it was open under.

But a client performing an open-by-name, when it is returned a
delegation, must be able to assume that the file is still linked at the
name it was opened under.

So, hold the parent i_mutex for longer to prevent concurrent renames or
unlinks.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-01-27 13:59:16 -05:00
J. Bruce Fields c0e6bee480 nfsd4: delay setting current_fh in open
This is basically a no-op, to simplify a following patch.

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-01-27 13:59:16 -05:00
Christoph Hellwig 4ac7249ea5 nfsd: use get_acl and ->set_acl
Remove the boilerplate code to marshall and unmarhall ACL objects into
xattrs and operate on the posix_acl objects directly.  Also move all
the ACL handling code into nfs?acl.c where it belongs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-01-26 08:26:41 -05:00
J. Bruce Fields 41ae6e714a nfsd4: better VERIFY comment
This confuses me every time.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-01-07 16:01:15 -05:00
Kinglong Mee 7e55b59b2f SUNRPC/NFSD: Support a new option for ignoring the result of svc_register
NFSv4 clients can contact port 2049 directly instead of needing the
portmapper.

Therefore a failure to register to the portmapper when starting an
NFSv4-only server isn't really a problem.

But Gareth Williams reports that an attempt to start an NFSv4-only
server without starting portmap fails:

  #rpc.nfsd -N 2 -N 3
  rpc.nfsd: writing fd to kernel failed: errno 111 (Connection refused)
  rpc.nfsd: unable to set any sockets for nfsd

Add a flag to svc_version to tell the rpc layer it can safely ignore an
rpcbind failure in the NFSv4-only case.

Reported-by: Gareth Williams <gareth@garethwilliams.me.uk>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-01-03 18:18:49 -05:00
Kinglong Mee a8bb84bc9e nfsd: calculate the missing length of bitmap in EXCHANGE_ID
commit 58cd57bfd9
"nfsd: Fix SP4_MACH_CRED negotiation in EXCHANGE_ID"
miss calculating the length of bitmap for spo_must_enforce and spo_must_allow.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-01-02 17:44:52 -05:00
Weston Andros Adamson 58cd57bfd9 nfsd: Fix SP4_MACH_CRED negotiation in EXCHANGE_ID
- don't BUG_ON() when not SP4_NONE
 - calculate recv and send reserve sizes correctly

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-08-07 12:06:07 -04:00
J. Bruce Fields 35f7a14fc1 nfsd4: fix minorversion support interface
You can turn on or off support for minorversions using e.g.

	echo "-4.2" >/proc/fs/nfsd/versions

However, the current implementation is a little wonky.  For example, the
above will turn off 4.2 support, but it will also turn *on* 4.1 support.

This didn't matter as long as we only had 2 minorversions, which was
true till very recently.

And do a little cleanup here.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-07-12 16:48:52 -04:00
J. Bruce Fields 89f6c3362c nfsd4: delegation-based open reclaims should bypass permissions
We saw a v4.0 client's create fail as follows:

	- open create succeeds and gets a read delegation
	- client attempts to set mode on new file, gets DELAY while
	  server recalls delegation.
	- client attempts a CLAIM_DELEGATE_CUR open using the
	  delegation, gets error because of new file mode.

This probably can't happen on a recent kernel since we're no longer
giving out delegations on create opens.  Nevertheless, it's a
bug--reclaim opens should bypass permission checks.

Reported-by: Steve Dickson <steved@redhat.com>
Reported-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-07-01 17:32:05 -04:00
David Quigley 18032ca062 NFSD: Server implementation of MAC Labeling
Implement labeled NFS on the server: encoding and decoding, and writing
and reading, of file labels.

Enabled with CONFIG_NFSD_V4_SECURITY_LABEL.

Signed-off-by: Matthew N. Dodd <Matthew.Dodd@sparta.com>
Signed-off-by: Miguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg>
Signed-off-by: Phua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg>
Signed-off-by: Khin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-05-15 09:27:02 -04:00
J. Bruce Fields 9f415eb255 nfsd4: don't allow owner override on 4.1 CLAIM_FH opens
The Linux client is using CLAIM_FH to implement regular opens, not just
recovery cases, so it depends on the server to check permissions
correctly.

Therefore the owner override, which may make sense in the delegation
recovery case, isn't right in the CLAIM_FH case.

Symptoms: on a client with 49f9a0fafd
"NFSv4.1: Enable open-by-filehandle", Bryan noticed this:

	touch test.txt
	chmod 000 test.txt
	echo test > test.txt

succeeding.

Cc: stable@kernel.org
Reported-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-05-03 16:36:14 -04:00
J. Bruce Fields 2a6cf944c2 nfsd4: don't remap EISDIR errors in rename
We're going out of our way here to remap an error to make rfc 3530
happy--but the rfc itself (nor rfc 1813, which has similar language)
gives no justification.  And disagrees with local filesystem behavior,
with Linux and posix man pages, and knfsd's implemented behavior for v2
and v3.

And the documented behavior seems better, in that it gives a little more
information--you could implement the 3530 behavior using the posix
behavior, but not the other way around.

Also, the Linux client makes no attempt to remap this error in the v4
case, so it can end up just returning EEXIST to the application in a
case where it should return EISDIR.

So honestly I think the rfc's are just buggy here--or in any case it
doesn't see worth the trouble to remap this error.

Reported-by: Frank S Filz <ffilz@us.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-30 15:44:20 -04:00
J. Bruce Fields bbc9c36c31 nfsd4: more sessions/open-owner-replay cleanup
More logic that's unnecessary in the 4.1 case.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-09 09:08:56 -04:00
J. Bruce Fields 3d74e6a5b6 nfsd4: no need for replay_owner in sessions case
The replay_owner will never be used in the sessions case.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-09 09:08:55 -04:00
J. Bruce Fields 9411b1d4c7 nfsd4: cleanup handling of nfsv4.0 closed stateid's
Closed stateid's are kept around a little while to handle close replays
in the 4.0 case.  So we stash them in the last-used stateid in the
oo_last_closed_stateid field of the open owner.  We can free that in
encode_seqid_op_tail once the seqid on the open owner is next
incremented.  But we don't want to do that on the close itself; so we
set NFS4_OO_PURGE_CLOSE flag set on the open owner, skip freeing it the
first time through encode_seqid_op_tail, then when we see that flag set
next time we free it.

This is unnecessarily baroque.

Instead, just move the logic that increments the seqid out of the xdr
code and into the operation code itself.

The justification given for the current placement is that we need to
wait till the last minute to be sure we know whether the status is a
sequence-id-mutating error or not, but examination of the code shows
that can't actually happen.

Reported-by: Yanchuan Nian <ycnian@gmail.com>
Tested-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-08 09:55:32 -04:00
fanchaoting b022032e19 nfsd: don't run get_file if nfs4_preprocess_stateid_op return error
we should return error status directly when nfs4_preprocess_stateid_op
return error.

Signed-off-by: fanchaoting <fanchaoting@cn.fujitsu.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-03 15:19:06 -04:00
J. Bruce Fields 9d313b17db nfsd4: handle seqid-mutating open errors from xdr decoding
If a client sets an owner (or group_owner or acl) attribute on open for
create, and the mapping of that owner to an id fails, then we return
BAD_OWNER.  But BAD_OWNER is a seqid-mutating error, so we can't
shortcut the open processing that case: we have to at least look up the
owner so we can find the seqid to bump.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-03 11:47:53 -04:00
J. Bruce Fields b600de7ab9 nfsd4: remove BUG_ON
This BUG_ON just crashes the thread a little earlier than it would
otherwise--it doesn't seem useful.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-03 11:47:45 -04:00
J. Bruce Fields 84822d0b3b nfsd4: simplify nfsd4_encode_fattr interface slightly
It seems slightly simpler to make nfsd4_encode_fattr rather than its
callers responsible for advancing the write pointer on success.

(Also: the count == 0 check in the verify case looks superfluous.
Running out of buffer space is really the only reason fattr encoding
should fail with eresource.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-01-23 18:17:35 -05:00
J. Bruce Fields a1dc695582 nfsd4: free_stateid can use the current stateid
Cc: Tigran Mkrtchyan <kofemann@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-17 22:00:27 -05:00
J. Bruce Fields 9b3234b922 nfsd4: disable zero-copy on non-final read ops
To ensure ordering of read data with any following operations, turn off
zero copy if the read is not the final operation in the compound.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-17 16:02:41 -05:00
Stanislav Kinsbursky b9c0ef8571 nfsd: make NFSd service boot time per-net
This is simple: an NFSd service can be started at different times in
different network environments. So, its "boot time" has to be assigned
per net.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 16:25:38 -05:00
Neil Brown 7007c90fb9 nfsd: avoid permission checks on EXCLUSIVE_CREATE replay
With NFSv4, if we create a file then open it we explicit avoid checking
the permissions on the file during the open because the fact that we
created it ensures we should be allow to open it (the create and the
open should appear to be a single operation).

However if the reply to an EXCLUSIVE create gets lots and the client
resends the create, the current code will perform the permission check -
because it doesn't realise that it did the open already..

This patch should fix this.

Note that I haven't actually seen this cause a problem.  I was just
looking at the code trying to figure out a different EXCLUSIVE open
related issue, and this looked wrong.

(Fix confirmed with pynfs 4.0 test OPEN4--bfields)

Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
[bfields: use OWNER_OVERRIDE and update for 4.1]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 16:25:31 -05:00
J. Bruce Fields ffe1137ba7 nfsd4: delay filling in write iovec array till after xdr decoding
Our server rejects compounds containing more than one write operation.
It's unclear whether this is really permitted by the spec; with 4.0,
it's possibly OK, with 4.1 (which has clearer limits on compound
parameters), it's probably not OK.  No client that we're aware of has
ever done this, but in theory it could be useful.

The source of the limitation: we need an array of iovecs to pass to the
write operation.  In the worst case that array of iovecs could have
hundreds of elements (the maximum rwsize divided by the page size), so
it's too big to put on the stack, or in each compound op.  So we instead
keep a single such array in the compound argument.

We fill in that array at the time we decode the xdr operation.

But we decode every op in the compound before executing any of them.  So
once we've used that array we can't decode another write.

If we instead delay filling in that array till the time we actually
perform the write, we can reuse it.

Another option might be to switch to decoding compound ops one at a
time.  I considered doing that, but it has a number of other side
effects, and I'd rather fix just this one problem for now.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-26 09:08:15 -05:00
Stanislav Kinsbursky 3320fef19b nfsd: use service net instead of hard-coded init_net
This patch replaces init_net by SVC_NET(), where possible and also passes
proper context to nested functions where required.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-15 07:40:50 -05:00
J. Bruce Fields cb73a9f464 nfsd4: implement backchannel_ctl operation
This operation is mandatory for servers to implement.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-07 19:39:58 -05:00
J. Bruce Fields d15c077e44 nfsd4: enforce per-client sessions/no-sessions distinction
Something like creating a client with setclientid and then trying to
confirm it with create_session may not crash the server, but I'm not
completely positive of that, and in any case it's obviously bad client
behavior.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-10-01 17:39:58 -04:00
Bryan Schumaker 24ff99c6fe NFSD: Swap the struct nfs4_operation getter and setter
stateid_setter should be matched to op_set_currentstateid, rather than
op_get_currentstateid.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-20 18:53:25 -04:00
Stanislav Kinsbursky 5ccb0066f2 LockD: pass actual network namespace to grace period management functions
Passed network namespace replaced hard-coded init_net

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-27 16:49:22 -04:00
Linus Torvalds c6f5c93098 Merge branch 'for-3.4' of git://linux-nfs.org/~bfields/linux
Pull nfsd bugfixes from J. Bruce Fields:
 "One bugfix, and one minor header fix from Jeff Layton while we're
  here"

* 'for-3.4' of git://linux-nfs.org/~bfields/linux:
  nfsd: include cld.h in the headers_install target
  nfsd: don't fail unchecked creates of non-special files
2012-04-19 14:54:52 -07:00
Al Viro 96f6f98501 nfsd: fix b0rken error value for setattr on read-only mount
..._want_write() returns -EROFS on failure, _not_ an NFS error value.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-04-13 10:12:00 -04:00
J. Bruce Fields 9dc4e6c4d1 nfsd: don't fail unchecked creates of non-special files
Allow a v3 unchecked open of a non-regular file succeed as if it were a
lookup; typically a client in such a case will want to fall back on a
local open, so succeeding and giving it the filehandle is more useful
than failing with nfserr_exist, which makes it appear that nothing at
all exists by that name.

Similarly for v4, on an open-create, return the same errors we would on
an attempt to open a non-regular file, instead of returning
nfserr_exist.

This fixes a problem found doing a v4 open of a symlink with
O_RDONLY|O_CREAT, which resulted in the current client returning EEXIST.

Thanks also to Trond for analysis.

Cc: stable@kernel.org
Reported-by: Orion Poplawski <orion@cora.nwra.com>
Tested-by: Orion Poplawski <orion@cora.nwra.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-04-11 17:49:52 -04:00
Jeff Layton a52d726bbd nfsd: convert nfs4_client->cl_cb_flags to a generic flags field
We'll need a way to flag the nfs4_client as already being recorded on
stable storage so that we don't continually upcall. Currently, that's
recorded in the cl_firststate field of the client struct. Using an
entire u32 to store a flag is rather wasteful though.

The cl_cb_flags field is only using 2 bits right now, so repurpose that
to a generic flags field. Rename NFSD4_CLIENT_KILL to
NFSD4_CLIENT_CB_KILL to make it evident that it's part of the callback
flags. Add a mask that we can use for existing checks that look to see
whether any flags are set, so that the new flags don't interfere.

Convert all references to cl_firstate to the NFSD4_CLIENT_STABLE flag,
and add a new NFSD4_CLIENT_RECLAIM_COMPLETE flag.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-03-26 11:49:47 -04:00
Chuck Lever ab4684d156 NFSD: Fix nfs4_verifier memory alignment
Clean up due to code review.

The nfs4_verifier's data field is not guaranteed to be u32-aligned.
Casting an array of chars to a u32 * is considered generally
hazardous.

We can fix most of this by using a __be32 array to generate the
verifier's contents and then byte-copying it into the verifier field.

However, there is one spot where there is a backwards compatibility
constraint: the do_nfsd_create() call expects a verifier which is
32-bit aligned.  Fix this spot by forcing the alignment of the create
verifier in the nfsd4_open args structure.

Also, sizeof(nfs4_verifer) is the size of the in-core verifier data
structure, but NFS4_VERIFIER_SIZE is the number of octets in an XDR'd
verifier.  The two are not interchangeable, even if they happen to
have the same value.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-03-20 15:36:15 -04:00
Trond Myklebust 8f199b8262 NFSD: Fix warnings when NFSD_DEBUG is not defined
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-03-20 15:34:19 -04:00
J. Bruce Fields 59deeb9e5a nfsd4: reduce do_open_lookup() stack usage
I get 320 bytes for struct svc_fh on x86_64, really a little large to be
putting on the stack; kmalloc() instead.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-03-06 18:13:37 -05:00
J. Bruce Fields 41fd1e42f8 nfsd4: delay setting current filehandle till success
Compound processing stops on error, so the current filehandle won't be
used on error.  Thus the order here doesn't really matter.  It'll be
more convenient to do it later, though.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-03-06 18:13:36 -05:00
Benny Halevy 2c8bd7e0d1 nfsd41: split out share_access want and signal flags while decoding
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-02-17 18:38:42 -05:00
Tigran Mkrtchyan 37c593c573 nfsd41: use current stateid by value
Signed-off-by: Tigran Mkrtchyan <kofemann@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-02-15 11:20:45 -05:00
Tigran Mkrtchyan 9428fe1abb nfsd41: consume current stateid on DELEGRETURN and OPENDOWNGRADE
Signed-off-by: Tigran Mkrtchyan <kofemann@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-02-15 11:20:44 -05:00
Tigran Mkrtchyan 1e97b5190d nfsd41: handle current stateid in SETATTR and FREE_STATEID
Signed-off-by: Tigran Mkrtchyan <kofemann@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-02-15 11:20:43 -05:00
Tigran Mkrtchyan d14710532f nfsd41: mark LOOKUP, LOOKUPP and CREATE to invalidate current stateid
Signed-off-by: Tigran Mkrtchyan <kofemann@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-02-15 11:20:42 -05:00
Tigran Mkrtchyan 8307111476 nfsd41: save and restore current stateid with current fh
Signed-off-by: Tigran Mkrtchyan <kofemann@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-02-15 11:20:41 -05:00
Tigran Mkrtchyan 80e01cc1e2 nfsd41: mark PUTFH, PUTPUBFH and PUTROOTFH to clear current stateid
Signed-off-by: Tigran Mkrtchyan <kofemann@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-02-15 11:20:41 -05:00
Tigran Mkrtchyan 30813e2773 nfsd41: consume current stateid on read and write
Signed-off-by: Tigran Mkrtchyan <kofemann@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-02-15 11:20:40 -05:00
Tigran Mkrtchyan 62cd4a591c nfsd41: handle current stateid on lock and locku
Signed-off-by: Tigran Mkrtchyan <kofemann@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-02-15 11:20:39 -05:00
Tigran Mkrtchyan 8b70484c67 nfsd41: handle current stateid in open and close
Signed-off-by: Tigran Mkrtchyan <kofemann@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-02-15 11:20:38 -05:00
Linus Torvalds 0b48d42235 Merge branch 'for-3.3' of git://linux-nfs.org/~bfields/linux
* 'for-3.3' of git://linux-nfs.org/~bfields/linux: (31 commits)
  nfsd4: nfsd4_create_clid_dir return value is unused
  NFSD: Change name of extended attribute containing junction
  svcrpc: don't revert to SVC_POOL_DEFAULT on nfsd shutdown
  svcrpc: fix double-free on shutdown of nfsd after changing pool mode
  nfsd4: be forgiving in the absence of the recovery directory
  nfsd4: fix spurious 4.1 post-reboot failures
  NFSD: forget_delegations should use list_for_each_entry_safe
  NFSD: Only reinitilize the recall_lru list under the recall lock
  nfsd4: initialize special stateid's at compile time
  NFSd: use network-namespace-aware cache registering routines
  SUNRPC: create svc_xprt in proper network namespace
  svcrpc: update outdated BKL comment
  nfsd41: allow non-reclaim open-by-fh's in 4.1
  svcrpc: avoid memory-corruption on pool shutdown
  svcrpc: destroy server sockets all at once
  svcrpc: make svc_delete_xprt static
  nfsd: Fix oops when parsing a 0 length export
  nfsd4: Use kmemdup rather than duplicating its implementation
  nfsd4: add a separate (lockowner, inode) lookup
  nfsd4: fix CONFIG_NFSD_FAULT_INJECTION compile error
  ...
2012-01-14 12:26:41 -08:00
Al Viro bad0dcffc2 new helpers: fh_{want,drop}_write()
A bunch of places in nfsd does mnt_{want,drop}_write on vfsmount of
export of given fhandle.  Switched to obvious inlined helpers...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:52:35 -05:00
Mi Jinlong 0cf99b91c6 nfsd41: allow non-reclaim open-by-fh's in 4.1
With NFSv4.0 it was safe to assume that open-by-filehandles were always
reclaims.

With NFSv4.1 there are non-reclaim open-by-filehandle operations, so we
should ensure we're only insisting on reclaims in the
OPEN_CLAIM_PREVIOUS case.

Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-12-06 16:19:04 -05:00
Mi Jinlong 345c284290 nfs41: implement DESTROY_CLIENTID operation
According to rfc5661 18.50, implement DESTROY_CLIENTID operation.

Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-10-24 04:24:30 -04:00
J. Bruce Fields 8b289b2c23 nfsd4: implement new 4.1 open reclaim types
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-10-19 11:52:12 -04:00
J. Bruce Fields 856121b2e8 nfsd4: warn on open failure after create
If we create the object and then return failure to the client, we're
left with an unexpected file in the filesystem.

I'm trying to eliminate such cases but not 100% sure I have so an
assertion might be helpful for now.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-10-17 17:50:08 -04:00
J. Bruce Fields d29b20cd58 nfsd4: clean up open owners on OPEN failure
If process_open1() creates a new open owner, but the open later fails,
the current code will leave the open owner around.  It won't be on the
close_lru list, and the client isn't expected to send a CLOSE, so it
will hang around as long as the client does.

Similarly, if process_open1() removes an existing open owner from the
close lru, anticipating that an open owner that previously had no
associated stateid's now will, but the open subsequently fails, then
we'll again be left with the same leak.

Fix both problems.

Reported-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-10-17 17:33:57 -04:00
J. Bruce Fields b6d2f1ca3c nfsd4: more robust ignoring of WANT bits in OPEN
Mask out the WANT bits right at the start instead of on each use.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-10-11 12:15:15 -04:00
J. Bruce Fields c856694e3d nfsd4: make op_cacheresult another flag
I'm not sure why I used a new field for this originally.

Also, the differences between some of these flags are a little subtle;
add some comments to explain.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-09-20 14:45:51 -04:00
J. Bruce Fields dad1c067eb nfsd4: replace oo_confirmed by flag bit
I want at least one more bit here.  So, let's haul out the caps lock key
and add a flags field.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-09-16 17:44:16 -04:00
Mi Jinlong 58e7b33a58 nfsd41: try to check reply size before operation
For checking the size of reply before calling a operation,
we need try to get maxsize of the operation's reply.

v3: using new method as Bruce said,

 "we could handle operations in two different ways:

	- For operations that actually change something (write, rename,
	  open, close, ...), do it the way we're doing it now: be
	  very careful to estimate the size of the response before even
	  processing the operation.
	- For operations that don't change anything (read, getattr, ...)
	  just go ahead and do the operation.  If you realize after the
	  fact that the response is too large, then return the error at
	  that point.

  So we'd add another flag to op_flags: say, OP_MODIFIES_SOMETHING.  And for
  operations with OP_MODIFIES_SOMETHING set, we'd do the first thing.  For
  operations without it set, we'd do the second."

Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
[bfields@redhat.com: crash, don't attempt to handle, undefined op_rsize_bop]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-09-16 10:31:01 -04:00
J. Bruce Fields fe0750e5c4 nfsd4: split stateowners into open and lockowners
The stateowner has some fields that only make sense for openowners, and
some that only make sense for lockowners, and I find it a lot clearer if
those are separated out.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-09-07 09:45:49 -04:00
J. Bruce Fields 7c13f344cf nfsd4: drop most stateowner refcounting
Maybe we'll bring it back some day, but we don't have much real use for
it now.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-09-01 11:12:47 -04:00
J. Bruce Fields 5ec094c109 nfsd4: extend state lock over seqid replay logic
There are currently a couple races in the seqid replay code: a
retransmission could come while we're still encoding the original reply,
or a new seqid-mutating call could come as we're encoding a replay.

So, extend the state lock over the encoding (both encoding of a replayed
reply and caching of the original encoded reply).

I really hate doing this, and previously added the stateowner
reference-counting code to avoid it (which was insufficient)--but I
don't see a less complicated alternative at the moment.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-09-01 07:07:59 -04:00
J. Bruce Fields 3e77246393 nfsd4: stop using nfserr_resource for transitory errors
The server is returning nfserr_resource for both permanent errors and
for errors (like allocation failures) that might be resolved by retrying
later.  Save nfserr_resource for the former and use delay/jukebox for
the latter.

Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27 14:21:21 -04:00
J. Bruce Fields a043226bc1 nfsd4: permit read opens of executable-only files
A client that wants to execute a file must be able to read it.  Read
opens over nfs are therefore implicitly allowed for executable files
even when those files are not readable.

NFSv2/v3 get this right by using a passed-in NFSD_MAY_OWNER_OVERRIDE on
read requests, but NFSv4 has gotten this wrong ever since
dc730e1737 "nfsd4: fix owner-override on
open", when we realized that the file owner shouldn't override
permissions on non-reclaim NFSv4 opens.

So we can't use NFSD_MAY_OWNER_OVERRIDE to tell nfsd_permission to allow
reads of executable files.

So, do the same thing we do whenever we encounter another weird NFS
permission nit: define yet another NFSD_MAY_* flag.

The industry's future standardization on 128-bit processors will be
motivated primarily by the need for integers with enough bits for all
the NFSD_MAY_* flags.

Reported-by: Leonardo Borda <leonardoborda@gmail.com>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27 14:20:20 -04:00
J. Bruce Fields 75c096f753 nfsd4: it's OK to return nfserr_symlink
The nfsd4 code has a bunch of special exceptions for error returns which
map nfserr_symlink to other errors.

In fact, the spec makes it clear that nfserr_symlink is to be preferred
over less specific errors where possible.

The patch that introduced it back in 2.6.4 is "kNFSd: correct symlink
related error returns.", which claims that these special exceptions are
represent an NFSv4 break from v2/v3 tradition--when in fact the symlink
error was introduced with v4.

I suspect what happened was pynfs tests were written that were overly
faithful to the (known-incomplete) rfc3530 error return lists, and then
code was fixed up mindlessly to make the tests pass.

Delete these unnecessary exceptions.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-26 18:22:50 -04:00
J. Bruce Fields aadab6c6f4 nfsd4: return nfserr_symlink on v4 OPEN of non-regular file
Without this, an attempt to open a device special file without first
stat'ing it will fail.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-19 13:25:32 -04:00
Bernd Schubert 832023bffb nfsd4: Remove check for a 32-bit cookie in nfsd4_readdir()
Fan Yong <yong.fan@whamcloud.com> noticed setting
FMODE_32bithash wouldn't work with nfsd v4, as
nfsd4_readdir() checks for 32 bit cookies. However, according to RFC 3530
cookies have a 64 bit type and cookies are also defined as u64 in
'struct nfsd4_readdir'. So remove the test for >32-bit values.

Cc: stable@kernel.org
Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-16 15:19:28 -04:00
J. Bruce Fields 1091006c5e nfsd: turn on reply cache for NFSv4
It's sort of ridiculous that we've never had a working reply cache for
NFSv4.

On the other hand, we may still not: our current reply cache is likely
not very good, especially in the TCP case (which is the only case that
matters for v4).  What we really need here is some serious testing.

Anyway, here's a start.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-07-18 09:39:01 -04:00
J. Bruce Fields 3e98abffd1 nfsd4: call nfsd4_release_compoundargs from pc_release
This simplifies cleanup a bit.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-07-18 09:38:02 -04:00
Mi Jinlong ab1350b2b3 nfsd41: Deny new lock before RECLAIM_COMPLETE done
Before nfs41 client's RECLAIM_COMPLETE done, nfs server should deny any
new locks or opens.

rfc5661:

   " Whenever a client establishes a new client ID and before it does
   the first non-reclaim operation that obtains a lock, it MUST send a
   RECLAIM_COMPLETE with rca_one_fs set to FALSE, even if there are no
   locks to reclaim.  If non-reclaim locking operations are done before
   the RECLAIM_COMPLETE, an NFS4ERR_GRACE error will be returned. "

Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-07-15 19:00:40 -04:00
Bryan Schumaker 1745680454 NFSD: Added TEST_STATEID operation
This operation is used by the client to check the validity of a list of
stateids.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-07-15 18:58:48 -04:00
Bryan Schumaker e1ca12dfb1 NFSD: added FREE_STATEID operation
This operation is used by the client to tell the server to free a
stateid.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-07-15 18:58:47 -04:00
Benny Halevy 094b5d74f4 NFSD: allow OP_DESTROY_CLIENTID to be only op in COMPOUND
DESTROY_CLIENTID MAY be preceded with a SEQUENCE operation as long as
   the client ID derived from the session ID of SEQUENCE is not the same
   as the client ID to be destroyed.  If the client IDs are the same,
   then the server MUST return NFS4ERR_CLIENTID_BUSY.

(that's not implemented yet)

   If DESTROY_CLIENTID is not prefixed by SEQUENCE, it MUST be the only
   operation in the COMPOUND request (otherwise, the server MUST return
   NFS4ERR_NOT_ONLY_OP).

This fixes the error return; before, we returned
NFS4ERR_OP_NOT_IN_SESSION; after this patch, we return NFS4ERR_NOTSUPP.

Signed-off-by: Benny Halevy <benny@tonian.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-07-15 18:58:41 -04:00
Mi Jinlong ac6721a13e nfsd41: make sure nfs server process OPEN with EXCLUSIVE4_1 correctly
The NFS server uses nfsd_create_v3 to handle EXCLUSIVE4_1 opens, but
that function is not prepared to handle them.

Rename nfsd_create_v3() to do_nfsd_create(), and add handling of
EXCLUSIVE4_1.

Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-04-29 20:47:52 -04:00
J. Bruce Fields 68d9318435 nfsd4: fix wrongsec handling for PUTFH + op cases
When PUTFH is followed by an operation that uses the filehandle, and
when the current client is using a security flavor that is inconsistent
with the given filehandle, we have a choice: we can return WRONGSEC
either when the current filehandle is set using the PUTFH, or when the
filehandle is first used by the following operation.

Follow the recommendations of RFC 5661 in making this choice.

(Our current behavior prevented the client from doing security
negotiation by returning WRONGSEC on PUTFH+SECINFO_NO_NAME.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-04-29 20:47:51 -04:00
J. Bruce Fields 29a78a3ed7 nfsd4: make fh_verify responsibility of nfsd_lookup_dentry caller
The secinfo caller actually won't want this.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-04-11 08:42:22 -04:00
J. Bruce Fields 22b0321496 nfsd4: introduce OPDESC helper
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-04-11 08:42:21 -04:00
Mi Jinlong 5ece3cafbd nfsd41: modify the members value of nfsd4_op_flags
The members of nfsd4_op_flags, (ALLOWED_WITHOUT_FH | ALLOWED_ON_ABSENT_FS)
equals to  ALLOWED_AS_FIRST_OP, maybe that's not what we want.

OP_PUTROOTFH with op_flags = ALLOWED_WITHOUT_FH | ALLOWED_ON_ABSENT_FS,
can't appears as the first operation with out SEQUENCE ops.

This patch modify the wrong value of ALLOWED_WITHOUT_FH etc which
was introduced by f9bb94c4.

Cc: stable@kernel.org
Reviewed-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-03-07 12:10:33 -05:00