Commit Graph

263 Commits

Author SHA1 Message Date
J. Bruce Fields 476a7b1f4b nfsd4: don't treat readlink like a zero-copy operation
There's no advantage to this zero-copy-style readlink encoding, and it
unnecessarily limits the kinds of compounds we can handle.  (In practice
I can't see why a client would want e.g. multiple readlink calls in a
comound, but it's probably a spec violation for us not to handle it.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:32:05 -04:00
J. Bruce Fields 3b29970909 nfsd4: enforce rd_dircount
As long as we're here, let's enforce the protocol's limit on the number
of directory entries to return in a readdir.

I don't think anyone's ever noticed our lack of enforcement, but maybe
there's more of a chance they will now that we allow larger readdirs.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:32:04 -04:00
J. Bruce Fields 561f0ed498 nfsd4: allow large readdirs
Currently we limit readdir results to a single page.  This can result in
a performance regression compared to NFSv3 when reading large
directories.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:32:03 -04:00
J. Bruce Fields 47ee529864 nfsd4: adjust buflen to session channel limit
We can simplify session limit enforcement by restricting the xdr buflen
to the session size.

Also fix a preexisting bug: we should really have been taking into
account the auth-required space when comparing against session limits,
which are limits on the size of the entire rpc reply, including any krb5
overhead.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:32:02 -04:00
J. Bruce Fields 30596768b3 nfsd4: fix buflen calculation after read encoding
We don't necessarily want to assume that the buflen is the same
as the number of bytes available in the pages.  We may have some reason
to set it to something less (for example, later patches will use a
smaller buflen to enforce session limits).

So, calculate the buflen relative to the previous buflen instead of
recalculating it from scratch.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:32:00 -04:00
J. Bruce Fields 89ff884ebb nfsd4: nfsd4_check_resp_size should check against whole buffer
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:31:59 -04:00
J. Bruce Fields 6ff9897d2b nfsd4: minor encode_read cleanup
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:31:58 -04:00
J. Bruce Fields 4f0cefbf38 nfsd4: more precise nfsd4_max_reply
It will turn out to be useful to have a more accurate estimate of reply
size; so, piggyback on the existing op reply-size estimators.

Also move nfsd4_max_reply to nfs4proc.c to get easier access to struct
nfsd4_operation and friends.  (Thanks to Christoph Hellwig for pointing
out that simplification.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:31:57 -04:00
J. Bruce Fields 8c7424cff6 nfsd4: don't try to encode conflicting owner if low on space
I ran into this corner case in testing: in theory clients can provide
state owners up to 1024 bytes long.  In the sessions case there might be
a risk of this pushing us over the DRC slot size.

The conflicting owner isn't really that important, so let's humor a
client that provides a small maxresponsize_cached by allowing ourselves
to return without the conflicting owner instead of outright failing the
operation.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:31:55 -04:00
J. Bruce Fields f5236013a2 nfsd4: convert 4.1 replay encoding
Limits on maxresp_sz mean that we only ever need to replay rpc's that
are contained entirely in the head.

The one exception is very small zero-copy reads.  That's an odd corner
case as clients wouldn't normally ask those to be cached.

in any case, this seems a little more robust.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:31:55 -04:00
J. Bruce Fields 2825a7f907 nfsd4: allow encoding across page boundaries
After this we can handle for example getattr of very large ACLs.

Read, readdir, readlink are still special cases with their own limits.

Also we can't handle a new operation starting close to the end of a
page.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:31:54 -04:00
J. Bruce Fields a8095f7e80 nfsd4: size-checking cleanup
Better variable name, some comments, etc.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:31:53 -04:00
J. Bruce Fields ea8d7720b2 nfsd4: remove redundant encode buffer size checking
Now that all op encoders can handle running out of space, we no longer
need to check the remaining size for every operation; only nonidempotent
operations need that check, and that can be done by
nfsd4_check_resp_size.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:31:52 -04:00
J. Bruce Fields 67492c9903 nfsd4: nfsd4_check_resp_size needn't recalculate length
We're keeping the length updated as we go now, so there's no need for
the extra calculation here.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:31:51 -04:00
J. Bruce Fields 4e21ac4b6f nfsd4: reserve space before inlining 0-copy pages
Once we've included page-cache pages in the encoding it's difficult to
remove them and restart encoding.  (xdr_truncate_encode doesn't handle
that case.)  So, make sure we'll have adequate space to finish the
operation first.

For now COMPOUND_SLACK_SPACE checks should prevent this case happening,
but we want to remove those checks.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:31:50 -04:00
J. Bruce Fields d0a381dd0e nfsd4: teach encoders to handle reserve_space failures
We've tried to prevent running out of space with COMPOUND_SLACK_SPACE
and special checking in those operations (getattr) whose result can vary
enormously.

However:
	- COMPOUND_SLACK_SPACE may be difficult to maintain as we add
	  more protocol.
	- BUG_ON or page faulting on failure seems overly fragile.
	- Especially in the 4.1 case, we prefer not to fail compounds
	  just because the returned result came *close* to session
	  limits.  (Though perfect enforcement here may be difficult.)
	- I'd prefer encoding to be uniform for all encoders instead of
	  having special exceptions for encoders containing, for
	  example, attributes.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:31:49 -04:00
J. Bruce Fields 082d4bd72a nfsd4: "backfill" using write_bytes_to_xdr_buf
Normally xdr encoding proceeds in a single pass from start of a buffer
to end, but sometimes we have to write a few bytes to an earlier
position.

Use write_bytes_to_xdr_buf for these cases rather than saving a pointer
to write to.  We plan to rewrite xdr_reserve_space to handle encoding
across page boundaries using a scratch buffer, and don't want to risk
writing to a pointer that was contained in a scratch buffer.

Also it will no longer be safe to calculate lengths by subtracting two
pointers, so use xdr_buf offsets instead.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:31:49 -04:00
J. Bruce Fields 1fcea5b20b nfsd4: use xdr_truncate_encode
Now that lengths are reliable, we can use xdr_truncate instead of
open-coding it everywhere.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:31:48 -04:00
J. Bruce Fields 6ac90391c6 nfsd4: keep xdr buf length updated
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-28 14:52:38 -04:00
J. Bruce Fields dd97fddedc nfsd4: no need for encode_compoundres to adjust lengths
xdr_reserve_space should now be calculating the length correctly as we
go, so there's no longer any need to fix it up here.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-28 14:52:37 -04:00
J. Bruce Fields f46d382a74 nfsd4: remove ADJUST_ARGS
It's just uninteresting debugging code at this point.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-28 14:52:36 -04:00
J. Bruce Fields d3f627c815 nfsd4: use xdr_stream throughout compound encoding
Note this makes ADJUST_ARGS useless; we'll remove it in the following
patch.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-28 14:52:35 -04:00
J. Bruce Fields ddd1ea5636 nfsd4: use xdr_reserve_space in attribute encoding
This is a cosmetic change for now; no change in behavior.

Note we're just depending on xdr_reserve_space to do the bounds checking
for us, we're not really depending on its adjustment of iovec or xdr_buf
lengths yet, as those are fixed up by as necessary after the fact by
read-link operations and by nfs4svc_encode_compoundres.  However we do
have to update xdr->iov on read-like operations to prevent
xdr_reserve_space from messing with the already-fixed-up length of the
the head.

When the attribute encoding fails partway through we have to undo the
length adjustments made so far.  We do it manually for now, but later
patches will add an xdr_truncate_encode() helper to handle cases like
this.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-28 14:52:34 -04:00
J. Bruce Fields 5f4ab94587 nfsd4: allow space for final error return
This post-encoding check should be taking into account the need to
encode at least an out-of-space error to the following op (if any).

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-27 11:09:09 -04:00
J. Bruce Fields 07d1f80207 nfsd4: fix encoding of out-of-space replies
If nfsd4_check_resp_size() returns an error then we should really be
truncating the reply here, otherwise we may leave extra garbage at the
end of the rpc reply.

Also add a warning to catch any cases where our reply-size estimates may
be wrong in the case of a non-idempotent operation.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-27 11:09:08 -04:00
J. Bruce Fields d518465866 nfsd4: tweak nfsd4_encode_getattr to take xdr_stream
Just change the nfsd4_encode_getattr api.  Not changing any code or
adding any new functionality yet.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-23 09:03:46 -04:00
J. Bruce Fields 4aea24b2ff nfsd4: embed xdr_stream in nfsd4_compoundres
This is a mechanical transformation with no change in behavior.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-23 09:03:45 -04:00
J. Bruce Fields e372ba60de nfsd4: decoding errors can still be cached and require space
Currently a non-idempotent op reply may be cached if it fails in the
proc code but not if it fails at xdr decoding.  I doubt there are any
xdr-decoding-time errors that would make this a problem in practice, so
this probably isn't a serious bug.

The space estimates should also take into account space required for
encoding of error returns.  Again, not a practical problem, though it
would become one after future patches which will tighten the space
estimates.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-23 09:03:44 -04:00
J. Bruce Fields fc208d026b Revert "nfsd4: fix nfs4err_resource in 4.1 case"
Since we're still limiting attributes to a page, the result here is that
a large getattr result will return NFS4ERR_REP_TOO_BIG/TOO_BIG_TO_CACHE
instead of NFS4ERR_RESOURCE.

Both error returns are wrong, and the real bug here is the arbitrary
limit on getattr results, fixed by as-yet out-of-tree patches.  But at a
minimum we can make life easier for clients by sticking to one broken
behavior in released kernels instead of two....

Trond says:

	one immediate consequence of this patch will be that NFSv4.1
	clients will now report EIO instead of EREMOTEIO if they hit the
	problem. That may make debugging a little less obvious.

	Another consequence will be that if we ever do try to add client
	side handling of NFS4ERR_REP_TOO_BIG, then we now have to deal
	with the “handle existing buggy server” syndrome.

Reported-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-04-18 14:46:45 +02:00
Yan, Zheng 18df11d0ea nfsd4: fix memory leak in nfsd4_encode_fattr()
fh_put() does not free the temporary file handle.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-31 17:08:23 -04:00
J. Bruce Fields 1bc49d83c3 nfsd4: fix nfs4err_resource in 4.1 case
encode_getattr, for example, can return nfserr_resource to indicate it
ran out of buffer space.  That's not a legal error in the 4.1 case.  And
in the 4.1 case, if we ran out of buffer space, we should have exceeded
a session limit too.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-28 21:24:56 -04:00
J. Bruce Fields 1bed92cb3c nfsd4: remove redundant check from nfsd4_check_resp_size
cstate->slot and ->session are each set together in nfsd4_sequence.  If
one is non-NULL, so is the other.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-28 21:24:54 -04:00
J. Bruce Fields 067e1ace46 nfsd4: update comments with obsolete function name
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-28 21:24:50 -04:00
Jeff Layton e874f9f8e0 svcrpc: explicitly reject compounds that are not padded out to 4-byte multiple
We have a WARN_ON in the nfsd4_decode_write() that tells us when the
client has sent a request that is not padded out properly according to
RFC4506. A WARN_ON really isn't appropriate in this case though since
this indicates a client bug, not a server one.

Move this check out to the top-level compound decoder and have it just
explicitly return an error. Also add a dprintk() that shows the client
address and xid to help track down clients and frames that trigger it.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-27 16:31:58 -04:00
J. Bruce Fields a11fcce154 nfsd4: fix test_stateid error reply encoding
If the entire operation fails then there's nothing to encode.

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-27 16:31:39 -04:00
J. Bruce Fields 798df33879 nfsd4: make set of large acl return efbig, not resource
If a client attempts to set an excessively large ACL, return
NFS4ERR_FBIG instead of NFS4ERR_RESOURCE.  I'm not sure FBIG is correct,
but I'm positive RESOURCE is wrong (it isn't even a well-defined error
any more for NFS versions since 4.1).

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-27 16:31:08 -04:00
J. Bruce Fields de3997a7ee nfsd4: buffer-length check for SUPPATTR_EXCLCREAT
This was an omission from 8c18f2052e
"nfsd41: SUPPATTR_EXCLCREAT attribute".

Cc: Benny Halevy <bhalevy@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-27 16:30:42 -04:00
J. Bruce Fields d50e61361c nfsd4: decrease nfsd4_encode_fattr stack usage
A struct svc_fh is 320 bytes on x86_64, it'd be better not to have these
on the stack.

kmalloc'ing them probably isn't ideal either, but this is the simplest
thing to do.  If it turns out to be a problem in the readdir case then
we could add a svc_fh to nfsd4_readdir and pass that in.

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-01-24 15:58:21 -05:00
J. Bruce Fields 3554116d3a nfsd4: simplify xdr encoding of nfsv4 names
We can simplify the idmapping code if it does its own encoding and
returns nfs errors.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-01-08 12:18:53 -05:00
J. Bruce Fields 87915c6472 nfsd4: encode_rdattr_error cleanup
There's a simpler way to write this.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-01-07 16:01:18 -05:00
J. Bruce Fields 6b6d8137f1 nfsd4: nfsd4_encode_fattr cleanup
Remove some pointless goto's.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-01-07 16:01:17 -05:00
Kinglong Mee dfeecc829e nfsd: get rid of unused macro definition
Since defined in Linux-2.6.12-rc2, READTIME has not been used.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-01-03 13:44:24 -05:00
Kinglong Mee eba1c99ce4 nfsd: clean up unnecessary temporary variable in nfsd4_decode_fattr
host_err was only used for nfs4_acl_new.
This patch delete it, and return nfserr_jukebox directly.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-01-03 13:44:23 -05:00
Kinglong Mee 43212cc7df nfsd: using nfsd4_encode_noop for encoding destroy_session/free_stateid
Get rid of the extra code, using nfsd4_encode_noop for encoding destroy_session and free_stateid.
And, delete unused argument (fr_status) int nfsd4_free_stateid.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-01-03 13:44:22 -05:00
Kinglong Mee a9f7b4a06c nfsd: clean up an xdr reserved space calculation
We should use XDR_LEN to calculate reserved space in case the oid is not
a multiple of 4.

RESERVE_SPACE actually rounds up for us, but it's probably better to be
careful here.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-01-03 13:44:12 -05:00
Kinglong Mee a8bb84bc9e nfsd: calculate the missing length of bitmap in EXCHANGE_ID
commit 58cd57bfd9
"nfsd: Fix SP4_MACH_CRED negotiation in EXCHANGE_ID"
miss calculating the length of bitmap for spo_must_enforce and spo_must_allow.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-01-02 17:44:52 -05:00
Christoph Hellwig 2d8498dbf8 nfsd: start documenting some XDR handling functions
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-12-10 20:37:47 -05:00
J. Bruce Fields 365da4adeb nfsd4: fix xdr decoding of large non-write compounds
This fixes a regression from 247500820e
"nfsd4: fix decoding of compounds across page boundaries".  The previous
code was correct: argp->pagelist is initialized in
nfs4svc_deocde_compoundargs to rqstp->rq_arg.pages, and is therefore a
pointer to the page *after* the page we are currently decoding.

The reason that patch nevertheless fixed a problem with decoding
compounds containing write was a bug in the write decoding introduced by
5a80a54d21 "nfsd4: reorganize write
decoding", after which write decoding no longer adhered to the rule that
argp->pagelist point to the next page.

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-11-19 18:06:54 -05:00
Christoph Hellwig aea240f416 nfsd: export proper maximum file size to the client
I noticed that we export a way to high value for the maxfilesize
attribute when debugging a client issue.  The issue didn't turn
out to be related to it, but I think we should export it, so that
clients can limit what write sizes they accept before hitting
the server.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-11-14 15:18:47 -05:00
J. Bruce Fields 6ff40decff nfsd4: improve write performance with better sendspace reservations
Currently the rpc code conservatively refuses to accept rpc's from a
client if the sum of its worst-case estimates of the replies it owes
that client exceed the send buffer space.

Unfortunately our estimate of the worst-case reply for an NFSv4 compound
is always the maximum read size.  This can unnecessarily limit the
number of operations we handle concurrently, for example in the case
most operations are writes (which have small replies).

We can do a little better if we check which ops the compound contains.

This is still a rough estimate, we'll need to improve on it some day.

Reported-by: Shyam Kaushik <shyamnfs1@gmail.com>
Tested-by: Shyam Kaushik <shyamnfs1@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-11-13 16:12:54 -05:00