linux-sg2042

Commit Graph

Author	SHA1	Message	Date
Jeff Layton	1983a66f57	nfsd: don't set a FL_LAYOUT lease for flexfiles layouts We currently can hit a deadlock (of sorts) when trying to use flexfiles layouts with XFS. XFS will call break_layout when something wants to write to the file. In the case of the (super-simple) flexfiles layout driver in knfsd, the MDS and DS are the same machine. The client can get a layout and then issue a v3 write to do its I/O. XFS will then call xfs_break_layouts, which will cause a CB_LAYOUTRECALL to be issued to the client. The client however can't return the layout until the v3 WRITE completes, but XFS won't allow the write to proceed until the layout is returned. Christoph says: XFS only cares about block-like layouts where the client has direct access to the file blocks. I'd need to look how to propagate the flag into break_layout, but in principle we don't need to do any recalls on truncate ever for file and flexfile layouts. If we're never going to recall the layout, then we don't even need to set the lease at all. Just skip doing so on flexfiles layouts by adding a new flag to struct nfsd4_layout_ops and skipping the lease setting and removal when that flag is true. Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-09-16 16:15:52 -04:00
Jeff Layton	8a4c392688	nfsd: allow nfsd to advertise multiple layout types If the underlying filesystem supports multiple layout types, then there is little reason not to advertise that fact to clients and let them choose what type to use. Turn the ex_layout_type field into a bitfield. For each supported layout type, we set a bit in that field. When the client requests a layout, ensure that the bit for that layout type is set. When the client requests attributes, send back a list of supported types. Signed-off-by: Jeff Layton <jlayton@poochiereds.net> Reviewed-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-07-15 15:31:32 -04:00
Tom Haynes	9b9960a0ca	nfsd: Add a super simple flex file server Have a simple flex file server where the mds (NFSv4.1 or NFSv4.2) is also the ds (NFSv3). I.e., the metadata and the data file are the exact same file. This will allow testing of the flex file client. Simply add the "pnfs" export option to your export in /etc/exports and mount from a client that supports flex files. Signed-off-by: Tom Haynes <loghyr@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-07-13 15:40:48 -04:00
Jeff Layton	14b7f4a1ed	nfsd: handle seqid wraparound in nfsd4_preprocess_layout_stateid Move the existing static function to an inline helper, and call it. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-05-13 15:34:47 -04:00
Christoph Hellwig	f99d4fbdae	nfsd: add SCSI layout support This is a simple extension to the block layout driver to use SCSI persistent reservations for access control and fencing, as well as SCSI VPD pages for device identification. For this we need to pass the nfs4_client to the proc_getdeviceinfo method to generate the reservation key, and add a new fence_client method to allow for fence actions in the layout driver. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-03-18 11:42:53 -04:00
Christoph Hellwig	81c3932901	nfsd: add a new config option for the block layout driver Split the config symbols into a generic pNFS one, which is invisible and gets selected by the layout drivers, and one for the block layout driver. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-03-18 11:40:57 -04:00
Linus Torvalds	cc80fe0eef	Smaller bugfixes and cleanup, including a fix for a failures of kerberized NFSv4.1 mounts, and Scott Mayhew's work addressing ACK storms that can affect some high-availability NFS setups. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJWmVZHAAoJECebzXlCjuG+Cn4P/3zwSuwIeLuv9b89vzFXU8Xv AbBWHk7WkFXJQGTKdclYjwxqU+l15D5lYHCae1cuD5eXdviraxXf7EcnqrMhJUc0 oRiQx0rAwlEkKUAVrxGCFP7WKjlX3TsEBV6wPpTCP3BEMzTPDEeaDek7+hICFkLF 9a/miEXAopm3jxP7WNmXEkdKpFEHklDDwtv6Av7iIKCW6+7XCGp7Prqo4NQKAKp6 hjE+nvt2HiD06MZhUeyb14cn6547smzt1rbSfK4IB4yHMwLyaoqPrT7ekDh9LDrE uGgo+Y2PBbEcTAE6tJ88EjZx7cMCFPn0te+eKPgnpPy9RqrNqSxj5N/b7JAecKgW a/09BtvFOoYs8fO5ovqeRY5THrE3IRyMIwn4gt7fCYaaAbG3dwGKG1uklTAVXtb1 95DkhOb8He2VhOCCoJ6ybbTnRfjB6b/cv7ZuEGlQfvTE+BtU3Jj9I76ruWFhb3zd HM1dRI20UfwL/0Y8yYhZ+/rje9SSk2jOmVgSCqY9hnCmEqOqOdUU0X/uumIWaBym zfGx9GIM0jQuYVdLQRXtJJbUgJUUN3MilGyU5wx7YoXip5guqTalXqAdQpShzXeW s1ATYh/mY5X9ig51KogkkVlm9bXDQAzJBAnDRpLtJZqy5Cgkrj9RSu0ExN1Rmlhw LKQCddBQxUSWJ+XWycgK =G7V3 -----END PGP SIGNATURE----- Merge tag 'nfsd-4.5' of git://linux-nfs.org/~bfields/linux Pull nfsd updates from Bruce Fields: "Smaller bugfixes and cleanup, including a fix for a failures of kerberized NFSv4.1 mounts, and Scott Mayhew's work addressing ACK storms that can affect some high-availability NFS setups" * tag 'nfsd-4.5' of git://linux-nfs.org/~bfields/linux: nfsd: add new io class tracepoint nfsd: give up on CB_LAYOUTRECALLs after two lease periods nfsd: Fix nfsd leaks sunrpc module references lockd: constify nlmsvc_binding structure lockd: use to_delayed_work nfsd: use to_delayed_work Revert "svcrdma: Do not send XDR roundup bytes for a write chunk" lockd: Register callbacks on the inetaddr_chain and inet6addr_chain nfsd: Register callbacks on the inetaddr_chain and inet6addr_chain sunrpc: Add a function to close temporary transports immediately nfsd: don't base cl_cb_status on stale information nfsd4: fix gss-proxy 4.1 mounts for some AD principals nfsd: fix unlikely NULL deref in mach_creds_match nfsd: minor consolidation of mach_cred handling code nfsd: helper for dup of possibly NULL string svcrpc: move some initialization to common code nfsd: fix a warning message nfsd: constify nfsd4_callback_ops structure nfsd: recover: constify nfsd4_client_tracking_ops structures svcrdma: Do not send XDR roundup bytes for a write chunk	2016-01-15 12:49:44 -08:00
Jeff Layton	6b9b21073d	nfsd: give up on CB_LAYOUTRECALLs after two lease periods Have the CB_LAYOUTRECALL code treat NFS4_OK and NFS4ERR_DELAY returns equivalently. Change the code to periodically resend CB_LAYOUTRECALLS until the ls_layouts list is empty or the client returns a different error code. If we go for two lease periods without the list being emptied or the client sending a hard error, then we give up and clean out the list anyway. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Tested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-01-08 16:47:51 -05:00
Jeff Layton	be20aa00c6	nfsd: don't hold ls_mutex across a layout recall We do need to serialize layout stateid morphing operations, but we currently hold the ls_mutex across a layout recall which is pretty ugly. It's also unnecessary -- once we've bumped the seqid and copied it, we don't need to serialize the rest of the CB_LAYOUTRECALL vs. anything else. Just drop the mutex once the copy is done. This was causing a "workqueue leaked lock or atomic" warning and an occasional deadlock. There's more work to be done here but this fixes the immediate regression. Fixes: `cc8a55320b` "nfsd: serialize layout stateid morphing operations" Cc: stable@vger.kernel.org Reported-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-12-16 11:49:58 -05:00
Julia Lawall	c4cb897462	nfsd: constify nfsd4_callback_ops structure The nfsd4_callback_ops structure is never modified, so declare it as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-11-23 12:15:31 -07:00
Jeff Layton	9767feb2c6	nfsd: ensure that seqid morphing operations are atomic wrt to copies Bruce points out that the increment of the seqid in stateids is not serialized in any way, so it's possible for racing calls to bump it twice and end up sending the same stateid. While we don't have any reports of this problem it _is_ theoretically possible, and could lead to spurious state recovery by the client. In the current code, update_stateid is always followed by a memcpy of that stateid, so we can combine the two operations. For better atomicity, we add a spinlock to the nfs4_stid and hold that when bumping the seqid and copying the stateid. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-10-23 15:57:33 -04:00
Jeff Layton	cc8a55320b	nfsd: serialize layout stateid morphing operations In order to allow the client to make a sane determination of what happened with racing LAYOUTGET/LAYOUTRETURN/CB_LAYOUTRECALL calls, we must ensure that the seqids return accurately represent the order of operations. The simplest way to do that is to ensure that operations on a single stateid are serialized. This patch adds a mutex to the layout stateid, and locks it when checking the layout stateid's seqid. The mutex is held over the entire operation and released after the seqid is bumped. Note that in the case of CB_LAYOUTRECALL we must move the increment of the seqid and setting into a new cb "prepare" operation. The lease infrastructure will call the lm_break callback with a spinlock held, so and we can't take the mutex in that codepath. Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-10-23 15:57:32 -04:00
Kinglong Mee	1ca4b88e7d	nfsd: Fix a file leak on nfsd4_layout_setlease failure If nfsd4_layout_setlease fails, nfsd will not put ls->ls_file. Fix commit `c5c707f96f` "nfsd: implement pNFS layout recalls". Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-07-20 14:58:22 -04:00
Christoph Hellwig	f3f03330de	nfsd: require an explicit option to enable pNFS Turns out sending out layouts to any client is a bad idea if they can't get at the storage device, so require explicit admin action to enable pNFS. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-03-30 16:05:26 -04:00
Kinglong Mee	7890203da2	NFSD: Fix bad update of layout in nfsd4_return_file_layout With return layout as, (seg is return layout, lo is record layout) seg->offset <= lo->offset and layout_end(seg) < layout_end(lo), nfsd should update lo's offset to seg's end, and, seg->offset > lo->offset and layout_end(seg) >= layout_end(lo), nfsd should update lo's end to seg's offset. Fixes: `9cf514ccfa` ("nfsd: implement pNFS operations") Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-03-25 21:13:03 -04:00
Kinglong Mee	6f8f28ec5f	NFSD: Check layout type when returning client layouts According to RFC5661: " When lr_returntype is LAYOUTRETURN4_FSID, the current filehandle is used to identify the file system and all layouts matching the client ID, the fsid of the file system, lora_layout_type, and lora_iomode are returned. When lr_returntype is LAYOUTRETURN4_ALL, all layouts matching the client ID, lora_layout_type, and lora_iomode are returned and the current filehandle is not used. " When returning client layouts, always check layout type. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-03-20 12:43:59 -04:00
Kinglong Mee	715a03d284	NFSD: restore trace event lost in mismerge `31ef83dc05` "nfsd: add trace events" had a typo that dropped a trace event and replaced it by an incorrect recursive call to nfsd4_cb_layout_fail. `133d558216` "Subject: nfsd: don't recursively call nfsd4_cb_layout_fail" fixed the crash, this restores the tracepoint. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-03-20 12:43:06 -04:00
Christoph Hellwig	133d558216	Subject: nfsd: don't recursively call nfsd4_cb_layout_fail Due to a merge error when creating `c5c707f9` ("nfsd: implement pNFS layout recalls"), we recursively call nfsd4_cb_layout_fail from itself, leading to stack overflows. Signed-off-by: Christoph Hellwig <hch@lst.de> Fixes: `c5c707f9` ("nfsd: implement pNFS layout recalls") Signed-off-by: J. Bruce Fields <bfields@redhat.com> --- fs/nfsd/nfs4layouts.c \| 2 -- 1 file changed, 2 deletions(-) diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c index 3c1bfa1..1028a06 100644 --- a/fs/nfsd/nfs4layouts.c +++ b/fs/nfsd/nfs4layouts.c @@ -587,8 +587,6 @@ nfsd4_cb_layout_fail(struct nfs4_layout_stateid ls) rpc_ntop((struct sockaddr )&clp->cl_addr, addr_str, sizeof(addr_str)); - nfsd4_cb_layout_fail(ls); - printk(KERN_WARNING "nfsd: client %s failed to respond to layout recall. " " Fencing..\n", addr_str); -- 1.9.1	2015-03-19 15:49:27 -04:00
Christoph Hellwig	8650b8a058	nfsd: pNFS block layout driver Add a small shim between core nfsd and filesystems to translate the somewhat cumbersome pNFS data structures and semantics to something more palatable for Linux filesystems. Thanks to Rick McNeal for the old prototype pNFS blocklayout server code, which gave a lot of inspiration to this version even if no code is left from it. Signed-off-by: Christoph Hellwig <hch@lst.de>	2015-02-05 14:35:18 +01:00
Christoph Hellwig	31ef83dc05	nfsd: add trace events For now just a few simple events to trace the layout stateid lifetime, but these already were enough to find several bugs in the Linux client layout stateid handling. Signed-off-by: Christoph Hellwig <hch@lst.de>	2015-02-02 18:09:44 +01:00
Christoph Hellwig	c5c707f96f	nfsd: implement pNFS layout recalls Add support to issue layout recalls to clients. For now we only support full-file recalls to get a simple and stable implementation. This allows to embedd a nfsd4_callback structure in the layout_state and thus avoid any memory allocations under spinlocks during a recall. For normal use cases that do not intent to share a single file between multiple clients this implementation is fully sufficient. To ensure layouts are recalled on local filesystem access each layout state registers a new FL_LAYOUT lease with the kernel file locking code, which filesystems that support pNFS exports that require recalls need to break on conflicting access patterns. The XDR code is based on the old pNFS server implementation by Andy Adamson, Benny Halevy, Boaz Harrosh, Dean Hildebrand, Fred Isaman, Marc Eshel, Mike Sager and Ricardo Labiaga. Signed-off-by: Christoph Hellwig <hch@lst.de>	2015-02-02 18:09:43 +01:00
Christoph Hellwig	9cf514ccfa	nfsd: implement pNFS operations Add support for the GETDEVICEINFO, LAYOUTGET, LAYOUTCOMMIT and LAYOUTRETURN NFSv4.1 operations, as well as backing code to manage outstanding layouts and devices. Layout management is very straight forward, with a nfs4_layout_stateid structure that extends nfs4_stid to manage layout stateids as the top-level structure. It is linked into the nfs4_file and nfs4_client structures like the other stateids, and contains a linked list of layouts that hang of the stateid. The actual layout operations are implemented in layout drivers that are not part of this commit, but will be added later. The worst part of this commit is the management of the pNFS device IDs, which suffers from a specification that is not sanely implementable due to the fact that the device-IDs are global and not bound to an export, and have a small enough size so that we can't store the fsid portion of a file handle, and must never be reused. As we still do need perform all export authentication and validation checks on a device ID passed to GETDEVICEINFO we are caught between a rock and a hard place. To work around this issue we add a new hash that maps from a 64-bit integer to a fsid so that we can look up the export to authenticate against it, a 32-bit integer as a generation that we can bump when changing the device, and a currently unused 32-bit integer that could be used in the future to handle more than a single device per export. Entries in this hash table are never deleted as we can't reuse the ids anyway, and would have a severe lifetime problem anyway as Linux export structures are temporary structures that can go away under load. Parts of the XDR data, structures and marshaling/unmarshaling code, as well as many concepts are derived from the old pNFS server implementation from Andy Adamson, Benny Halevy, Dean Hildebrand, Marc Eshel, Fred Isaman, Mike Sager, Ricardo Labiaga and many others. Signed-off-by: Christoph Hellwig <hch@lst.de>	2015-02-02 18:09:42 +01:00

22 Commits