OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Thomas Meyer	6089dd0d73	NFS: Fix bool initialization/comparison Bool initializations should use true and false. Bool tests don't need comparisons. Signed-off-by: Thomas Meyer <thomas@m3y3r.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2017-11-17 16:43:43 -05:00
Elena Reshetova	2b28a7bee4	fs, nfs: convert pnfs_layout_hdr.plh_refcount from atomic_t to refcount_t atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable pnfs_layout_hdr.plh_refcount is used as pure reference counter. Convert it to refcount_t and fix up the operations. Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2017-11-17 13:47:59 -05:00
Elena Reshetova	eba6dd6917	fs, nfs: convert pnfs_layout_segment.pls_refcount from atomic_t to refcount_t refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2017-11-17 13:47:59 -05:00
Trond Myklebust	70d2f7b1ea	pNFS: Use the standard I/O stateid when calling LAYOUTGET Instead of having a private method for copying the open/delegation stateid, use the same call that is used for standard I/O through the MDS. Note that this means we transmit the stateid with a zero seqid, avoiding issues with NFS4ERR_OLD_STATEID. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2017-09-11 22:19:00 -04:00
Trond Myklebust	196639ebbe	NFS: Fix 2 use after free issues in the I/O code The writeback code wants to send a commit after processing the pages, which is why we want to delay releasing the struct path until after that's done. Also, the layout code expects that we do not free the inode before we've put the layout segments in pnfs_writehdr_free() and pnfs_readhdr_free() Fixes: `919e3bd9a8` ("NFS: Ensure we commit after writeback is complete") Fixes: `4714fb51fd` ("nfs: remove pgio_header refcount, related cleanup") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2017-09-08 22:07:52 -04:00
Trond Myklebust	8205b9ce03	NFSv4/pnfs: Replace pnfs_put_lseg_locked() with pnfs_put_lseg() Now that we no longer hold the inode->i_lock when manipulating the commit lists, it is safe to call pnfs_put_lseg() again. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2017-08-15 11:54:48 -04:00
Benjamin Coddington	08cb5b0f05	pnfs: Fix the check for requests in range of layout segment It's possible and acceptable for NFS to attempt to add requests beyond the range of the current pgio->pg_lseg, a case which should be caught and limited by the pg_test operation. However, the current handling of this case replaces pgio->pg_lseg with a new layout segment (after a WARN) within that pg_test operation. That will cause all the previously added requests to be submitted with this new layout segment, which may not be valid for those requests. Fix this problem by only returning zero for the number of bytes to coalesce from pg_test for this case which allows any previously added requests to complete on the current layout segment. The check for requests starting out of range of the layout segment moves to pg_init, so that the replacement of pgio->pg_lseg will be done when the next request is added. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2017-05-24 07:55:02 -04:00
Trond Myklebust	61f454e30c	pNFS: Fix a deadlock when coalescing writes and returning the layout Consider the following deadlock: Process P1 Process P2 Process P3 ========== ========== ========== lock_page(page) lseg = pnfs_update_layout(inode) lo = NFS_I(inode)->layout pnfs_error_mark_layout_for_return(lo) lock_page(page) lseg = pnfs_update_layout(inode) In this scenario, - P1 has declared the layout to be in error, but P2 holds a reference to a layout segment on that inode, so the layoutreturn is deferred. - P2 is waiting for a page lock held by P3. - P3 is asking for a new layout segment, but is blocked waiting for the layoutreturn. The fix is to ensure that pnfs_error_mark_layout_for_return() does not set the NFS_LAYOUT_RETURN flag, which blocks P3. Instead, we allow the latter to call LAYOUTGET so that it can make progress and unblock P2. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2017-05-02 12:35:33 -04:00
Trond Myklebust	5466d21411	pNFS: Don't clear the layout return info if there are segments to return In pnfs_clear_layoutreturn_info, ensure that we don't clear the layout return info if there are new segments queued for return due to, for instance, a race between a LAYOUTRETURN and a failed I/O attempt. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2017-05-02 12:35:33 -04:00
Trond Myklebust	1f18b82c34	pNFS: Ensure we commit the layout if it has been invalidated If the layout is being invalidated on the server, then we must invoke nfs_commit_inode() to ensure any commits to the DS get cleared out. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2017-04-29 11:29:30 -04:00
Trond Myklebust	37f8aa16da	pNFS/flexfiles: Fix up the ff_layout_write_pagelist failure path If the attempt to write through pNFS fails, we need to use the same failure semantics as for the read path: If the FF_FLAGS_NO_IO_THRU_MDS flag is set or we have sufficient valid DSes, then we must retry through pNFS Fixes: `d67ae825a5` ("pnfs/flexfiles: Add the FlexFile Layout Driver") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2017-04-29 00:02:37 -04:00
Trond Myklebust	bdebfccd0e	pNFS: Ensure we check layout validity before marking it for return pnfs_error_mark_layout_for_return needs to check that the layout is valid before calling pnfs_set_plh_return_info(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2017-04-28 13:07:01 -04:00
Trond Myklebust	6aeafd05ec	pNFS: Fix use after free issues in pnfs_do_read() The assumption should be that if the caller returns PNFS_ATTEMPTED, then hdr has been consumed, and so we should not be testing hdr->task.tk_status. If the caller returns PNFS_TRY_AGAIN, then we need to recoalesce and free hdr. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2017-04-25 15:42:34 -04:00
Trond Myklebust	b3230e80a6	pNFS: Ensure we check layout segment validity in the pg_init() callback If we have a layout segment cached in pgio->pg_lseg, we should check it for validity before reusing it in a new RPC request. Otherwise, if we recoalesce, we can end up looping forever. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2017-04-25 10:56:19 -04:00
Trond Myklebust	b94196888f	pNFS: Unexport pnfs_put_lseg_locked and _pnfs_return_layout They are not used outside the NFSv4 module. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2017-04-20 16:53:58 -04:00
Trond Myklebust	ee6625a948	pNFS: Fix a reference leak in _pnfs_return_layout IF NFS_LAYOUT_RETURN_REQUESTED is not set, then we currently exit without freeing the list of invalidated layout segments, leading to a reference leak. Reported-by: Olga Kornievskaia <aglo@umich.edu> Fixes: `24408f5282` ("pNFS: Fix bugs in _pnfs_return_layout") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2017-01-26 15:50:41 -05:00
Trond Myklebust	e71708d4df	pNFS: Return RW layouts on OPEN_DOWNGRADE If the client holds no more writeable open state, and does not hold a write delegation, then send a layoutreturn as part of the OPEN_DOWNGRADE. We do this only for writes, since some layout drivers may require you to also hold a read layout if you are doing a R/W workload. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-12-19 17:29:55 -05:00
Trond Myklebust	362fb578a5	pNFS: Release NFS_LAYOUT_RETURN when invalidating the layout stateid Ensure we release the NFS_LAYOUT_RETURN lock when we invalidate the layout stateid, so that processes and RPC tasks that are waiting on the layout return can continue. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-12-05 22:52:01 -05:00
Trond Myklebust	287bd3e954	pNFS: Add a layoutreturn callback to performa layout-private setup Add a callback to allow the flexfiles layout driver to initialise the layout private payload. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-12-03 13:12:16 -05:00
Trond Myklebust	4d796d751c	pNFS: Allow layout drivers to manage private data in struct nfs4_layoutreturn Cleanup to allow layout drivers to attach private data to layoutreturn, and manage the data. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-12-02 23:37:45 -05:00
Trond Myklebust	b85f562049	pNFS: Skip invalid stateids when doing a bulk destroy If the layout stateid is already invalid, we have no work to do. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-12-01 17:21:51 -05:00
Trond Myklebust	29ade5db12	pNFS: Wait on outstanding layoutreturns to complete in pnfs_roc() Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-12-01 17:21:50 -05:00
Trond Myklebust	abb3e1c877	pNFS: Don't mark the layout as freed if the last lseg is marked for return Address another memory leak. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-12-01 17:21:50 -05:00
Trond Myklebust	4aab97327f	pNFS: Sync the layout state bits in pnfs_cache_lseg_for_layoutreturn Ensure that the layout state bits are synced when we cache a layout segment for layoutreturn using an appropriate call to pnfs_set_plh_return_info. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-12-01 17:21:49 -05:00
Trond Myklebust	24408f5282	pNFS: Fix bugs in _pnfs_return_layout We need to honour the NFS_LAYOUT_RETURN_REQUESTED bit regardless of whether or not there are layout segments pending. Furthermore, we should ensure that we leave the plh_return_segs list empty. This patch fixes a memory leak of the layout segments on plh_return_segs. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-12-01 17:21:49 -05:00
Trond Myklebust	fe1cf9469d	pNFS: Clear all layout segment state in pnfs_mark_layout_stateid_invalid When the layout state is invalidated, then so is the layout segment state, and hence we do need to clean up the state bits. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-12-01 17:21:48 -05:00
Trond Myklebust	1c5bd76d17	pNFS: Enable layoutreturn operation for return-on-close Amend the pnfs return on close helper functions to enable sending the layoutreturn op in CLOSE/DELEGRETURN. This closes a potential race between CLOSE/DELEGRETURN and parallel OPEN calls to the same file, and allows the client and the server to agree on whether or not there is an outstanding layout. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-12-01 17:21:47 -05:00
Trond Myklebust	828ed9ec1b	pNFS: Clean up - add a helper to initialise struct layoutreturn_args Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-12-01 17:21:47 -05:00
Trond Myklebust	69820d22c5	pNFS: Don't mark layout segments invalid on layoutreturn in pnfs_roc The layoutreturn call will take care of invalidating the layout segments once the call is successful. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-12-01 17:21:45 -05:00
Trond Myklebust	0cdc329ec9	pNFS: Skip checking for return-on-close if the layout is invalid Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-12-01 17:21:44 -05:00
Trond Myklebust	e685d237e6	pNFS: Remove spurious wake up in pnfs_layout_remove_lseg() There is no change to the value of NFS_LAYOUT_RETURN, so we should not be waking up the RPC call. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-12-01 17:21:43 -05:00
Trond Myklebust	2a974425e5	NFSv4: Ignore LAYOUTRETURN result if the layout doesn't match or is invalid Fix a potential race with CB_LAYOUTRECALL in which the server recalls the remaining layout segments while our LAYOUTRETURN is still in transit. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-12-01 17:21:43 -05:00
Trond Myklebust	68f744797e	pNFS: Do not free layout segments that are marked for return We may want to process and transmit layout stat information for the layout segments that are being returned, so we should defer freeing them until after the layoutreturn has completed. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-12-01 17:21:42 -05:00
Trond Myklebust	17822b207f	pNFS: consolidate the different range intersection tests Both pnfs.c and the flexfiles code have their own versions of the range intersection testing, and the "end_offset" helper. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-12-01 17:21:41 -05:00
Trond Myklebust	ee284e35d8	pNFS: Fix race in pnfs_wait_on_layoutreturn We must put the task to sleep while holding the inode->i_lock in order to ensure atomicity with the test for NFS_LAYOUT_RETURN. Fixes: `500d701f33` ("NFS41: make close wait for layoutreturn") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-12-01 17:21:41 -05:00
Trond Myklebust	6604b203fb	pNFS: On error, do not send LAYOUTGET until the LAYOUTRETURN has completed If there is an I/O error, we should not call LAYOUTGET until the LAYOUTRETURN that reports the error is complete. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: stable@vger.kernel.org # v4.8+	2016-12-01 17:21:40 -05:00
Trond Myklebust	9888d837f3	pNFS: Force a retry of LAYOUTGET if the stateid doesn't match our cache If the server sends us a completely new stateid, and the client thinks it already holds a layout, then force a retry of the LAYOUTGET after invalidating the existing layout in order to avoid corruption due to races. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-12-01 17:21:40 -05:00
Trond Myklebust	ae5a459d5f	pNFS: Clear NFS_LAYOUT_RETURN_REQUESTED when invalidating the layout stateid We must ensure that we don't schedule a layoutreturn if the layout stateid has been marked as invalid. Fixes: `2a59a04116` ("pNFS: Fix pnfs_set_layout_stateid() to clear...") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: stable@vger.kernel.org # v4.8+	2016-12-01 17:21:39 -05:00
Trond Myklebust	7b650994ab	pNFS: Don't clear the layout stateid if a layout return is outstanding If we no longer hold any layout segments, we're normally expected to consider the layout stateid to be invalid. However we cannot assume this if we're about to, or in the process of sending a layoutreturn. Fixes: `334a8f3711` ("pNFS: Don't forget the layout stateid if...") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: stable@vger.kernel.org # v4.8+	2016-12-01 17:21:39 -05:00
Trond Myklebust	54e4a0dfa2	pNFS: Fix a deadlock between read resends and layoutreturn We must not call nfs_pageio_init_read() on a new nfs_pageio_descriptor while holding a reference to a layout segment, as that can deadlock pnfs_update_layout(). Fixes: `d67ae825a5` ("pnfs/flexfiles: Add the FlexFile Layout Driver") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: stable@vger.kernel.org # v4.0+	2016-12-01 17:21:38 -05:00
Anna Schumaker	192747166a	NFS: Don't print a pNFS error if we aren't using pNFS We used to check for a valid layout type id before verifying pNFS flags as an indicator for if we are using pNFS. This changed in `3132e49ece` with the introduction of multiple layout types, since now we are passing an array of ids instead of just one. Since then, users have been seeing a KERN_ERR printk show up whenever mounting NFS v4 without pNFS. This patch restores the original behavior of exiting set_pnfs_layoutdriver() early if we aren't using pNFS. Fixes `3132e49ece` ("pnfs: track multiple layout types in fsinfo structure") Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-11-07 16:11:30 -05:00
Trond Myklebust	bfc505ded0	pNFS: Fix atime updates on pNFS clients Fix the code so that we always mark the atime as invalid in nfs4_read_done(). Currently, the expectation appears to be that the pNFS drivers should always do this, with the result that most of them don't. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-09-27 14:35:36 -04:00
Jeff Layton	ca440c383a	pnfs: add a new mechanism to select a layout driver according to an ordered list Currently, the layout driver selection code always chooses the first one from the list. That's not really ideal however, as the server can send the list of layout types in any order that it likes. It's up to the client to select the best one for its needs. This patch adds an ordered list of preferred driver types and has the selection code sort the list of available layout drivers according to it. Any unrecognized layout type is sorted to the end of the list. For now, the order of preference is hardcoded, but it should be possible to make this configurable in the future. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-09-19 13:11:13 -04:00
Jeff Layton	3132e49ece	pnfs: track multiple layout types in fsinfo structure Current NFSv4.1/pNFS client assumes that MDS supports only one layout type. While it's true for most existing servers, nevertheless, this can be change in the near future. For now, this patch just plumbs in the ability to track a list of layouts in the fsinfo structure. The existing behavior of the client is preserved, by having it just select the first entry in the list. Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de> Signed-off-by: Jeff Layton <jlayton@poochiereds.net> Reviewed-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-09-19 13:08:35 -04:00
Trond Myklebust	334a8f3711	pNFS: Don't forget the layout stateid if there are outstanding LAYOUTGETs If there are outstanding LAYOUTGET rpc calls, then we want to ensure that we keep the layout stateid around so we that don't inadvertently pick up an old/misordered sequence id. The race is as follows: Client Server ====== ====== LAYOUTGET(seqid) LAYOUTGET(seqid) return LAYOUTGET(seqid+1) return LAYOUTGET(seqid+2) process LAYOUTGET(seqid+2) forget layout process LAYOUTGET(seqid+1) If it forgets the layout stateid before processing seqid+1, then the client will not check the layout->plh_barrier, and so will set the stateid with seqid+1. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-09-04 12:59:00 -04:00
Trond Myklebust	2a59a04116	pNFS: Fix pnfs_set_layout_stateid() to clear NFS_LAYOUT_INVALID_STID If the layout was marked as invalid, we want to ensure to initialise the layout header fields correctly. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-09-03 12:10:37 -04:00
Trond Myklebust	bf0291dd22	pNFS: Ensure LAYOUTGET and LAYOUTRETURN are properly serialised According to RFC5661, the client is responsible for serialising LAYOUTGET and LAYOUTRETURN to avoid ambiguity. Consider the case where we send both in parallel. Client Server ====== ====== LAYOUTGET(seqid=X) LAYOUTRETURN(seqid=X) LAYOUTGET return seqid=X+1 LAYOUTRETURN return seqid=X+2 Process LAYOUTRETURN Forget layout stateid Process LAYOUTGET Set seqid=X+1 The client processes the layoutget/layoutreturn in the wrong order, and since the result of the layoutreturn was to clear the only existing layout segment, the client forgets the layout stateid. When the LAYOUTGET comes in, it is treated as having a completely new stateid, and so the client sets the wrong sequence id... Fix is to check if there are outstanding LAYOUTGET requests before we send the LAYOUTRETURN (note that LAYOUGET will already wait if it sees an outstanding LAYOUTRETURN). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: stable@vger.kernel.org # v4.5+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-09-03 12:10:37 -04:00
Trond Myklebust	b88fa69eaa	pNFS: The client must not do I/O to the DS if it's lease has expired Ensure that the client conforms to the normative behaviour described in RFC5661 Section 12.7.2: "If a client believes its lease has expired, it MUST NOT send I/O to the storage device until it has validated its lease." So ensure that we wait for the lease to be validated before using the layout. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: stable@vger.kernel.org # v3.20+	2016-08-23 11:27:01 -04:00
Trond Myklebust	9a0fe86745	pNFS: Handle NFS4ERR_OLD_STATEID correctly in LAYOUTSTAT calls We normally want to update the stateid and then retry, Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-08-19 16:27:31 -04:00
Trond Myklebust	668f455dac	Merge branch 'pnfs'	2016-07-24 17:08:59 -04:00
Trond Myklebust	362745268c	Merge branch 'writeback'	2016-07-24 17:08:31 -04:00
Trond Myklebust	01d7b29f0e	pNFS: Remove redundant smp_mb() from pnfs_init_lseg() It's not visible yet, and won't be until after we grab the inode->i_lock. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-24 16:16:43 -04:00
Trond Myklebust	119cef97a4	pNFS: Cleanup - do layout segment initialisation in one place ...instead of splitting the initialisation over init_lseg() and pnfs_layout_process(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-24 16:16:42 -04:00
Trond Myklebust	28c1acffea	pNFS: Remove redundant stateid invalidation The layout stateid will be invalidated once it holds no more layout segments anyway. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-24 16:16:42 -04:00
Trond Myklebust	f71dfe8fc9	pNFS: Remove redundant pnfs_mark_layout_returned_if_empty() That's already being taken care of in pnfs_layout_remove_lseg(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-24 16:16:41 -04:00
Trond Myklebust	d9b61708fe	pNFS: Clear the layout metadata if the server changed the layout stateid If the server changed the layout stateid's "other" field, then we should treat the old layout as being completely gone. In that case, we want to clear the metadata such as scheduled layoutreturns. Do this by calling pnfs_mark_layout_stateid_invalid(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-24 16:16:41 -04:00
Trond Myklebust	5f46be049b	pNFS: Cleanup - don't open code pnfs_mark_layout_stateid_invalid() Ensure nfs42_layoutstat_done() layoutget don't open code layout stateid invalidation. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-24 16:16:40 -04:00
Trond Myklebust	e036f46453	NFS: pnfs_mark_matching_lsegs_return() should match the layout sequence id When determining which layout segments to return, we do want pnfs_mark_matching_lsegs_return to check that they match the layout sequence id. This ensures that we don't waste time if the server is replaying a layout recall that has already been satisfied. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-24 16:16:40 -04:00
Trond Myklebust	2d6cf5ab0b	pNFS: Do not set plh_return_seq for non-callback related layoutreturns In cases where we need to send a layoutreturn in order to propagate an error, we should not tie that to a specific layout stateid. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-24 16:16:39 -04:00
Trond Myklebust	e5fd1904b8	pNFS: Ensure layoutreturn acts as a completion for layout callbacks When we return NFS_OK to the CB_LAYOUTRECALL, we are required to send a layoutreturn that "completes" that layout recall request, using the correct stateid. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-24 16:16:39 -04:00
Trond Myklebust	ecebb80bf3	pNFS: Always update the layout barrier seqid on LAYOUTGET Currently, pnfs_set_layout_stateid() will update the layout sequence id barrier only if the stateid itself is newer than the current layout stateid. However in a situation where multiple LAYOUTGET calls and a LAYOUTRETURN raced, it is entirely possible for one of the LAYOUTGET to set the current stateid to something newer than the LAYOUTRETURN that needs to set the barrier. The fix is to allow the "update_barrier" flag to force a check as to whether or not the barrier needs to be updated. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-24 16:16:38 -04:00
Trond Myklebust	13bede18de	pNFS: Always update the layout stateid if NFS_LAYOUT_INVALID_STID is set If the layout stateid is invalid, then pnfs_set_layout_stateid() must always initialise it. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-24 16:16:25 -04:00
Trond Myklebust	8e0acf9046	pNFS: Clear the layout return tracking on layout reinitialisation Ensure that we don't carry over layoutreturn info from a previous incarnation of this layout. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-24 12:51:49 -04:00
Trond Myklebust	66b53f3258	pNFS: Handle NFS4ERR_RECALLCONFLICT correctly in LAYOUTGET Instead of giving up altogether and falling back to doing I/O through the MDS, which may make the situation worse, wait for 2 lease periods for the callback to resolve itself, and then try destroying the existing layout. Only if this was an attempt at getting a first layout, do we give up altogether, as the server is clearly crazy. Fixes: `183d9e7b11` ("pnfs: rework LAYOUTGET retry handling") Cc: stable@vger.kernel.org # 4.7 Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Jeff Layton <jlayton@redhat.com>	2016-07-19 16:23:22 -04:00
Trond Myklebust	e85d7ee420	pNFS: Separate handling of NFS4ERR_LAYOUTTRYLATER and RECALLCONFLICT They are not the same error, and need to be handled differently. Fixes: `183d9e7b11` ("pnfs: rework LAYOUTGET retry handling") Cc: stable@vger.kernel.org # 4.7 Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Jeff Layton <jlayton@redhat.com>	2016-07-19 16:23:22 -04:00
Trond Myklebust	56b38a1f7c	pNFS: Fix post-layoutget error handling in pnfs_update_layout() The non-retry error path is currently broken and ends up releasing the reference to the layout twice. It also can end up clearing the NFS_LAYOUT_FIRST_LAYOUTGET flag twice, causing a race. In addition, the retry path will fail to decrement the plh_outstanding counter. Fixes: `183d9e7b11` ("pnfs: rework LAYOUTGET retry handling") Cc: stable@vger.kernel.org # 4.7 Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Jeff Layton <jlayton@redhat.com>	2016-07-19 16:22:47 -04:00
Trond Myklebust	2e18d4d822	pNFS: Files and flexfiles always need to commit before layoutcommit So ensure that we mark the layout for commit once the write is done, and then ensure that the commit to ds is finished before sending layoutcommit. Note that by doing this, we're able to optimise away the commit for the case of servers that don't need layoutcommit in order to return updated attributes. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-05 19:08:01 -04:00
Trond Myklebust	2d148c7e84	NFSv4.1/pnfs: Mark the layout stateid invalid when all segments are removed According to RFC5661, section 12.5.3. the layout stateid is no longer valid once the client no longer holds any layout segments. Ensure that we mark it invalid. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-06-24 12:01:00 -04:00
Trond Myklebust	e5241e4388	NFSv4.1/pnfs: Add sparse lock annotations for pnfs_find_alloc_layout Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Jeff Layton <jlayton@poochiereds.net> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-06-24 12:01:00 -04:00
Trond Myklebust	67a3b72146	NFSv4.1/pnfs: Layout stateids start out as being invalid Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Jeff Layton <jlayton@poochiereds.net> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-06-24 12:01:00 -04:00
Tom Haynes	c7d73af2d2	pnfs: pnfs_update_layout needs to consider if strict iomode checking is on As flexfiles has FF_FLAGS_NO_READ_IO, there is a need to generically support enforcing that a IOMODE_RW segment will not allow READ I/O. Signed-off-by: Tom Haynes <loghyr@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-05-26 08:40:56 -04:00
Jeff Layton	1b3c6d07e2	pnfs: make pnfs_layout_process more robust It can return NULL if layoutgets are blocked currently. Fix it to return -EAGAIN in that case, so we can properly handle it in pnfs_update_layout. Also, clean up and simplify the error handling -- eliminate "status" and just use "lseg". Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-05-17 15:48:13 -04:00
Jeff Layton	183d9e7b11	pnfs: rework LAYOUTGET retry handling There are several problems in the way a stateid is selected for a LAYOUTGET operation: We pick a stateid to use in the RPC prepare op, but that makes it difficult to serialize LAYOUTGETs that use the open stateid. That serialization is done in pnfs_update_layout, which occurs well before the rpc_prepare operation. Between those two events, the i_lock is dropped and reacquired. pnfs_update_layout can find that the list has lsegs in it and not do any serialization, but then later pnfs_choose_layoutget_stateid ends up choosing the open stateid. This patch changes the client to select the stateid to use in the LAYOUTGET earlier, when we're searching for a usable layout segment. This way we can do it all while holding the i_lock the first time, and ensure that we serialize any LAYOUTGET call that uses a non-layout stateid. This also means a rework of how LAYOUTGET replies are handled, as we must now get the latest stateid if we want to retransmit in response to a retryable error. Most of those errors boil down to the fact that the layout state has changed in some fashion. Thus, what we really want to do is to re-search for a layout when it fails with a retryable error, so that we can avoid reissuing the RPC at all if possible. While the LAYOUTGET RPC is async, the initiating thread always waits for it to complete, so it's effectively synchronous anyway. Currently, when we need to retry a LAYOUTGET because of an error, we drive that retry via the rpc state machine. This means that once the call has been submitted, it runs until it completes. So, we must move the error handling for this RPC out of the rpc_call_done operation and into the caller. In order to handle errors like NFS4ERR_DELAY properly, we must also pass a pointer to the sliding timeout, which is now moved to the stack in pnfs_update_layout. The complicating errors are -NFS4ERR_RECALLCONFLICT and -NFS4ERR_LAYOUTTRYLATER, as those involve a timeout after which we give up and return NULL back to the caller. So, there is some special handling for those errors to ensure that the layers driving the retries can handle that appropriately. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-05-17 15:48:12 -04:00
Jeff Layton	83026d80a1	pnfs: lift retry logic from send_layoutget to pnfs_update_layout If we get back something like NFS4ERR_OLD_STATEID, that will be translated into -EAGAIN, and the do/while loop in send_layoutget will drive the call again. This is not quite what we want, I think. An error like that is a sign that something has changed. That something could have been a concurrent LAYOUTGET that would give us a usable lseg. Lift the retry logic into pnfs_update_layout instead. That allows us to redo the layout search, and may spare us from having to issue an RPC. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-05-17 15:48:12 -04:00
Jeff Layton	d03ab29dbb	pnfs: fix bad error handling in send_layoutget Currently, the code will clear the fail bit if we get back a fatal error. I don't think that's correct -- we want to clear that bit if we do not get a fatal error. Fixes: `0bcbf039f6` (nfs: handle request add failure properly) Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-05-17 15:48:11 -04:00
Jeff Layton	6d597e1750	pnfs: only tear down lsegs that precede seqid in LAYOUTRETURN args LAYOUTRETURN is "special" in that servers and clients are expected to work with old stateids. When the client sends a LAYOUTRETURN with an old stateid in it then the server is expected to only tear down layout segments that were present when that seqid was current. Ensure that the client handles its accounting accordingly. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-05-17 15:48:10 -04:00
Jeff Layton	3982a6a2d0	pnfs: keep track of the return sequence number in pnfs_layout_hdr When we want to selectively do a LAYOUTRETURN, we need to specify a stateid that represents most recent layout acquisition that is to be returned. When we mark a layout stateid to be returned, we update the return sequence number in the layout header with that value, if it's newer than the existing one. Then, when we go to do a LAYOUTRETURN on layout header put, we overwrite the seqid in the stateid with the saved one, and then zero it out. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-05-17 15:48:10 -04:00
Jeff Layton	6675528380	pnfs: record sequence in pnfs_layout_segment when it's created In later patches, we're going to teach the client to be more selective about how it returns layouts. This means keeping a record of what the stateid's seqid was at the time that the server handed out a layout segment. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-05-17 15:48:09 -04:00
Trond Myklebust	f538d0ba5b	pNFS: Fix a leaked layoutstats flag Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-05-17 15:48:05 -04:00
Weston Andros Adamson	1b1bc66bb4	pnfs: set NFS_IOHDR_REDO in pnfs_read_resend_pnfs Like other resend paths, mark the (old) hdr as NFS_IOHDR_REDO. This ensures the hdr completion function will not count the (old) hdr as good bytes. Also, vector the error back through the hdr->task.tk_status like other retry calls. This fixes a bug with the FlexFiles layout where libaio was reporting more bytes read than requested. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-05-09 09:05:40 -04:00
Kirill A. Shutemov	09cbfeaf1a	mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced long time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-04-04 10:41:08 -07:00
Trond Myklebust	9fd4b9fc76	NFSv4.x/pnfs: Fix a race between layoutget and bulk recalls Replace another case where the layout 'plh_block_lgets' can trigger infinite loops in send_layoutget(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-02-22 17:46:34 -05:00
Trond Myklebust	2454dfea0a	NFSv4.x/pnfs: Fix a race between layoutget and pnfs_destroy_layout If the server reboots while there is a layoutget outstanding, then the call to pnfs_choose_layoutget_stateid() will fail with an EAGAIN error, which causes an infinite loop in send_layoutget(). The reason why we never break out of the loop is that the layout 'plh_block_lgets' field is never cleared. Fix is to replace plh_block_lgets with NFS_LAYOUT_INVALID_STID, which can be reset after a new layoutget. Fixes: `ab7d763e47` ("pNFS: Ensure nfs4_layoutget_prepare returns...") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-02-22 17:34:59 -05:00
Trond Myklebust	e0fa0d0189	pNFS: Always set NFS_LAYOUT_RETURN_REQUESTED with lo->plh_return_iomode When setting the layout return mode, we must always also set the NFS_LAYOUT_RETURN_REQUESTED flag to ensure that we send a layoutreturn. Otherwise pnfs_error_mark_layout_for_return() could set the mode, but fail to send the layoutreturn because another is already in flight. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-02-15 13:03:30 -05:00
Trond Myklebust	2f21596882	pNFS: Fix pnfs_mark_matching_lsegs_return() We don't need to schedule a layoutreturn if the layout segment can be freed immediately. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-02-15 12:56:01 -05:00
Trond Myklebust	2370abdab5	NFS: Cleanup - rename NFS_LAYOUT_RETURN_BEFORE_CLOSE NFS_LAYOUT_RETURN_BEFORE_CLOSE is being used to signal that a layoutreturn is needed, either due to a layout recall or to a layout error. Rename it to NFS_LAYOUT_RETURN_REQUESTED in order to clarify its purpose. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-01-27 20:40:05 -05:00
Trond Myklebust	13c13a6ad7	pNFS: Fix missing layoutreturn calls The layoutreturn code currently relies on pnfs_put_lseg() to initiate the RPC call when conditions are right. A problem arises when we want to free the layout segment from inside an inode->i_lock section (e.g. in pnfs_clear_request_commit()), since we cannot sleep. The workaround is to move the actual call to pnfs_send_layoutreturn() to pnfs_put_layout_hdr(), which doesn't have this restriction. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-01-26 23:12:11 -05:00
Trond Myklebust	942e3d72a6	Merge branch 'pnfs_generic' * pnfs_generic: NFSv4.1/pNFS: Cleanup constify struct pnfs_layout_range arguments NFSv4.1/pnfs: Cleanup copying of pnfs_layout_range structures NFSv4.1/pNFS: Cleanup pnfs_mark_matching_lsegs_invalid() NFSv4.1/pNFS: Fix a race in initiate_file_draining() NFSv4.1/pNFS: pnfs_error_mark_layout_for_return() must always return layout NFSv4.1/pNFS: pnfs_mark_matching_lsegs_return() should set the iomode NFSv4.1/pNFS: Use nfs4_stateid_copy for copying stateids NFSv4.1/pNFS: Don't pass stateids by value to pnfs_send_layoutreturn() NFS: Relax requirements in nfs_flush_incompatible NFSv4.1/pNFS: Don't queue up a new commit if the layout segment is invalid NFS: Allow multiple commit requests in flight per file NFS/pNFS: Fix up pNFS write reschedule layering violations and bugs NFSv4: List stateid information in the callback tracepoints NFSv4.1/pNFS: Don't return NFS4ERR_DELAY unnecessarily in CB_LAYOUTRECALL NFSv4.1/pNFS: Ensure we enforce RFC5661 Section 12.5.5.2.1 pNFS: If we have to delay the layout callback, mark the layout for return NFSv4.1/pNFS: Add a helper to mark the layout as returned pNFS: Ensure nfs4_layoutget_prepare returns the correct error	2016-01-04 13:19:55 -05:00
Trond Myklebust	506c0d6826	NFSv4.1/pNFS: Cleanup constify struct pnfs_layout_range arguments Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-01-04 13:07:15 -05:00
Trond Myklebust	e144e5391c	NFSv4.1/pnfs: Cleanup copying of pnfs_layout_range structures Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-01-04 12:52:53 -05:00
Trond Myklebust	71b39854a5	NFSv4.1/pNFS: Cleanup pnfs_mark_matching_lsegs_invalid() Make it more obvious what we're returning... Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-01-04 12:41:15 -05:00
Trond Myklebust	10335556c9	NFSv4.1/pNFS: pnfs_error_mark_layout_for_return() must always return layout Fix a bug whereby if all the layout segments could be immediately freed, the call to pnfs_error_mark_layout_for_return() would never result in a layoutreturn. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-01-04 12:36:11 -05:00
Trond Myklebust	5c97f5de2c	NFSv4.1/pNFS: pnfs_mark_matching_lsegs_return() should set the iomode If pnfs_mark_matching_lsegs_return() needs to mark a layout segment for return, then it must also set the return iomode. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-01-04 12:36:11 -05:00
Trond Myklebust	50f563ef5d	NFSv4.1/pNFS: Use nfs4_stateid_copy for copying stateids Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-01-04 12:36:10 -05:00
Trond Myklebust	ed429d6b93	NFSv4.1/pNFS: Don't pass stateids by value to pnfs_send_layoutreturn() A stateid is a structure, pass it as a pointer. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-01-04 12:35:47 -05:00
Trond Myklebust	b20135d0b2	NFSv4.1/pNFS: Don't queue up a new commit if the layout segment is invalid If the layout segment is invalid, then we should not be adding more write requests to the commit list. Instead, those writes should be replayed after requesting a new layout. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-12-31 15:55:35 -05:00
Trond Myklebust	fc7ff36747	pNFS: If we have to delay the layout callback, mark the layout for return If the client needs to delay the layout callback, then speed up the recall process by marking the remaining layout segments to be actively returned by the client. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-12-28 14:33:04 -05:00
Trond Myklebust	0654cc726f	NFSv4.1/pNFS: Add a helper to mark the layout as returned This ensures that we don't reuse the stateid if a layout return or implied layout return means that we've returned all layout segments Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-12-28 14:33:04 -05:00
Trond Myklebust	b9fc773ef5	pNFS/flexfiles: Don't mark the entire layout as failed, when returning it In pNFS/flexfiles, we want to return the layout without necessarily marking it as having completely failed. We therefore move the call to pnfs_layout_io_set_failed() out of pnfs_error_mark_layout_for_return(), and then ensura that pNFS/files layout calls it separately. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-12-28 14:32:40 -05:00
Trond Myklebust	2e5b29f044	pNFS/flexfiles: Don't prevent flexfiles client from retrying LAYOUTGET Fix a bug in which flexfiles clients are falling back to I/O through the MDS even when the FF_FLAGS_NO_IO_THRU_MDS flag is set. The flexfiles client will always report errors through the LAYOUTRETURN and/or LAYOUTERROR mechanisms, so it should normally be safe for it to retry the LAYOUTGET until it fails or succeeds. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-12-28 14:32:40 -05:00

1 2 3 4 5 ...

414 Commits