Use vn_io_fault_uiomove on FreeBSD to avoid potential deadlock

Added to prevent a possible deadlock, the following comments from
FreeBSD explain the issue.  The comment describing vn_io_fault_uiomove:

/*
 * Helper function to perform the requested uiomove operation using
 * the held pages for io->uio_iov[0].iov_base buffer instead of
 * copyin/copyout.  Access to the pages with uiomove_fromphys()
 * instead of iov_base prevents page faults that could occur due to
 * pmap_collect() invalidating the mapping created by
 * vm_fault_quick_hold_pages(), or pageout daemon, page laundry or
 * object cleanup revoking the write access from page mappings.
 *
 * Filesystems specified MNTK_NO_IOPF shall use vn_io_fault_uiomove()
 * instead of plain uiomove().
 */

This used for vn_io_fault which has the following motivation:

/*
 * The vn_io_fault() is a wrapper around vn_read() and vn_write() to
 * prevent the following deadlock:
 *
 * Assume that the thread A reads from the vnode vp1 into userspace
 * buffer buf1 backed by the pages of vnode vp2.  If a page in buf1 is
 * currently not resident, then system ends up with the call chain
 *   vn_read() -> VOP_READ(vp1) -> uiomove() -> [Page Fault] ->
 *     vm_fault(buf1) -> vnode_pager_getpages(vp2) -> VOP_GETPAGES(vp2)
 * which establishes lock order vp1->vn_lock, then vp2->vn_lock.
 * If, at the same time, thread B reads from vnode vp2 into buffer buf2
 * backed by the pages of vnode vp1, and some page in buf2 is not
 * resident, we get a reversed order vp2->vn_lock, then vp1->vn_lock.
 *
 * To prevent the lock order reversal and deadlock, vn_io_fault() does
 * not allow page faults to happen during VOP_READ() or VOP_WRITE().
 * Instead, it first tries to do the whole range i/o with pagefaults
 * disabled. If all pages in the i/o buffer are resident and mapped,
 * VOP will succeed (ignoring the genuine filesystem errors).
 * Otherwise, we get back EFAULT, and vn_io_fault() falls back to do
 * i/o in chunks, with all pages in the chunk prefaulted and held
 * using vm_fault_quick_hold_pages().
 *
 * Filesystems using this deadlock avoidance scheme should use the
 * array of the held pages from uio, saved in the curthread->td_ma,
 * instead of doing uiomove().  A helper function
 * vn_io_fault_uiomove() converts uiomove request into
 * uiomove_fromphys() over td_ma array.
 *
 * Since vnode locks do not cover the whole i/o anymore, rangelocks
 * make the current i/o request atomic with respect to other i/os and
 * truncations.
 */

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10177

This commit is contained in:

Matthew Macy

2020-04-08 10:30:27 -07:00

committed by

GitHub

parent 7e3df9db12

commit 01c4f2bf29

No known key found for this signature in database

GPG Key ID: 4AEE18F83AFDEB23

1 changed files with 10 additions and 1 deletions

									
										11

module/zfs/dmu.c

										View File
									
				@ -1359,8 +1359,13 @@ dmu_read_uio_dnode(dnode_t *dn, uio_t *uio, uint64_t size)

								XUIOSTAT_BUMP(xuiostat_rbuf_copied);

						} else

				#endif

				#ifdef __FreeBSD__

							err = vn_io_fault_uiomove((char *)db->db_data + bufoff,

							    tocpy, uio);

				#else

							err = uiomove((char *)db->db_data + bufoff, tocpy,

							    UIO_READ, uio);

				#endif

						if (err)

							break;

				@ -1459,9 +1464,13 @@ dmu_write_uio_dnode(dnode_t *dn, uio_t *uio, uint64_t size, dmu_tx_t *tx)

						 * to lock the pages in memory, so that uiomove won't

						 * block.

						 */

				#ifdef __FreeBSD__

						err = vn_io_fault_uiomove((char *)db->db_data + bufoff,

						    tocpy, uio);

				#else

						err = uiomove((char *)db->db_data + bufoff, tocpy,

						    UIO_WRITE, uio);

				#endif

						if (tocpy == db->db_size)

							dmu_buf_fill_done(db, tx);

Use vn_io_fault_uiomove on FreeBSD to avoid potential deadlock

11 module/zfs/dmu.c Unescape Escape View File

11

module/zfs/dmu.c

View File