userfaultfd: non-cooperative: closing the uffd without triggering SIGBUS

This is an enhancement to avoid a non cooperative userfaultfd manager
having to unregister all regions before it can close the uffd after all
userfaultfd activity completed.

The UFFDIO_UNREGISTER would serialize against the handle_userfault by
taking the mmap_sem for writing, but we can simply repeat the page fault
if we detect the uffd was closed and so the regular page fault paths
should takeover.

Link: http://lkml.kernel.org/r/20170823181227.19926-1-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This commit is contained in:
Andrea Arcangeli 2017-09-08 16:12:42 -07:00 committed by Linus Torvalds
parent 98c70baad4
commit 656710a60e
1 changed files with 19 additions and 1 deletions

View File

@ -381,8 +381,26 @@ int handle_userfault(struct vm_fault *vmf, unsigned long reason)
* in __get_user_pages if userfaultfd_release waits on the
* caller of handle_userfault to release the mmap_sem.
*/
if (unlikely(ACCESS_ONCE(ctx->released)))
if (unlikely(ACCESS_ONCE(ctx->released))) {
/*
* Don't return VM_FAULT_SIGBUS in this case, so a non
* cooperative manager can close the uffd after the
* last UFFDIO_COPY, without risking to trigger an
* involuntary SIGBUS if the process was starting the
* userfaultfd while the userfaultfd was still armed
* (but after the last UFFDIO_COPY). If the uffd
* wasn't already closed when the userfault reached
* this point, that would normally be solved by
* userfaultfd_must_wait returning 'false'.
*
* If we were to return VM_FAULT_SIGBUS here, the non
* cooperative manager would be instead forced to
* always call UFFDIO_UNREGISTER before it can safely
* close the uffd.
*/
ret = VM_FAULT_NOPAGE;
goto out;
}
/*
* Check that we can return VM_FAULT_RETRY.