read_user_stack_slow is called with interrupts soft-disabled and it copies contents
from the page which we find mapped to a specific address. To convert the
userspace address to a pfn, the kernel now uses a lockless page table walk.

The kernel needs to make sure the pfn value read remains stable and is not released
and reused for another process while the contents are read from the page. This
can only be achieved by holding a page reference.

One of the first approaches I tried was to check the pte value after the kernel
copies the contents from the page. But as shown below, we can still get it wrong:

CPU0                           CPU1
pte = READ_ONCE(*ptep);
                               pte_clear(pte);
                               put_page(page);
                               page = alloc_page();
                               memcpy(page_address(page), "secret password", nr);
memcpy(buf, kaddr + offset, nb);
                               put_page(page);
                               handle_mm_fault()
                               page = alloc_page();
                               set_pte(pte, page);
if (pte_val(pte) != pte_val(*ptep))

Hence switch to __get_user_pages_fast, which takes a reference on the page
before its contents are read.

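The race can be replayed deterministically in a small user-space toy model.
This is only a sketch, not kernel code: struct frame, toy_alloc_page,
toy_put_page and replay are hypothetical names, the "pte" is a plain pointer,
and the allocator is assumed to hand back the most recently freed frame first.
The lookup and the pin are also collapsed into a single step, which is a
simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NFRAMES  2
#define FRAME_SZ 32

/* Toy page frame: a reference count plus contents. */
struct frame {
	int refcount;
	char data[FRAME_SZ];
};

static struct frame frames[NFRAMES];
static struct frame *pte;	/* toy "pte": frame mapped at the sampled address */

/* Hand out the first free frame (assumed reuse policy). */
static struct frame *toy_alloc_page(const char *contents)
{
	for (size_t i = 0; i < NFRAMES; i++) {
		if (frames[i].refcount == 0) {
			frames[i].refcount = 1;
			strncpy(frames[i].data, contents, FRAME_SZ - 1);
			frames[i].data[FRAME_SZ - 1] = '\0';
			return &frames[i];
		}
	}
	return NULL;
}

/* Drop one reference; a frame with refcount 0 is free for reuse. */
static void toy_put_page(struct frame *f)
{
	f->refcount--;
}

/*
 * Replay the interleaving in program order. Returns true if CPU0's final
 * "pte unchanged" recheck passes; buf receives what CPU0 copied.
 */
static bool replay(bool take_ref, char *buf, size_t len)
{
	memset(frames, 0, sizeof(frames));
	pte = toy_alloc_page("user stack");	/* initial mapping */
	assert(pte != NULL);

	struct frame *seen = pte;		/* CPU0: pte = READ_ONCE(*ptep) */
	if (take_ref)
		seen->refcount++;		/* CPU0: pin the page first */

	struct frame *old = pte;
	pte = NULL;				/* CPU1: pte_clear() */
	toy_put_page(old);			/* CPU1: put_page(page) */
	struct frame *other = toy_alloc_page("secret password");

	strncpy(buf, seen->data, len - 1);	/* CPU0: memcpy(buf, ...) */
	buf[len - 1] = '\0';

	toy_put_page(other);			/* CPU1: put_page(page) */
	pte = toy_alloc_page("user stack");	/* CPU1: fault maps a page again */

	bool unchanged = (seen == pte);		/* CPU0: recheck the pte */
	if (take_ref)
		toy_put_page(seen);		/* CPU0: drop the pin */
	return unchanged;
}
```

In this model, replay(false, ...) passes the recheck even though buf holds the
other owner's data, because the same frame was freed, reused and mapped back.
With replay(true, ...) the pinned frame cannot be recycled, so buf holds the
contents that were mapped when the pte was sampled.
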
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200505071729.54912-8-aneesh.kumar@linux.ibm.com