docs: kvm: Convert mmu.txt to ReST format

- Use document title and chapter markups;
- Add markups for tables;
- Add markups for literal blocks;
- Add blank lines and adjust indentation.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Mauro Carvalho Chehab 2020-02-10 07:03:00 +01:00 committed by Paolo Bonzini
parent 75e7fcdb4a
commit 037d1f92ef
2 changed files with 49 additions and 14 deletions

@@ -13,6 +13,7 @@ KVM
 halt-polling
 hypercalls
 locking
+mmu
 msr
 vcpu-requests

@@ -1,3 +1,6 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================
 The x86 kvm shadow mmu
 ======================
@@ -7,27 +10,37 @@ physical addresses to host physical addresses.
 
 The mmu code attempts to satisfy the following requirements:
 
-- correctness: the guest should not be able to determine that it is running
+- correctness:
+  the guest should not be able to determine that it is running
   on an emulated mmu except for timing (we attempt to comply
   with the specification, not emulate the characteristics of
   a particular implementation such as tlb size)
-- security: the guest must not be able to touch host memory not assigned
+- security:
+  the guest must not be able to touch host memory not assigned
   to it
-- performance: minimize the performance penalty imposed by the mmu
+- performance:
+  minimize the performance penalty imposed by the mmu
-- scaling: need to scale to large memory and large vcpu guests
+- scaling:
+  need to scale to large memory and large vcpu guests
-- hardware: support the full range of x86 virtualization hardware
+- hardware:
+  support the full range of x86 virtualization hardware
-- integration: Linux memory management code must be in control of guest memory
+- integration:
+  Linux memory management code must be in control of guest memory
   so that swapping, page migration, page merging, transparent
   hugepages, and similar features work without change
-- dirty tracking: report writes to guest memory to enable live migration
+- dirty tracking:
+  report writes to guest memory to enable live migration
   and framebuffer-based displays
-- footprint: keep the amount of pinned kernel memory low (most memory
+- footprint:
+  keep the amount of pinned kernel memory low (most memory
   should be shrinkable)
-- reliability: avoid multipage or GFP_ATOMIC allocations
+- reliability:
+  avoid multipage or GFP_ATOMIC allocations
 
 Acronyms
 ========
 
+==== ====================================================================
 pfn  host page frame number
 hpa  host physical address
 hva  host virtual address
@@ -41,6 +54,7 @@ pte page table entry (used also to refer generically to paging structure
 gpte guest pte (referring to gfns)
 spte shadow pte (referring to pfns)
 tdp  two dimensional paging (vendor neutral term for NPT and EPT)
+==== ====================================================================
 
 Virtual and real hardware supported
 ===================================
@@ -90,11 +104,13 @@ Events
 The mmu is driven by events, some from the guest, some from the host.
 
 Guest generated events:
+
 - writes to control registers (especially cr3)
 - invlpg/invlpga instruction execution
 - access to missing or protected translations
 
 Host generated events:
+
 - changes in the gpa->hpa translation (either through gpa->hva changes or
   through hva->hpa changes)
 - memory pressure (the shrinker)
@@ -117,16 +133,19 @@ Leaf ptes point at guest pages.
 The following table shows translations encoded by leaf ptes, with higher-level
 translations in parentheses:
 
-Non-nested guests:
+Non-nested guests::
+
   nonpaging:     gpa->hpa
   paging:        gva->gpa->hpa
   paging, tdp:   (gva->)gpa->hpa
-Nested guests:
+
+Nested guests::
+
   non-tdp:       ngva->gpa->hpa  (*)
   tdp:           (ngva->)ngpa->gpa->hpa
 
   (*) the guest hypervisor will encode the ngva->gpa translation into its page
       tables if npt is not present
 
 Shadow pages contain the following information:
   role.level:
@@ -291,28 +310,41 @@ Handling a page fault is performed as follows:
 
 - if the RSV bit of the error code is set, the page fault is caused by guest
   accessing MMIO and cached MMIO information is available.
+
   - walk shadow page table
   - check for valid generation number in the spte (see "Fast invalidation of
     MMIO sptes" below)
   - cache the information to vcpu->arch.mmio_gva, vcpu->arch.mmio_access and
     vcpu->arch.mmio_gfn, and call the emulator
+
 - If both P bit and R/W bit of error code are set, this could possibly
   be handled as a "fast page fault" (fixed without taking the MMU lock). See
   the description in Documentation/virt/kvm/locking.txt.
+
 - if needed, walk the guest page tables to determine the guest translation
   (gva->gpa or ngpa->gpa)
+
   - if permissions are insufficient, reflect the fault back to the guest
+
 - determine the host page
+
   - if this is an mmio request, there is no host page; cache the info to
     vcpu->arch.mmio_gva, vcpu->arch.mmio_access and vcpu->arch.mmio_gfn
+
 - walk the shadow page table to find the spte for the translation,
   instantiating missing intermediate page tables as necessary
+
   - If this is an mmio request, cache the mmio info to the spte and set some
     reserved bit on the spte (see callers of kvm_mmu_set_mmio_spte_mask)
+
 - try to unsynchronize the page
+
   - if successful, we can let the guest continue and modify the gpte
+
 - emulate the instruction
+
   - if failed, unshadow the page and let the guest continue
+
 - update any translations that were modified by the instruction
 
 invlpg handling:
@@ -324,10 +356,12 @@ invlpg handling:
 Guest control register updates:
 
 - mov to cr3
+
   - look up new shadow roots
   - synchronize newly reachable shadow pages
 
 - mov to cr0/cr4/efer
+
   - set up mmu context for new paging mode
   - look up new shadow roots
   - synchronize newly reachable shadow pages
@@ -358,6 +392,7 @@ on fault type:
   (user write faults generate a #PF)
 
 In the first case there are two additional complications:
+
 - if CR4.SMEP is enabled: since we've turned the page into a kernel page,
   the kernel may now execute it. We handle this by also setting spte.nx.
   If we get a user fetch or read fault, we'll change spte.u=1 and
@@ -446,4 +481,3 @@ Further reading
 
 - NPT presentation from KVM Forum 2008
   http://www.linux-kvm.org/images/c/c8/KvmForum2008%24kdf2008_21.pdf
-
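
Reviewer note: the page fault list reformatted in the -291,28 hunk starts by branching on bits of the x86 page fault error code. The sketch below only illustrates that first decision. The bit positions (P = bit 0, W/R = bit 1, RSVD = bit 3) are architectural, but the `EC_*` macros, the `fault_path` enum, and `classify_fault()` are hypothetical names invented for this example and are not KVM symbols. The real code has further conditions before it commits to either shortcut.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural x86 page fault error code bits (names are hypothetical). */
#define EC_PRESENT (1u << 0) /* P: fault on a present translation */
#define EC_WRITE   (1u << 1) /* W/R: the access was a write */
#define EC_RSVD    (1u << 3) /* RSVD: a reserved bit was set in a paging entry */

/* Which branch of the documented fault handling list applies first. */
enum fault_path {
    FAULT_MMIO_CACHED,    /* RSV bit set: cached MMIO information is available */
    FAULT_FAST_CANDIDATE, /* P and R/W set: possibly a "fast page fault" */
    FAULT_SLOW_PATH       /* everything else: walk guest tables, find host page */
};

/* Illustrative only: mirrors the order of the first two list items. */
static enum fault_path classify_fault(uint32_t error_code)
{
    if (error_code & EC_RSVD)
        return FAULT_MMIO_CACHED;

    if ((error_code & EC_PRESENT) && (error_code & EC_WRITE))
        return FAULT_FAST_CANDIDATE;

    return FAULT_SLOW_PATH;
}
```

The reserved-bit check comes first because, as the list describes, an spte marked for MMIO carries reserved bits so that the fault can be routed straight to the emulator.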