docs: kvm: Convert mmu.txt to ReST format
- Use document title and chapter markups;
- Add markups for tables;
- Add markups for literal blocks;
- Add blank lines and adjust indentation.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
parent
75e7fcdb4a
commit
037d1f92ef
|
@@ -13,6 +13,7 @@ KVM
|
||||||
halt-polling
|
halt-polling
|
||||||
hypercalls
|
hypercalls
|
||||||
locking
|
locking
|
||||||
|
mmu
|
||||||
msr
|
msr
|
||||||
vcpu-requests
|
vcpu-requests
|
||||||
|
|
||||||
|
|
|
@@ -1,3 +1,6 @@
|
||||||
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
|
|
||||||
|
======================
|
||||||
The x86 kvm shadow mmu
|
The x86 kvm shadow mmu
|
||||||
======================
|
======================
|
||||||
|
|
||||||
|
@@ -7,27 +10,37 @@ physical addresses to host physical addresses.
|
||||||
|
|
||||||
The mmu code attempts to satisfy the following requirements:
|
The mmu code attempts to satisfy the following requirements:
|
||||||
|
|
||||||
- correctness: the guest should not be able to determine that it is running
|
- correctness:
|
||||||
|
the guest should not be able to determine that it is running
|
||||||
on an emulated mmu except for timing (we attempt to comply
|
on an emulated mmu except for timing (we attempt to comply
|
||||||
with the specification, not emulate the characteristics of
|
with the specification, not emulate the characteristics of
|
||||||
a particular implementation such as tlb size)
|
a particular implementation such as tlb size)
|
||||||
- security: the guest must not be able to touch host memory not assigned
|
- security:
|
||||||
|
the guest must not be able to touch host memory not assigned
|
||||||
to it
|
to it
|
||||||
- performance: minimize the performance penalty imposed by the mmu
|
- performance:
|
||||||
- scaling: need to scale to large memory and large vcpu guests
|
minimize the performance penalty imposed by the mmu
|
||||||
- hardware: support the full range of x86 virtualization hardware
|
- scaling:
|
||||||
- integration: Linux memory management code must be in control of guest memory
|
need to scale to large memory and large vcpu guests
|
||||||
|
- hardware:
|
||||||
|
support the full range of x86 virtualization hardware
|
||||||
|
- integration:
|
||||||
|
Linux memory management code must be in control of guest memory
|
||||||
so that swapping, page migration, page merging, transparent
|
so that swapping, page migration, page merging, transparent
|
||||||
hugepages, and similar features work without change
|
hugepages, and similar features work without change
|
||||||
- dirty tracking: report writes to guest memory to enable live migration
|
- dirty tracking:
|
||||||
|
report writes to guest memory to enable live migration
|
||||||
and framebuffer-based displays
|
and framebuffer-based displays
|
||||||
- footprint: keep the amount of pinned kernel memory low (most memory
|
- footprint:
|
||||||
|
keep the amount of pinned kernel memory low (most memory
|
||||||
should be shrinkable)
|
should be shrinkable)
|
||||||
- reliability: avoid multipage or GFP_ATOMIC allocations
|
- reliability:
|
||||||
|
avoid multipage or GFP_ATOMIC allocations
|
||||||
|
|
||||||
Acronyms
|
Acronyms
|
||||||
========
|
========
|
||||||
|
|
||||||
|
==== ====================================================================
|
||||||
pfn host page frame number
|
pfn host page frame number
|
||||||
hpa host physical address
|
hpa host physical address
|
||||||
hva host virtual address
|
hva host virtual address
|
||||||
|
@@ -41,6 +54,7 @@ pte page table entry (used also to refer generically to paging structure
|
||||||
gpte guest pte (referring to gfns)
|
gpte guest pte (referring to gfns)
|
||||||
spte shadow pte (referring to pfns)
|
spte shadow pte (referring to pfns)
|
||||||
tdp two dimensional paging (vendor neutral term for NPT and EPT)
|
tdp two dimensional paging (vendor neutral term for NPT and EPT)
|
||||||
|
==== ====================================================================
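The relationship between the address and frame-number acronyms above can be sketched in a few lines of C. This is an illustration only, assuming 4 KiB base pages; the `SKETCH_` macros and the helper names are hypothetical and are not taken from the KVM sources. The same arithmetic applies to the guest pair (gpa and gfn) and the host pair (hpa and pfn).

```c
#include <stdint.h>

/* Assumption: 4 KiB base pages. All names here are illustrative only. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

/* Frame number of the page containing a physical address
 * (gpa -> gfn on the guest side, hpa -> pfn on the host side). */
static inline uint64_t addr_to_frame(uint64_t addr)
{
    return addr >> SKETCH_PAGE_SHIFT;
}

/* Byte offset of an address within its page. */
static inline uint64_t addr_to_offset(uint64_t addr)
{
    return addr & (SKETCH_PAGE_SIZE - 1);
}

/* Rebuild a physical address from a frame number and an in-page offset.
 * The offset is masked so that it cannot spill into the frame bits. */
static inline uint64_t frame_to_addr(uint64_t frame, uint64_t offset)
{
    return (frame << SKETCH_PAGE_SHIFT) | addr_to_offset(offset);
}
```

For example, under the 4 KiB assumption the address 0x12345678 lies in frame 0x12345 at offset 0x678.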
|
||||||
|
|
||||||
Virtual and real hardware supported
|
Virtual and real hardware supported
|
||||||
===================================
|
===================================
|
||||||
|
@@ -90,11 +104,13 @@ Events
|
||||||
The mmu is driven by events, some from the guest, some from the host.
|
The mmu is driven by events, some from the guest, some from the host.
|
||||||
|
|
||||||
Guest generated events:
|
Guest generated events:
|
||||||
|
|
||||||
- writes to control registers (especially cr3)
|
- writes to control registers (especially cr3)
|
||||||
- invlpg/invlpga instruction execution
|
- invlpg/invlpga instruction execution
|
||||||
- access to missing or protected translations
|
- access to missing or protected translations
|
||||||
|
|
||||||
Host generated events:
|
Host generated events:
|
||||||
|
|
||||||
- changes in the gpa->hpa translation (either through gpa->hva changes or
|
- changes in the gpa->hpa translation (either through gpa->hva changes or
|
||||||
through hva->hpa changes)
|
through hva->hpa changes)
|
||||||
- memory pressure (the shrinker)
|
- memory pressure (the shrinker)
|
||||||
|
@@ -117,16 +133,19 @@ Leaf ptes point at guest pages.
|
||||||
The following table shows translations encoded by leaf ptes, with higher-level
|
The following table shows translations encoded by leaf ptes, with higher-level
|
||||||
translations in parentheses:
|
translations in parentheses:
|
||||||
|
|
||||||
Non-nested guests:
|
Non-nested guests::
|
||||||
|
|
||||||
nonpaging: gpa->hpa
|
nonpaging: gpa->hpa
|
||||||
paging: gva->gpa->hpa
|
paging: gva->gpa->hpa
|
||||||
paging, tdp: (gva->)gpa->hpa
|
paging, tdp: (gva->)gpa->hpa
|
||||||
Nested guests:
|
|
||||||
|
Nested guests::
|
||||||
|
|
||||||
non-tdp: ngva->gpa->hpa (*)
|
non-tdp: ngva->gpa->hpa (*)
|
||||||
tdp: (ngva->)ngpa->gpa->hpa
|
tdp: (ngva->)ngpa->gpa->hpa
|
||||||
|
|
||||||
(*) the guest hypervisor will encode the ngva->gpa translation into its page
|
(*) the guest hypervisor will encode the ngva->gpa translation into its page
|
||||||
tables if npt is not present
|
tables if npt is not present
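To make the "paging: gva->gpa->hpa" row concrete, the sketch below composes two toy single-level tables and then collapses them into one table, which is roughly what a shadow page table encodes for a non-nested paging guest. Everything here is a simplification under stated assumptions: real page tables are multi-level and carry permission bits, and `toy_table`, `toy_lookup`, `toy_gva_to_hpa` and `toy_build_shadow` are hypothetical names that do not exist in KVM.

```c
#include <stdint.h>

/* Toy model: each table maps a small page index to a frame number. */
#define TOY_PAGES   4
#define TOY_INVALID UINT64_MAX

struct toy_table {
    uint64_t frame[TOY_PAGES];
};

/* Look up one index; out-of-range indexes are treated as unmapped. */
static uint64_t toy_lookup(const struct toy_table *t, uint64_t index)
{
    if (index >= TOY_PAGES)
        return TOY_INVALID;
    return t->frame[index];
}

/* Two-step walk: the guest table gives gva -> gfn, the host table gfn -> pfn. */
static uint64_t toy_gva_to_hpa(const struct toy_table *guest,
                               const struct toy_table *host,
                               uint64_t gva_page)
{
    uint64_t gfn = toy_lookup(guest, gva_page);

    if (gfn == TOY_INVALID)
        return TOY_INVALID;
    return toy_lookup(host, gfn);
}

/* Collapse both steps into a single gva -> pfn table, analogous to what
 * the leaf sptes of a shadow page table hold in the "paging" case. */
static void toy_build_shadow(const struct toy_table *guest,
                             const struct toy_table *host,
                             struct toy_table *shadow)
{
    for (uint64_t i = 0; i < TOY_PAGES; i++)
        shadow->frame[i] = toy_gva_to_hpa(guest, host, i);
}
```

In the tdp rows the hardware performs both steps itself, so the table maintained by the mmu only needs to encode the gpa->hpa step, which is why gva-> appears in parentheses there.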
|
||||||
|
|
||||||
Shadow pages contain the following information:
|
Shadow pages contain the following information:
|
||||||
role.level:
|
role.level:
|
||||||
|
@@ -291,28 +310,41 @@ Handling a page fault is performed as follows:
|
||||||
|
|
||||||
- if the RSV bit of the error code is set, the page fault is caused by guest
|
- if the RSV bit of the error code is set, the page fault is caused by guest
|
||||||
accessing MMIO and cached MMIO information is available.
|
accessing MMIO and cached MMIO information is available.
|
||||||
|
|
||||||
- walk shadow page table
|
- walk shadow page table
|
||||||
- check for valid generation number in the spte (see "Fast invalidation of
|
- check for valid generation number in the spte (see "Fast invalidation of
|
||||||
MMIO sptes" below)
|
MMIO sptes" below)
|
||||||
- cache the information to vcpu->arch.mmio_gva, vcpu->arch.mmio_access and
|
- cache the information to vcpu->arch.mmio_gva, vcpu->arch.mmio_access and
|
||||||
vcpu->arch.mmio_gfn, and call the emulator
|
vcpu->arch.mmio_gfn, and call the emulator
|
||||||
|
|
||||||
- If both P bit and R/W bit of error code are set, this could possibly
|
- If both P bit and R/W bit of error code are set, this could possibly
|
||||||
be handled as a "fast page fault" (fixed without taking the MMU lock). See
|
be handled as a "fast page fault" (fixed without taking the MMU lock). See
|
||||||
the description in Documentation/virt/kvm/locking.txt.
|
the description in Documentation/virt/kvm/locking.txt.
|
||||||
|
|
||||||
- if needed, walk the guest page tables to determine the guest translation
|
- if needed, walk the guest page tables to determine the guest translation
|
||||||
(gva->gpa or ngpa->gpa)
|
(gva->gpa or ngpa->gpa)
|
||||||
|
|
||||||
- if permissions are insufficient, reflect the fault back to the guest
|
- if permissions are insufficient, reflect the fault back to the guest
|
||||||
|
|
||||||
- determine the host page
|
- determine the host page
|
||||||
|
|
||||||
- if this is an mmio request, there is no host page; cache the info to
|
- if this is an mmio request, there is no host page; cache the info to
|
||||||
vcpu->arch.mmio_gva, vcpu->arch.mmio_access and vcpu->arch.mmio_gfn
|
vcpu->arch.mmio_gva, vcpu->arch.mmio_access and vcpu->arch.mmio_gfn
|
||||||
|
|
||||||
- walk the shadow page table to find the spte for the translation,
|
- walk the shadow page table to find the spte for the translation,
|
||||||
instantiating missing intermediate page tables as necessary
|
instantiating missing intermediate page tables as necessary
|
||||||
|
|
||||||
- If this is an mmio request, cache the mmio info to the spte and set some
|
- If this is an mmio request, cache the mmio info to the spte and set some
|
||||||
reserved bit on the spte (see callers of kvm_mmu_set_mmio_spte_mask)
|
reserved bit on the spte (see callers of kvm_mmu_set_mmio_spte_mask)
|
||||||
|
|
||||||
- try to unsynchronize the page
|
- try to unsynchronize the page
|
||||||
|
|
||||||
- if successful, we can let the guest continue and modify the gpte
|
- if successful, we can let the guest continue and modify the gpte
|
||||||
|
|
||||||
- emulate the instruction
|
- emulate the instruction
|
||||||
|
|
||||||
- if failed, unshadow the page and let the guest continue
|
- if failed, unshadow the page and let the guest continue
|
||||||
|
|
||||||
- update any translations that were modified by the instruction
|
- update any translations that were modified by the instruction
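The first two steps of the list above dispatch on bits of the page fault error code. A minimal sketch of that dispatch follows. The bit positions (P is bit 0, R/W is bit 1, RSVD is bit 3) come from the x86 architecture; the `SKETCH_` names, the enum, and `sketch_classify_fault` are hypothetical and only illustrate the ordering of the checks, not KVM's actual code.

```c
#include <stdint.h>

/* x86 page fault error code bits; macro names are illustrative only. */
#define SKETCH_PFERR_PRESENT (UINT32_C(1) << 0) /* P: fault on a present page */
#define SKETCH_PFERR_WRITE   (UINT32_C(1) << 1) /* R/W: access was a write */
#define SKETCH_PFERR_RSVD    (UINT32_C(1) << 3) /* RSV: reserved bit set in a pte */

enum sketch_fault_path {
    SKETCH_PATH_MMIO_CACHED,    /* RSV set: use the MMIO info cached in the spte */
    SKETCH_PATH_FAST_CANDIDATE, /* P and R/W set: might be fixed without the MMU lock */
    SKETCH_PATH_FULL            /* everything else: walk guest and shadow tables */
};

/* Pick the handling path in the same order as the list: RSV first,
 * then the "fast page fault" candidate check, then the full path. */
static enum sketch_fault_path sketch_classify_fault(uint32_t error_code)
{
    const uint32_t fast = SKETCH_PFERR_PRESENT | SKETCH_PFERR_WRITE;

    if (error_code & SKETCH_PFERR_RSVD)
        return SKETCH_PATH_MMIO_CACHED;
    if ((error_code & fast) == fast)
        return SKETCH_PATH_FAST_CANDIDATE;
    return SKETCH_PATH_FULL;
}
```

A result of `SKETCH_PATH_FAST_CANDIDATE` only means the fault could possibly be handled on the fast path; whether it actually can is decided by the conditions described in the locking document.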
|
||||||
|
|
||||||
invlpg handling:
|
invlpg handling:
|
||||||
|
@@ -324,10 +356,12 @@ invlpg handling:
|
||||||
Guest control register updates:
|
Guest control register updates:
|
||||||
|
|
||||||
- mov to cr3
|
- mov to cr3
|
||||||
|
|
||||||
- look up new shadow roots
|
- look up new shadow roots
|
||||||
- synchronize newly reachable shadow pages
|
- synchronize newly reachable shadow pages
|
||||||
|
|
||||||
- mov to cr0/cr4/efer
|
- mov to cr0/cr4/efer
|
||||||
|
|
||||||
- set up mmu context for new paging mode
|
- set up mmu context for new paging mode
|
||||||
- look up new shadow roots
|
- look up new shadow roots
|
||||||
- synchronize newly reachable shadow pages
|
- synchronize newly reachable shadow pages
|
||||||
|
@@ -358,6 +392,7 @@ on fault type:
|
||||||
(user write faults generate a #PF)
|
(user write faults generate a #PF)
|
||||||
|
|
||||||
In the first case there are two additional complications:
|
In the first case there are two additional complications:
|
||||||
|
|
||||||
- if CR4.SMEP is enabled: since we've turned the page into a kernel page,
|
- if CR4.SMEP is enabled: since we've turned the page into a kernel page,
|
||||||
the kernel may now execute it. We handle this by also setting spte.nx.
|
the kernel may now execute it. We handle this by also setting spte.nx.
|
||||||
If we get a user fetch or read fault, we'll change spte.u=1 and
|
If we get a user fetch or read fault, we'll change spte.u=1 and
|
||||||
|
@@ -446,4 +481,3 @@ Further reading
|
||||||
|
|
||||||
- NPT presentation from KVM Forum 2008
|
- NPT presentation from KVM Forum 2008
|
||||||
http://www.linux-kvm.org/images/c/c8/KvmForum2008%24kdf2008_21.pdf
|
http://www.linux-kvm.org/images/c/c8/KvmForum2008%24kdf2008_21.pdf
|
||||||
|
|