OpenCloudOS-Kernel/Documentation
David Matlack ada51a9de7 KVM: x86/mmu: Extend Eager Page Splitting to nested MMUs
Add support for Eager Page Splitting pages that are mapped by nested
MMUs. Walk through the rmap first splitting all 1GiB pages to 2MiB
pages, and then splitting all 2MiB pages to 4KiB pages.

Note, Eager Page Splitting is limited to nested MMUs as a policy rather
than due to any technical reason (the sp->role.guest_mode check could
just be deleted and Eager Page Splitting would work correctly for all
shadow MMU pages). There is really no reason to support Eager Page
Splitting for tdp_mmu=N, since such support will eventually be phased
out, and there is no current use case supporting Eager Page Splitting on
hosts where TDP is either disabled or unavailable in hardware.
Furthermore, future improvements to nested MMU scalability may diverge
the code from the legacy shadow paging implementation. These
improvements will be simpler to make if Eager Page Splitting does not
have to worry about legacy shadow paging.

Splitting huge pages mapped by nested MMUs requires dealing with some
extra complexity beyond that of the TDP MMU:

(1) The shadow MMU has a limit on the number of shadow pages that are
    allowed to be allocated. So, as a policy, Eager Page Splitting
    refuses to split if there are KVM_MIN_FREE_MMU_PAGES or fewer
    pages available.

(2) Splitting a huge page may end up re-using an existing lower level
    shadow page tables. This is unlike the TDP MMU which always allocates
    new shadow page tables when splitting.

(3) When installing the lower level SPTEs, they must be added to the
    rmap which may require allocating additional pte_list_desc structs.

Case (2) is especially interesting since it may require a TLB flush,
unlike the TDP MMU which can fully split huge pages without any TLB
flushes. Specifically, an existing lower level page table may point to
even lower level page tables that are not fully populated, effectively
unmapping a portion of the huge page, which requires a flush.  As of
this commit, a flush is always done always after dropping the huge page
and before installing the lower level page table.

This TLB flush could instead be delayed until the MMU lock is about to be
dropped, which would batch flushes for multiple splits.  However these
flushes should be rare in practice (a huge page must be aliased in
multiple SPTEs and have been split for NX Huge Pages in only some of
them). Flushing immediately is simpler to plumb and also reduces the
chances of tripping over a CPU bug (e.g. see iTLB multihit).

[ This commit is based off of the original implementation of Eager Page
  Splitting from Peter in Google's kernel from 2016. ]

Suggested-by: Peter Feiner <pfeiner@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220516232138.1783324-23-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24 04:52:00 -04:00
..
ABI Driver core changes for 5.19-rc1 2022-06-03 11:48:47 -07:00
PCI PCI/doc: Update obsolete pci_set_dma_mask() references 2022-04-21 12:10:44 -05:00
RCU Merge branch 'exp.2022.05.11a' into HEAD 2022-05-11 11:49:35 -07:00
accounting delayacct: track delays from write-protect copy 2022-06-01 15:55:25 -07:00
admin-guide KVM: x86/mmu: Extend Eager Page Splitting to nested MMUs 2022-06-24 04:52:00 -04:00
arc
arm ARM: pxa/mmp: remove traces of plat-pxa 2022-05-31 16:07:52 +02:00
arm64 S390: 2022-05-26 14:20:14 -07:00
block
bpf bpf, docs: Fix typo "respetively" to "respectively" 2022-04-28 17:20:48 +02:00
cdrom It was a moderately busy cycle for documentation; highlights include: 2022-05-25 11:17:41 -07:00
core-api It was a moderately busy cycle for documentation; highlights include: 2022-05-25 11:17:41 -07:00
cpu-freq
crypto
dev-tools Yang Shi has improved the behaviour of khugepaged collapsing of readonly 2022-05-26 12:32:41 -07:00
devicetree Clockevent/clocksource updates: 2022-06-05 10:47:06 -07:00
doc-guide Documentation/process: use scripts/get_maintainer.pl on patches 2022-05-09 16:12:16 -06:00
driver-api Driver core changes for 5.19-rc1 2022-06-03 11:48:47 -07:00
fault-injection docs: fault-injection: fix defaults 2022-04-16 02:46:44 -06:00
fb
features asm-generic changes for 5.19 2022-05-26 10:50:30 -07:00
filesystems Changes since last update: 2022-06-01 11:54:29 -07:00
firmware-guide TTY / Serial driver changes for 5.19-rc1 2022-06-03 11:08:40 -07:00
firmware_class
fpga Documentation: fpga: dfl: add link address of feature id table 2022-05-10 16:05:27 +08:00
gpu drm/todo: Add entry for using kunit in the subsystem 2022-05-05 10:09:06 +02:00
hid
hte hte: Add Tegra194 HTE kernel provider 2022-05-04 11:06:13 +02:00
hwmon hwmon: Make chip parameter for with_info API mandatory 2022-05-22 11:32:31 -07:00
i2c docs: i2c: reference simple probes 2022-05-04 22:35:19 +02:00
ia64
iio
images docs: add SVG version of the Linux logo 2022-06-01 09:32:45 -06:00
infiniband
input documentation: Format button_dev as a pointer. 2022-06-01 09:34:28 -06:00
isdn
kbuild Kbuild updates for v5.19 2022-05-26 12:09:50 -07:00
kernel-hacking
leds leds: qcom-lpg: Require pattern to follow documentation 2022-05-24 22:08:10 +02:00
litmus-tests
livepatch
locking
loongarch Documentation: LoongArch: Add basic documentations 2022-06-03 20:09:27 +08:00
m68k
maintainer
mhi
mips
misc-devices Documentation: Wire Oxford Semiconductor PCIe (Tornado) 950 2022-05-19 18:24:22 +02:00
netlabel
networking net/ipv6: Expand and rename accept_unsolicited_na to accept_untracked_na 2022-05-31 11:36:57 +02:00
nios2
nvdimm
openrisc
parisc
pcmcia
peci
power Documentation: EM: Add artificial EM registration description 2022-04-13 16:26:18 +02:00
powerpc powerpc: Enable the DAWR on POWER9 DD2.3 and above 2022-05-22 15:59:53 +10:00
process It was a moderately busy cycle for documentation; highlights include: 2022-05-25 11:17:41 -07:00
riscv Documentation: riscv: Add sv48 description to VM layout 2022-06-01 20:38:34 -07:00
s390
scheduler docs/scheduler: fix unit error 2022-04-16 02:54:32 -06:00
scsi
security integrity-v5.19 2022-05-24 13:50:39 -07:00
sh
sound ALSA: usb-audio: Add quirk bits for enabling/disabling generic implicit fb 2022-04-21 10:17:17 +02:00
sparc
sphinx docs: pdfdocs: Add space for chapter counts >= 100 in TOC 2022-05-17 13:41:26 -06:00
sphinx-static
spi
staging
target
timers
tools Updates to Real Time Linux Analysis tool for 5.19: 2022-05-29 10:48:58 -07:00
trace tracing/timerlat: Print stacktrace in the IRQ handler if needed 2022-05-26 21:13:00 -04:00
translations Documentation/zh_CN: Add basic LoongArch documentations 2022-06-03 20:09:27 +08:00
usb usb: gadget: uvc: allow changing interface name via configfs 2022-04-21 18:14:34 +02:00
userspace-api media: lirc: add missing exceptions for lirc uapi header file 2022-05-26 14:30:17 -07:00
virt KVM: x86/MMU: Allow NX huge pages to be disabled on a per-vm basis 2022-06-24 04:51:49 -04:00
vm Yang Shi has improved the behaviour of khugepaged collapsing of readonly 2022-05-26 12:32:41 -07:00
w1
watchdog
x86 It was a moderately busy cycle for documentation; highlights include: 2022-05-25 11:17:41 -07:00
xtensa
.gitignore
Changes
CodingStyle
Kconfig
Makefile
SubmittingPatches
arch.rst Documentation: LoongArch: Add basic documentations 2022-06-03 20:09:27 +08:00
asm-annotations.rst
atomic_bitops.txt
atomic_t.txt
conf.py docs/conf.py: Cope with removal of language=None in Sphinx 5.0.0 2022-06-01 09:26:05 -06:00
docutils.conf
dontdiff randstruct: Move seed generation into scripts/basic/ 2022-05-08 01:33:07 -07:00
index.rst hte: New subsystem for v5.19-rc1 2022-06-05 09:12:28 -07:00
memory-barriers.txt