OpenCloudOS-Kernel/fs
Yosry Ahmed ebc97a52b5 mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.
We keep track of several kernel memory stats (total kernel memory, page
tables, stack, vmalloc, etc) on multiple levels (global, per-node,
per-memcg, etc). These stats give insights to users to how much memory
is used by the kernel and for what purposes.

Currently, memory used by KVM mmu is not accounted in any of those
kernel memory stats. This patch series accounts the memory pages
used by KVM for page tables in those stats in a new
NR_SECONDARY_PAGETABLE stat. This stat can be later extended to account
for other types of secondary pages tables (e.g. iommu page tables).

KVM has a decent number of large allocations that aren't for page
tables, but for most of them, the number/size of those allocations
scales linearly with either the number of vCPUs or the amount of memory
assigned to the VM. KVM's secondary page table allocations do not scale
linearly, especially when nested virtualization is in use.

From a KVM perspective, NR_SECONDARY_PAGETABLE will scale with KVM's
per-VM pages_{4k,2m,1g} stats unless the guest is doing something
bizarre (e.g. accessing only 4kb chunks of 2mb pages so that KVM is
forced to allocate a large number of page tables even though the guest
isn't accessing that much memory). However, someone would need to either
understand how KVM works to make that connection, or know (or be told) to
go look at KVM's stats if they're running VMs to better decipher the stats.

Furthermore, having NR_PAGETABLE side-by-side with NR_SECONDARY_PAGETABLE
is informative. For example, when backing a VM with THP vs. HugeTLB,
NR_SECONDARY_PAGETABLE is roughly the same, but NR_PAGETABLE is an order
of magnitude higher with THP. So having this stat will at the very least
prove to be useful for understanding tradeoffs between VM backing types,
and likely even steer folks towards potential optimizations.

The original discussion with more details about the rationale:
https://lore.kernel.org/all/87ilqoi77b.wl-maz@kernel.org

This stat will be used by subsequent patches to count KVM mmu
memory usage.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220823004639.2387269-2-yosryahmed@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-08-24 13:51:42 -07:00
..
9p 9p: Fix some kernel-doc comments 2022-07-02 18:52:21 +09:00
adfs fs: Convert block_read_full_page() to block_read_full_folio() 2022-05-09 16:21:44 -04:00
affs affs: use memcpy_to_page and remove replace kmap_atomic() 2022-08-01 19:53:31 +02:00
afs afs: Enable multipage folio support 2022-08-13 17:20:51 -07:00
autofs autofs: remove unused ino field inode 2022-07-17 17:31:42 -07:00
befs befs: Convert befs_symlink_read_folio() to use a folio 2022-08-02 12:34:03 -04:00
bfs fs: Convert block_read_full_page() to block_read_full_folio() 2022-05-09 16:21:44 -04:00
btrfs - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
cachefiles cachefiles: narrow the scope of flushed requests when releasing fd 2022-07-05 16:12:21 +01:00
ceph We have a good pile of various fixes and cleanups from Xiubo, Jeff, 2022-08-11 12:41:07 -07:00
cifs 8 cifs/smb3 fixes, mostly restructuring/cleanup, including two for stable 2022-08-13 17:31:18 -07:00
coda coda: Convert coda_symlink_filler() to use a folio 2022-08-02 12:34:03 -04:00
configfs configfs: fix a race in configfs_{,un}register_subsystem() 2022-02-22 18:30:28 +01:00
cramfs cramfs: read_mapping_page() is synchronous 2022-08-02 12:34:02 -04:00
crypto We have a good pile of various fixes and cleanups from Xiubo, Jeff, 2022-08-11 12:41:07 -07:00
debugfs debugfs: Document that debugfs_create functions need not be error checked 2022-02-25 11:56:13 +01:00
devpts fsnotify: fix fsnotify hooks in pseudo filesystems 2022-01-24 14:17:02 +01:00
dlm fs: dlm: move kref_put assert for lkb structs 2022-08-01 09:31:46 -05:00
ecryptfs ecryptfs: Convert ecryptfs to read_folio 2022-05-09 16:21:45 -04:00
efivarfs efi: vars: Move efivar caching layer into efivarfs 2022-06-24 20:40:19 +02:00
efs efs: Convert efs symlinks to read_folio 2022-05-09 16:21:45 -04:00
erofs - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
exfat exfat: Drop superfluous new line for error messages 2022-08-01 10:14:07 +09:00
exportfs exportfs: support idmapped mounts 2022-04-28 16:31:10 +02:00
ext2 - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
ext4 - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
f2fs f2fs-for-6.0 2022-08-08 11:18:31 -07:00
fat Updates to various subsystems which I help look after. lib, ocfs2, 2022-08-07 10:03:24 -07:00
freevxfs freevxfs: Convert vxfs_immed_read_folio() to use a folio 2022-08-02 12:34:03 -04:00
fscache fscache: add tracepoint when failing cookie 2022-08-09 14:13:59 +01:00
fuse iov_iter stuff, part 2, rebased 2022-08-08 20:04:35 -07:00
gfs2 New code for 6.0: 2022-08-11 13:11:49 -07:00
hfs hfs: Remove check for PageError 2022-06-29 08:51:06 -04:00
hfsplus Folio changes for 6.0 2022-08-03 10:35:43 -07:00
hostfs hostfs: Handle page write errors correctly 2022-08-02 12:34:02 -04:00
hpfs hpfs: Convert symlinks to read_folio 2022-05-09 16:21:45 -04:00
hugetlbfs iov_iter stuff, part 2, rebased 2022-08-08 20:04:35 -07:00
iomap New code for 6.0: 2022-08-11 13:11:49 -07:00
isofs fs/buffer: Combine two submit_bh() and ll_rw_block() arguments 2022-07-14 12:14:32 -06:00
jbd2 - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
jffs2 This pull request contains fixes for JFFS2, UBI and UBIFS 2022-06-03 14:42:24 -07:00
jfs Folio changes for 6.0 2022-08-03 10:35:43 -07:00
kernfs kernfs: Fix typo 'the the' in comment 2022-07-28 10:57:25 +02:00
ksmbd 12 server fixes, including for oopses, memory leaks and adding a channel lock 2022-08-08 20:15:13 -07:00
lockd lockd: detect and reject lock arguments that overflow 2022-08-04 10:28:48 -04:00
minix fs: Convert block_read_full_page() to block_read_full_folio() 2022-05-09 16:21:44 -04:00
netfs netfs: do not unlock and put the folio twice 2022-07-14 10:10:12 +02:00
nfs NFS client updates for Linux 5.20 2022-08-10 14:04:32 -07:00
nfs_common
nfsd NFSD 6.0 Release Notes 2022-08-09 14:56:49 -07:00
nilfs2 Folio changes for 6.0 2022-08-03 10:35:43 -07:00
nls
notify fsnotify: Fix comment typo 2022-07-26 13:38:47 +02:00
ntfs Folio changes for 6.0 2022-08-03 10:35:43 -07:00
ntfs3 Folio changes for 6.0 2022-08-03 10:35:43 -07:00
ocfs2 fs.setgid.v6.0 2022-08-09 09:52:28 -07:00
omfs fs: Convert block_read_full_page() to block_read_full_folio() 2022-05-09 16:21:44 -04:00
openpromfs fs: allocate inode by using alloc_inode_sb() 2022-03-22 15:57:03 -07:00
orangefs orangefs: Remove test for folio error 2022-06-29 08:51:07 -04:00
overlayfs overlayfs update for 6.0 2022-08-08 11:03:11 -07:00
proc mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses. 2022-08-24 13:51:42 -07:00
pstore EFI updates for v5.20 2022-08-03 14:38:02 -07:00
qnx4 fs: Convert block_read_full_page() to block_read_full_folio() 2022-05-09 16:21:44 -04:00
qnx6 fs: Convert mpage_readpage to mpage_read_folio 2022-05-09 16:21:44 -04:00
quota - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
ramfs
reiserfs Folio changes for 6.0 2022-08-03 10:35:43 -07:00
romfs romfs: Convert romfs to read_folio 2022-05-09 16:21:46 -04:00
smbfs_common Add various fsctl structs 2022-05-23 20:24:12 -05:00
squashfs Updates to various subsystems which I help look after. lib, ocfs2, 2022-08-07 10:03:24 -07:00
sysfs kobject: kobj_type: remove default_attrs 2022-04-05 15:39:19 +02:00
sysv Not a lot of material this cycle. Many singleton patches against various 2022-05-27 11:22:03 -07:00
tracefs tracefs: Fix syntax errors in comments 2022-06-17 19:01:28 -04:00
ubifs - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
udf fs/buffer: Combine two submit_bh() and ll_rw_block() arguments 2022-07-14 12:14:32 -06:00
ufs Folio changes for 6.0 2022-08-03 10:35:43 -07:00
unicode kbuild: unify cmd_copy and cmd_shipped 2022-02-14 10:37:32 +09:00
vboxsf vboxsf: Convert vboxsf to read_folio 2022-05-09 16:21:46 -04:00
verity fs-verity: mention btrfs support 2022-07-15 23:42:30 -07:00
xfs New code for 6.0: 2022-08-13 13:50:11 -07:00
zonefs New code for 6.0: 2022-08-11 13:11:49 -07:00
Kconfig mm: hugetlb_vmemmap: introduce the name HVO 2022-08-08 18:06:42 -07:00
Kconfig.binfmt m68knommu: changes for linux 5.19 2022-05-30 10:56:18 -07:00
Makefile io_uring: move to separate directory 2022-07-24 18:39:10 -06:00
aio.c iov_iter work, part 1 - isolated cleanups and optimizations. 2022-08-03 13:50:22 -07:00
anon_inodes.c
attr.c vfs: Check the truncate maximum size in inode_newsize_ok() 2022-08-08 10:39:29 -07:00
bad_inode.c
binfmt_aout.c
binfmt_elf.c revert "fs/binfmt_elf: use PT_LOAD p_align values for static PIE" 2022-04-15 14:49:56 -07:00
binfmt_elf_fdpic.c coredump: Snapshot the vmas in do_coredump 2022-03-08 12:55:29 -06:00
binfmt_elf_test.c binfmt_elf: Introduce KUnit test 2022-03-03 20:38:56 -08:00
binfmt_flat.c binfmt_flat: Remove shared library support 2022-04-22 10:57:18 -07:00
binfmt_misc.c Fix regression due to "fs: move binfmt_misc sysctl to its own file" 2022-02-09 09:50:02 -08:00
binfmt_script.c
buffer.c Folio changes for 6.0 2022-08-03 10:35:43 -07:00
char_dev.c
compat_binfmt_elf.c binfmt_elf: Introduce KUnit test 2022-03-03 20:38:56 -08:00
coredump.c fs: do not compare against ->llseek 2022-07-16 09:19:15 -04:00
d_path.c
dax.c - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
dcache.c We have a good pile of various fixes and cleanups from Xiubo, Jeff, 2022-08-11 12:41:07 -07:00
direct-io.c iov_iter: advancing variants of iov_iter_get_pages{,_alloc}() 2022-08-08 22:37:22 -04:00
drop_caches.c
eventfd.c
eventpoll.c epoll: autoremove wakers even more aggressively 2022-07-17 17:31:40 -07:00
exec.c posix-cpu-timers: Cleanup CPU timers before freeing them during exec 2022-08-09 20:02:13 +02:00
fcntl.c keep iocb_flags() result cached in struct file 2022-06-10 16:10:23 -04:00
fhandle.c
file.c fix the breakage in close_fd_get_file() calling conventions change 2022-06-05 15:03:03 -04:00
file_table.c iov_iter work, part 1 - isolated cleanups and optimizations. 2022-08-03 13:50:22 -07:00
filesystems.c
fs-writeback.c writeback: Fix inode->i_io_list not be protected by inode->i_lock error 2022-06-06 09:54:30 +02:00
fs_context.c vfs: fs_context: fix up param length parsing in legacy_parse_param 2022-01-18 09:23:19 +02:00
fs_parser.c
fs_pin.c
fs_struct.c
fs_types.c
fsopen.c uninline may_mount() and don't opencode it in fspick(2)/fsopen(2) 2022-05-19 23:25:10 -04:00
init.c
inode.c We have a good pile of various fixes and cleanups from Xiubo, Jeff, 2022-08-11 12:41:07 -07:00
internal.h Cleanups (and one fix) around struct mount handling. 2022-06-04 19:00:05 -07:00
ioctl.c Fixes for 5.18-rc1: 2022-04-01 19:35:56 -07:00
kernel_read_file.c fs/kernel_read_file: allow to read files up-to ssize_t 2022-06-16 19:58:21 -07:00
libfs.c fs: Convert simple_readpage to simple_read_folio 2022-05-09 16:21:44 -04:00
locks.c fs/lock: Rearrange ops in flock syscall. 2022-07-18 10:01:47 -04:00
mbcache.c - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
mount.h switch try_to_unlazy_next() to __legitimize_mnt() 2022-07-05 16:18:21 -04:00
mpage.c Folio changes for 6.0 2022-08-03 10:35:43 -07:00
namei.c fs.setgid.v6.0 2022-08-09 09:52:28 -07:00
namespace.c switch try_to_unlazy_next() to __legitimize_mnt() 2022-07-05 16:18:21 -04:00
no-block.c
nsfs.c
open.c iov_iter work, part 1 - isolated cleanups and optimizations. 2022-08-03 13:50:22 -07:00
pipe.c Not a lot of material this cycle. Many singleton patches against various 2022-05-27 11:22:03 -07:00
pnode.c
pnode.h
posix_acl.c acl: make posix_acl_clone() available to overlayfs 2022-07-15 22:09:57 +02:00
proc_namespace.c vfs: escape hash as well 2022-06-28 13:58:05 -04:00
read_write.c switch new_sync_{read,write}() to ITER_UBUF 2022-08-08 22:37:15 -04:00
readdir.c
remap_range.c - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
select.c select: Fix indefinitely sleeping task in poll_schedule_timeout() 2022-01-11 09:03:05 -08:00
seq_file.c rxrpc: Fix locking issue 2022-05-22 21:03:01 +01:00
signalfd.c Merge branch 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2022-01-17 05:49:30 +02:00
splice.c iter_to_pipe(): switch to advancing variant of iov_iter_get_pages() 2022-08-08 22:37:23 -04:00
stack.c
stat.c RISC-V Patches for the 5.19 Merge Window, Part 1 2022-05-31 14:10:54 -07:00
statfs.c
super.c fuse update for 6.0 2022-08-08 11:10:02 -07:00
sync.c riscv: compat: syscall: Add compat_sys_call_table implementation 2022-04-26 13:36:25 -07:00
sysctls.c fs: move namespace sysctls and declare fs base directory 2022-01-22 08:33:36 +02:00
timerfd.c
userfaultfd.c - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
utimes.c
xattr.c acl: move idmapped mount fixup into vfs_{g,s}etxattr() 2022-07-15 22:08:59 +02:00