License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boilerplate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
The criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0                                               11139
and resulted in the first patch in this series.
If that file was under a */uapi/* path, it was tagged "GPL-2.0 WITH
Linux-syscall-note"; otherwise it was tagged "GPL-2.0". The results were:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                         930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier                             # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note                        270
GPL-2.0+ WITH Linux-syscall-note                       169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)     21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)     17
LGPL-2.1+ WITH Linux-syscall-note                       15
GPL-1.0+ WITH Linux-syscall-note                        14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)     5
LGPL-2.0+ WITH Linux-syscall-note                        4
LGPL-2.1 WITH Linux-syscall-note                         3
((GPL-2.0 WITH Linux-syscall-note) OR MIT)               3
((GPL-2.0 WITH Linux-syscall-note) AND MIT)              1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were any new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
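The comment-type distinction mentioned above (source .c files versus headers) could be sketched roughly as follows. This is not the script Thomas and Greg used; `spdx_line()` is a hypothetical helper, and it assumes the kernel convention of a `//` comment in .c files and a `/* */` comment in .h files. Other file types are deliberately left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not the script from the commit message): format
 * the SPDX tag line for a file, choosing the comment style from the
 * file extension. Returns 0 on success, -1 for unsupported file types
 * or if the output buffer is too small.
 */
static int spdx_line(const char *path, const char *license,
		     char *out, size_t outsz)
{
	const char *ext;
	int n;

	assert(path && license && out);

	ext = strrchr(path, '.');
	if (ext && strcmp(ext, ".c") == 0)
		n = snprintf(out, outsz, "// SPDX-License-Identifier: %s",
			     license);
	else if (ext && strcmp(ext, ".h") == 0)
		n = snprintf(out, outsz, "/* SPDX-License-Identifier: %s */",
			     license);
	else
		return -1;	/* out of scope for this sketch */

	return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

For a path such as fs/proc/page.c with the identifier GPL-2.0, this yields the same `//` form that appears on the first line of the file.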
2017-11-01 22:07:57 +08:00
// SPDX-License-Identifier: GPL-2.0
2018-10-31 06:09:49 +08:00
#include <linux/memblock.h>
2008-10-06 18:26:12 +08:00
#include <linux/compiler.h>
#include <linux/fs.h>
#include <linux/init.h>
2009-09-22 08:02:01 +08:00
#include <linux/ksm.h>
2008-10-06 18:26:12 +08:00
#include <linux/mm.h>
#include <linux/mmzone.h>
2015-02-12 07:24:51 +08:00
#include <linux/huge_mm.h>
2008-10-06 18:26:12 +08:00
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
mm: introduce PageHuge() for testing huge/gigantic pages
A series of patches to enhance the /proc/pagemap interface and to add a
userspace executable which can be used to present the pagemap data.
Export 10 more flags to end users (and more for kernel developers):
11. KPF_MMAP (pseudo flag) memory mapped page
12. KPF_ANON (pseudo flag) memory mapped page (anonymous)
13. KPF_SWAPCACHE page is in swap cache
14. KPF_SWAPBACKED page is swap/RAM backed
15. KPF_COMPOUND_HEAD (*)
16. KPF_COMPOUND_TAIL (*)
17. KPF_HUGE hugeTLB pages
18. KPF_UNEVICTABLE page is in the unevictable LRU list
19. KPF_HWPOISON hardware detected corruption
20. KPF_NOPAGE (pseudo flag) no page frame at the address
(*) For compound pages, exporting _both_ head/tail info enables
users to tell where a compound page starts/ends, and its order.
a simple demo of the page-types tool
# ./page-types -h
page-types [options]
-r|--raw Raw mode, for kernel developers
-a|--addr addr-spec Walk a range of pages
-b|--bits bits-spec Walk pages with specified bits
-l|--list Show page details in ranges
-L|--list-each Show page details one by one
-N|--no-summary Don't show summary info
-h|--help Show this usage message
addr-spec:
N one page at offset N (unit: pages)
N+M pages range from N to N+M-1
N,M pages range from N to M-1
N, pages range from N to end
,M pages range from 0 to M
bits-spec:
bit1,bit2 (flags & (bit1|bit2)) != 0
bit1,bit2=bit1 (flags & (bit1|bit2)) == bit1
bit1,~bit2 (flags & (bit1|bit2)) == bit1
=bit1,bit2 flags == (bit1|bit2)
bit-names:
locked error referenced uptodate
dirty lru active slab
writeback reclaim buddy mmap
anonymous swapcache swapbacked compound_head
compound_tail huge unevictable hwpoison
nopage reserved(r) mlocked(r) mappedtodisk(r)
private(r) private_2(r) owner_private(r) arch(r)
uncached(r) readahead(o) slob_free(o) slub_frozen(o)
slub_debug(o)
(r) raw mode bits (o) overloaded bits
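The bits-spec forms in the help text reduce to three comparisons on the 64-bit flags word. A minimal sketch, assuming the spec has already been parsed into a mask and an expected value (the function names are hypothetical and are not taken from page-types.c):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* "bit1,bit2": match when any of the listed bits is set. */
static bool match_any(uint64_t flags, uint64_t mask)
{
	return (flags & mask) != 0;
}

/*
 * "bit1,bit2=bit1" and "bit1,~bit2": among the bits in mask, exactly
 * the bits in want must be set.
 */
static bool match_masked(uint64_t flags, uint64_t mask, uint64_t want)
{
	assert((want & ~mask) == 0);	/* want must be a subset of mask */
	return (flags & mask) == want;
}

/* "=bit1,bit2": the whole flags word must equal the listed bits. */
static bool match_exact(uint64_t flags, uint64_t want)
{
	return flags == want;
}
```

Using values from the sample output in this message, 0x2c (referenced,uptodate,lru) satisfies `match_masked(0x2c, 0x14, 0x04)`, that is "referenced,~dirty", while 0x14 (referenced,dirty) does not.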
# ./page-types
flags page-count MB symbolic-flags long-symbolic-flags
0x0000000000000000 487369 1903 _________________________________
0x0000000000000014 5 0 __R_D____________________________ referenced,dirty
0x0000000000000020 1 0 _____l___________________________ lru
0x0000000000000024 34 0 __R__l___________________________ referenced,lru
0x0000000000000028 3838 14 ___U_l___________________________ uptodate,lru
0x0001000000000028 48 0 ___U_l_______________________I___ uptodate,lru,readahead
0x000000000000002c 6478 25 __RU_l___________________________ referenced,uptodate,lru
0x000100000000002c 47 0 __RU_l_______________________I___ referenced,uptodate,lru,readahead
0x0000000000000040 8344 32 ______A__________________________ active
0x0000000000000060 1 0 _____lA__________________________ lru,active
0x0000000000000068 348 1 ___U_lA__________________________ uptodate,lru,active
0x0001000000000068 12 0 ___U_lA______________________I___ uptodate,lru,active,readahead
0x000000000000006c 988 3 __RU_lA__________________________ referenced,uptodate,lru,active
0x000100000000006c 48 0 __RU_lA______________________I___ referenced,uptodate,lru,active,readahead
0x0000000000004078 1 0 ___UDlA_______b__________________ uptodate,dirty,lru,active,swapbacked
0x000000000000407c 34 0 __RUDlA_______b__________________ referenced,uptodate,dirty,lru,active,swapbacked
0x0000000000000400 503 1 __________B______________________ buddy
0x0000000000000804 1 0 __R________M_____________________ referenced,mmap
0x0000000000000828 1029 4 ___U_l_____M_____________________ uptodate,lru,mmap
0x0001000000000828 43 0 ___U_l_____M_________________I___ uptodate,lru,mmap,readahead
0x000000000000082c 382 1 __RU_l_____M_____________________ referenced,uptodate,lru,mmap
0x000100000000082c 12 0 __RU_l_____M_________________I___ referenced,uptodate,lru,mmap,readahead
0x0000000000000868 192 0 ___U_lA____M_____________________ uptodate,lru,active,mmap
0x0001000000000868 12 0 ___U_lA____M_________________I___ uptodate,lru,active,mmap,readahead
0x000000000000086c 800 3 __RU_lA____M_____________________ referenced,uptodate,lru,active,mmap
0x000100000000086c 31 0 __RU_lA____M_________________I___ referenced,uptodate,lru,active,mmap,readahead
0x0000000000004878 2 0 ___UDlA____M__b__________________ uptodate,dirty,lru,active,mmap,swapbacked
0x0000000000001000 492 1 ____________a____________________ anonymous
0x0000000000005808 4 0 ___U_______Ma_b__________________ uptodate,mmap,anonymous,swapbacked
0x0000000000005868 2839 11 ___U_lA____Ma_b__________________ uptodate,lru,active,mmap,anonymous,swapbacked
0x000000000000586c 30 0 __RU_lA____Ma_b__________________ referenced,uptodate,lru,active,mmap,anonymous,swapbacked
total 513968 2007
# ./page-types -r
flags page-count MB symbolic-flags long-symbolic-flags
0x0000000000000000 468002 1828 _________________________________
0x0000000100000000 19102 74 _____________________r___________ reserved
0x0000000000008000 41 0 _______________H_________________ compound_head
0x0000000000010000 188 0 ________________T________________ compound_tail
0x0000000000008014 1 0 __R_D__________H_________________ referenced,dirty,compound_head
0x0000000000010014 4 0 __R_D___________T________________ referenced,dirty,compound_tail
0x0000000000000020 1 0 _____l___________________________ lru
0x0000000800000024 34 0 __R__l__________________P________ referenced,lru,private
0x0000000000000028 3794 14 ___U_l___________________________ uptodate,lru
0x0001000000000028 46 0 ___U_l_______________________I___ uptodate,lru,readahead
0x0000000400000028 44 0 ___U_l_________________d_________ uptodate,lru,mappedtodisk
0x0001000400000028 2 0 ___U_l_________________d_____I___ uptodate,lru,mappedtodisk,readahead
0x000000000000002c 6434 25 __RU_l___________________________ referenced,uptodate,lru
0x000100000000002c 47 0 __RU_l_______________________I___ referenced,uptodate,lru,readahead
0x000000040000002c 14 0 __RU_l_________________d_________ referenced,uptodate,lru,mappedtodisk
0x000000080000002c 30 0 __RU_l__________________P________ referenced,uptodate,lru,private
0x0000000800000040 8124 31 ______A_________________P________ active,private
0x0000000000000040 219 0 ______A__________________________ active
0x0000000800000060 1 0 _____lA_________________P________ lru,active,private
0x0000000000000068 322 1 ___U_lA__________________________ uptodate,lru,active
0x0001000000000068 12 0 ___U_lA______________________I___ uptodate,lru,active,readahead
0x0000000400000068 13 0 ___U_lA________________d_________ uptodate,lru,active,mappedtodisk
0x0000000800000068 12 0 ___U_lA_________________P________ uptodate,lru,active,private
0x000000000000006c 977 3 __RU_lA__________________________ referenced,uptodate,lru,active
0x000100000000006c 48 0 __RU_lA______________________I___ referenced,uptodate,lru,active,readahead
0x000000040000006c 5 0 __RU_lA________________d_________ referenced,uptodate,lru,active,mappedtodisk
0x000000080000006c 3 0 __RU_lA_________________P________ referenced,uptodate,lru,active,private
0x0000000c0000006c 3 0 __RU_lA________________dP________ referenced,uptodate,lru,active,mappedtodisk,private
0x0000000c00000068 1 0 ___U_lA________________dP________ uptodate,lru,active,mappedtodisk,private
0x0000000000004078 1 0 ___UDlA_______b__________________ uptodate,dirty,lru,active,swapbacked
0x000000000000407c 34 0 __RUDlA_______b__________________ referenced,uptodate,dirty,lru,active,swapbacked
0x0000000000000400 538 2 __________B______________________ buddy
0x0000000000000804 1 0 __R________M_____________________ referenced,mmap
0x0000000000000828 1029 4 ___U_l_____M_____________________ uptodate,lru,mmap
0x0001000000000828 43 0 ___U_l_____M_________________I___ uptodate,lru,mmap,readahead
0x000000000000082c 382 1 __RU_l_____M_____________________ referenced,uptodate,lru,mmap
0x000100000000082c 12 0 __RU_l_____M_________________I___ referenced,uptodate,lru,mmap,readahead
0x0000000000000868 192 0 ___U_lA____M_____________________ uptodate,lru,active,mmap
0x0001000000000868 12 0 ___U_lA____M_________________I___ uptodate,lru,active,mmap,readahead
0x000000000000086c 800 3 __RU_lA____M_____________________ referenced,uptodate,lru,active,mmap
0x000100000000086c 31 0 __RU_lA____M_________________I___ referenced,uptodate,lru,active,mmap,readahead
0x0000000000004878 2 0 ___UDlA____M__b__________________ uptodate,dirty,lru,active,mmap,swapbacked
0x0000000000001000 492 1 ____________a____________________ anonymous
0x0000000000005008 2 0 ___U________a_b__________________ uptodate,anonymous,swapbacked
0x0000000000005808 4 0 ___U_______Ma_b__________________ uptodate,mmap,anonymous,swapbacked
0x000000000000580c 1 0 __RU_______Ma_b__________________ referenced,uptodate,mmap,anonymous,swapbacked
0x0000000000005868 2839 11 ___U_lA____Ma_b__________________ uptodate,lru,active,mmap,anonymous,swapbacked
0x000000000000586c 29 0 __RU_lA____Ma_b__________________ referenced,uptodate,lru,active,mmap,anonymous,swapbacked
total 513968 2007
# ./page-types --raw --list --no-summary --bits reserved
offset count flags
0 15 _____________________r___________
31 4 _____________________r___________
159 97 _____________________r___________
4096 2067 _____________________r___________
6752 2390 _____________________r___________
9355 3 _____________________r___________
9728 14526 _____________________r___________
This patch:
Introduce PageHuge(), which identifies huge/gigantic pages by their
dedicated compound destructor functions.
Also move prep_compound_gigantic_page() to hugetlb.c and make
__free_pages_ok() non-static.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
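The note on compound pages says that exporting both head and tail bits lets users find where a compound page starts and ends, and its order. A rough userspace sketch of that idea, operating on an array of flag words already read from /proc/kpageflags. The bit numbers 15 and 16 follow the flag list in the message; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KPF_COMPOUND_HEAD 15
#define KPF_COMPOUND_TAIL 16

/*
 * Number of pages in the compound page whose head is flags[i], or 0 if
 * flags[i] is not a compound head. The run ends at the first entry
 * without the tail bit (for example the next head or a normal page).
 */
static size_t compound_pages(const uint64_t *flags, size_t n, size_t i)
{
	size_t len = 1;

	assert(flags);
	if (i >= n || !((flags[i] >> KPF_COMPOUND_HEAD) & 1))
		return 0;
	while (i + len < n && ((flags[i + len] >> KPF_COMPOUND_TAIL) & 1))
		len++;
	return len;
}

/* Order of a compound page spanning len pages (len is a power of two). */
static unsigned int compound_order(size_t len)
{
	unsigned int order = 0;

	while (len > 1) {
		len >>= 1;
		order++;
	}
	return order;
}
```

A head followed by three tails gives a length of 4 pages, i.e. order 2.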
2009-06-17 06:32:22 +08:00
#include <linux/hugetlb.h>
2015-09-10 06:35:38 +08:00
#include <linux/memcontrol.h>
mm: introduce idle page tracking
Knowing the portion of memory that is not used by a certain application or
memory cgroup (idle memory) can be useful for partitioning the system
efficiently, e.g. by setting memory cgroup limits appropriately.
Currently, the only means to estimate the amount of idle memory provided
by the kernel is /proc/PID/{clear_refs,smaps}: the user can clear the
access bit for all pages mapped to a particular process by writing 1 to
clear_refs, wait for some time, and then count smaps:Referenced. However,
this method has two serious shortcomings:
- it does not count unmapped file pages
- it affects the reclaimer logic
To overcome these drawbacks, this patch introduces two new page flags,
Idle and Young, and a new sysfs file, /sys/kernel/mm/page_idle/bitmap.
A page's Idle flag can only be set from userspace by setting bit in
/sys/kernel/mm/page_idle/bitmap at the offset corresponding to the page,
and it is cleared whenever the page is accessed either through page tables
(it is cleared in page_referenced() in this case) or using the read(2)
system call (mark_page_accessed()). Thus by setting the Idle flag for
pages of a particular workload, which can be found e.g. by reading
/proc/PID/pagemap, waiting for some time to let the workload access its
working set, and then reading the bitmap file, one can estimate the amount
of pages that are not used by the workload.
The Young page flag is used to avoid interference with the memory
reclaimer. A page's Young flag is set whenever the Access bit of a page
table entry pointing to the page is cleared by writing to the bitmap file.
If page_referenced() is called on a Young page, it will add 1 to its
return value, therefore concealing the fact that the Access bit was
cleared.
Note, since there is no room for extra page flags on 32 bit, this feature
uses extended page flags when compiled on 32 bit.
[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: kpageidle requires an MMU]
[akpm@linux-foundation.org: decouple from page-flags rework]
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Reviewed-by: Andres Lagar-Cavilla <andreslc@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Greg Thelen <gthelen@google.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
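The "offset corresponding to the page" can be made concrete with a small sketch. It assumes the layout described in the kernel's idle page tracking documentation: the bitmap is an array of 8-byte words in native byte order, and the page at PFN i maps to bit i % 64 of word i / 64. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Position of one page's Idle bit inside the bitmap file. */
struct idle_bit_pos {
	uint64_t byte_offset;	/* file offset of the 8-byte word */
	unsigned int bit;	/* bit index inside that word */
};

static struct idle_bit_pos idle_bit_for_pfn(uint64_t pfn)
{
	struct idle_bit_pos pos;

	pos.byte_offset = (pfn / 64) * sizeof(uint64_t);
	pos.bit = (unsigned int)(pfn % 64);
	return pos;
}

/* Test the Idle bit for pfn in a word read at idle_bit_for_pfn(pfn). */
static int word_says_idle(uint64_t word, uint64_t pfn)
{
	return (int)((word >> (pfn % 64)) & 1);
}
```

For example, PFN 130 lives in the third word (byte offset 16) at bit 2, so marking it idle means writing a word with bit 2 set at that offset.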
2015-09-10 06:35:45 +08:00
#include <linux/mmu_notifier.h>
#include <linux/page_idle.h>
2009-12-16 19:19:59 +08:00
#include <linux/kernel-page-flags.h>
2016-12-25 03:46:01 +08:00
#include <linux/uaccess.h>
2008-10-06 18:26:12 +08:00
#include "internal.h"

#define KPMSIZE sizeof(u64)
#define KPMMASK (KPMSIZE - 1)
2015-09-10 06:35:45 +08:00
#define KPMBITS (KPMSIZE * BITS_PER_BYTE)
2009-06-17 06:32:23 +08:00
fs/proc/page.c: allow inspection of last section and fix end detection
If max_pfn does not fall onto a section boundary, it is possible to
inspect PFNs up to max_pfn, and PFNs above max_pfn, however, max_pfn
itself can't be inspected. We can have a valid (and online) memmap at and
above max_pfn if max_pfn is not aligned to a section boundary. The whole
early section has a memmap and is marked online. Being able to inspect
the state of these PFNs is valuable for debugging, especially because
max_pfn can change on memory hotplug and expose these memmaps.
Also, querying page flags via "./page-types -r -a 0x144001,"
(tools/vm/page-types.c) inside a x86-64 guest with 4160MB under QEMU
results in an (almost) endless loop in user space, because the end is not
detected properly when starting after max_pfn.
Instead, let's allow to inspect all pages in the highest section and
return 0 directly if we try to access pages above that section.
While at it, check the count before adjusting it, to avoid masking user
errors.
Link: http://lkml.kernel.org/r/20191211163201.17179-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Bob Picco <bob.picco@oracle.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-04 09:33:52 +08:00
static inline unsigned long get_max_dump_pfn(void)
{
#ifdef CONFIG_SPARSEMEM
	/*
	 * The memmap of early sections is completely populated and marked
	 * online even if max_pfn does not fall on a section boundary -
	 * pfn_to_online_page() will succeed on all pages. Allow inspecting
	 * these memmaps.
	 */
	return round_up(max_pfn, PAGES_PER_SECTION);
#else
	return max_pfn;
#endif
}

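To illustrate the rounding in the CONFIG_SPARSEMEM branch, here is a small userspace sketch. The section size is an assumption (32768 pages, i.e. 128 MiB sections with 4 KiB pages, as on x86-64), the max_pfn value 0x144000 used in the note below is only an illustrative number, and `round_up_pow2()` is a stand-in for the kernel's round_up() macro rather than a copy of it.

```c
#include <assert.h>

/* Assumption: 128 MiB sections with 4 KiB pages. */
#define ASSUMED_PAGES_PER_SECTION 32768UL

/* Round x up to the next multiple of align; align must be a power of two. */
static unsigned long round_up_pow2(unsigned long x, unsigned long align)
{
	assert(align && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/* Userspace stand-in for the SPARSEMEM branch of get_max_dump_pfn(). */
static unsigned long max_dump_pfn_for(unsigned long max_pfn)
{
	return round_up_pow2(max_pfn, ASSUMED_PAGES_PER_SECTION);
}
```

With max_pfn at 0x144000, which is not section aligned, the dump limit becomes 0x148000, so PFNs in the remainder of the last section become inspectable. A max_pfn that is already aligned is returned unchanged.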
2008-10-06 18:26:12 +08:00
/* /proc/kpagecount - an array exposing page counts
 *
 * Each entry is a u64 representing the corresponding
 * physical page count.
 */
static ssize_t kpagecount_read(struct file *file, char __user *buf,
			     size_t count, loff_t *ppos)
{
2020-02-04 09:33:52 +08:00
	const unsigned long max_dump_pfn = get_max_dump_pfn();
2008-10-06 18:26:12 +08:00
	u64 __user *out = (u64 __user *)buf;
	struct page *ppage;
	unsigned long src = *ppos;
	unsigned long pfn;
	ssize_t ret = 0;
	u64 pcount;

	pfn = src / KPMSIZE;
	if (src & KPMMASK || count & KPMMASK)
		return -EINVAL;
2020-02-04 09:33:52 +08:00
	if (src >= max_dump_pfn * KPMSIZE)
		return 0;
	count = min_t(unsigned long, count, (max_dump_pfn * KPMSIZE) - src);
2008-10-06 18:26:12 +08:00

	while (count > 0) {
2019-10-19 11:19:20 +08:00
		/*
		 * TODO: ZONE_DEVICE support requires to identify
		 * memmaps that were actually initialized.
		 */
		ppage = pfn_to_online_page(pfn);

2018-12-28 16:37:31 +08:00
		if (!ppage || PageSlab(ppage) || page_has_type(ppage))
2008-10-06 18:26:12 +08:00
			pcount = 0;
		else
			pcount = page_mapcount(ppage);

2009-06-17 06:32:23 +08:00
		if (put_user(pcount, out)) {
2008-10-06 18:26:12 +08:00
			ret = -EFAULT;
			break;
		}

2009-06-17 06:32:23 +08:00
		pfn++;
		out++;
2008-10-06 18:26:12 +08:00
		count -= KPMSIZE;
2015-09-10 06:35:51 +08:00

		cond_resched();
2008-10-06 18:26:12 +08:00
	}

	*ppos += (char __user *)out - buf;
	if (!ret)
		ret = (char __user *)out - buf;
	return ret;
}

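The argument checks at the top of kpagecount_read() can be exercised in isolation. The sketch below mirrors their order in userspace: the alignment check comes first, then end-of-file detection, then clamping of the byte count. `kpm_clamp()` is a hypothetical name, and the limit is passed in as a parameter instead of being computed from max_pfn.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define KPMSIZE sizeof(uint64_t)
#define KPMMASK (KPMSIZE - 1)

/*
 * Hypothetical userspace mirror of the checks in kpagecount_read().
 * Returns -EINVAL for a misaligned offset or count, 0 when src is at or
 * past the end, otherwise the number of bytes that may be read.
 */
static long long kpm_clamp(uint64_t src, uint64_t count,
			   uint64_t max_dump_pfn)
{
	uint64_t end = max_dump_pfn * KPMSIZE;

	if ((src & KPMMASK) || (count & KPMMASK))
		return -EINVAL;	/* both must be multiples of 8 */
	if (src >= end)
		return 0;	/* nothing to read past the last entry */
	return (long long)(count < end - src ? count : end - src);
}
```

Because the alignment test runs before the clamp, a misaligned count is reported as an error even when it would have been truncated anyway. The count for PFN n is found at byte offset n * 8.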
2020-02-04 09:37:17 +08:00
static const struct proc_ops kpagecount_proc_ops = {
	.proc_lseek	= mem_lseek,
	.proc_read	= kpagecount_read,
2008-10-06 18:26:12 +08:00
};

/* /proc/kpageflags - an array exposing page flags
 *
 * Each entry is a u64 representing the corresponding
 * physical page flags.
 */

2009-06-17 06:32:24 +08:00
static inline u64 kpf_copy_bit(u64 kflags, int ubit, int kbit)
{
	return ((kflags >> kbit) & 1) << ubit;
}

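As a standalone illustration of kpf_copy_bit(), the sketch below reuses the same expression with userspace types and adds a table-driven loop. The `bit_map` struct, the `translate()` helper and the bit numbers used with them are hypothetical and do not correspond to real PG_* or KPF_* values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same expression as above: move bit kbit of kflags to position ubit. */
static inline uint64_t kpf_copy_bit(uint64_t kflags, int ubit, int kbit)
{
	return ((kflags >> kbit) & 1) << ubit;
}

/* Hypothetical mapping entry: internal bit kbit is exported as ubit. */
struct bit_map {
	int ubit;
	int kbit;
};

/* Build the exported word by copying each mapped bit in turn. */
static uint64_t translate(uint64_t kflags, const struct bit_map *map,
			  size_t n)
{
	uint64_t u = 0;
	size_t i;

	assert(map || n == 0);
	for (i = 0; i < n; i++)
		u |= kpf_copy_bit(kflags, map[i].ubit, map[i].kbit);
	return u;
}
```

Masking with 1 before the left shift keeps the operation safe for any pair of bit positions, and because the intermediate value is 64 bits wide, a destination bit above 31 is not truncated.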
2009-12-16 19:19:59 +08:00
u64 stable_page_flags(struct page *page)
2009-06-17 06:32:24 +08:00
{
	u64 k;
	u64 u;

	/*
	 * pseudo flag: KPF_NOPAGE
	 * it differentiates a memory hole from a page with no flags
	 */
	if (!page)
		return 1 << KPF_NOPAGE;

	k = page->flags;
	u = 0;

	/*
	 * pseudo flags for the well known (anonymous) memory mapped pages
	 *
	 * Note that page->_mapcount is overloaded in SLOB/SLUB/SLQB, so the
2016-03-18 05:17:41 +08:00
	 * simple test in page_mapped() is not enough.
2009-06-17 06:32:24 +08:00
|
|
|
*/
|
2016-03-18 05:17:41 +08:00
|
|
|
if (!PageSlab(page) && page_mapped(page))
|
2009-06-17 06:32:24 +08:00
|
|
|
u |= 1 << KPF_MMAP;
|
|
|
|
if (PageAnon(page))
|
|
|
|
u |= 1 << KPF_ANON;
|
2009-09-22 08:02:01 +08:00
|
|
|
if (PageKsm(page))
|
|
|
|
u |= 1 << KPF_KSM;
|
2009-06-17 06:32:24 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* compound pages: export both head/tail info
|
|
|
|
* they together define a compound page's start/end pos and order
|
|
|
|
*/
|
|
|
|
if (PageHead(page))
|
|
|
|
u |= 1 << KPF_COMPOUND_HEAD;
|
|
|
|
if (PageTail(page))
|
|
|
|
u |= 1 << KPF_COMPOUND_TAIL;
|
|
|
|
if (PageHuge(page))
|
|
|
|
u |= 1 << KPF_HUGE;
|
2012-10-09 07:33:47 +08:00
|
|
|
/*
|
|
|
|
* PageTransCompound can be true for non-huge compound pages (slab
|
|
|
|
* pages or pages allocated by drivers with __GFP_COMP) because it
|
2014-01-24 07:52:53 +08:00
|
|
|
* just checks PG_head/PG_tail, so we need to check PageLRU/PageAnon
|
|
|
|
* to make sure a given page is a thp, not a non-huge compound page.
|
2012-10-09 07:33:47 +08:00
|
|
|
*/
|
2015-02-12 07:24:51 +08:00
|
|
|
else if (PageTransCompound(page)) {
|
|
|
|
struct page *head = compound_head(page);
|
|
|
|
|
|
|
|
if (PageLRU(head) || PageAnon(head))
|
|
|
|
u |= 1 << KPF_THP;
|
|
|
|
else if (is_huge_zero_page(head)) {
|
|
|
|
u |= 1 << KPF_ZERO_PAGE;
|
|
|
|
u |= 1 << KPF_THP;
|
|
|
|
}
|
|
|
|
} else if (is_zero_pfn(page_to_pfn(page)))
|
|
|
|
u |= 1 << KPF_ZERO_PAGE;
|
|
|
|
|
2009-06-17 06:32:24 +08:00
|
|
|
|
|
|
|
/*
|
2016-05-20 08:10:49 +08:00
|
|
|
* Caveats on high order pages: page->_refcount will only be set
|
2011-01-14 07:47:00 +08:00
|
|
|
* -1 on the head page; SLUB/SLQB do the same for PG_slab;
|
|
|
|
* SLOB won't set PG_slab at all on compound pages.
|
2009-06-17 06:32:24 +08:00
|
|
|
*/
|
2011-01-14 07:47:00 +08:00
|
|
|
if (PageBuddy(page))
|
|
|
|
u |= 1 << KPF_BUDDY;
|
2016-03-18 05:17:41 +08:00
|
|
|
else if (page_count(page) == 0 && is_free_buddy_page(page))
|
|
|
|
u |= 1 << KPF_BUDDY;
|
2011-01-14 07:47:00 +08:00
|
|
|
|
2019-03-06 07:42:23 +08:00
|
|
|
if (PageOffline(page))
|
|
|
|
u |= 1 << KPF_OFFLINE;
|
2018-06-08 08:08:23 +08:00
|
|
|
if (PageTable(page))
|
|
|
|
u |= 1 << KPF_PGTABLE;
|
2014-10-10 06:29:32 +08:00
|
|
|
|
2015-09-10 06:35:48 +08:00
|
|
|
if (page_is_idle(page))
|
|
|
|
u |= 1 << KPF_IDLE;
|
|
|
|
|
2011-01-14 07:47:00 +08:00
|
|
|
u |= kpf_copy_bit(k, KPF_LOCKED, PG_locked);
|
|
|
|
|
2009-06-17 06:32:24 +08:00
|
|
|
u |= kpf_copy_bit(k, KPF_SLAB, PG_slab);
|
2016-03-18 05:17:44 +08:00
|
|
|
if (PageTail(page) && PageSlab(compound_head(page)))
|
|
|
|
u |= 1 << KPF_SLAB;
|
2009-06-17 06:32:24 +08:00
|
|
|
|
|
|
|
u |= kpf_copy_bit(k, KPF_ERROR, PG_error);
|
|
|
|
u |= kpf_copy_bit(k, KPF_DIRTY, PG_dirty);
|
|
|
|
u |= kpf_copy_bit(k, KPF_UPTODATE, PG_uptodate);
|
|
|
|
u |= kpf_copy_bit(k, KPF_WRITEBACK, PG_writeback);
|
|
|
|
|
|
|
|
u |= kpf_copy_bit(k, KPF_LRU, PG_lru);
|
|
|
|
u |= kpf_copy_bit(k, KPF_REFERENCED, PG_referenced);
|
|
|
|
u |= kpf_copy_bit(k, KPF_ACTIVE, PG_active);
|
|
|
|
u |= kpf_copy_bit(k, KPF_RECLAIM, PG_reclaim);
|
|
|
|
|
2017-02-08 03:11:16 +08:00
|
|
|
if (PageSwapCache(page))
|
|
|
|
u |= 1 << KPF_SWAPCACHE;
|
2009-06-17 06:32:24 +08:00
|
|
|
u |= kpf_copy_bit(k, KPF_SWAPBACKED, PG_swapbacked);
|
|
|
|
|
|
|
|
u |= kpf_copy_bit(k, KPF_UNEVICTABLE, PG_unevictable);
|
|
|
|
u |= kpf_copy_bit(k, KPF_MLOCKED, PG_mlocked);
|
|
|
|
|
2009-10-08 07:32:27 +08:00
|
|
|
#ifdef CONFIG_MEMORY_FAILURE
|
|
|
|
u |= kpf_copy_bit(k, KPF_HWPOISON, PG_hwpoison);
|
|
|
|
#endif
|
|
|
|
|
2010-09-10 07:37:36 +08:00
|
|
|
#ifdef CONFIG_ARCH_USES_PG_UNCACHED
|
2009-06-17 06:32:24 +08:00
|
|
|
u |= kpf_copy_bit(k, KPF_UNCACHED, PG_uncached);
|
|
|
|
#endif
|
|
|
|
|
|
|
|
u |= kpf_copy_bit(k, KPF_RESERVED, PG_reserved);
|
|
|
|
u |= kpf_copy_bit(k, KPF_MAPPEDTODISK, PG_mappedtodisk);
|
|
|
|
u |= kpf_copy_bit(k, KPF_PRIVATE, PG_private);
|
|
|
|
u |= kpf_copy_bit(k, KPF_PRIVATE_2, PG_private_2);
|
|
|
|
u |= kpf_copy_bit(k, KPF_OWNER_PRIVATE, PG_owner_priv_1);
|
|
|
|
u |= kpf_copy_bit(k, KPF_ARCH, PG_arch_1);
|
2020-04-22 22:25:27 +08:00
|
|
|
#ifdef CONFIG_64BIT
|
|
|
|
u |= kpf_copy_bit(k, KPF_ARCH_2, PG_arch_2);
|
|
|
|
#endif
|
2009-06-17 06:32:24 +08:00
|
|
|
|
|
|
|
return u;
|
|
|
|
}

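The helper kpf_copy_bit() used throughout stable_page_flags() moves a single bit from the kernel's internal flag word to a fixed position in the exported word. A minimal user-space sketch of the same shift-and-mask step follows. It uses uint64_t in place of u64, and the names copy_bit() and flag_is_set() are hypothetical; the real KPF_* and PG_* bit numbers are not shown in this section, so callers here pass plain integers.

```c
#include <assert.h>
#include <stdint.h>

/* Copy bit number kbit of kflags into bit position ubit of the result. */
static uint64_t copy_bit(uint64_t kflags, int ubit, int kbit)
{
	/* Shifting a 64-bit value by 64 or more is undefined in C. */
	assert(ubit >= 0 && ubit < 64 && kbit >= 0 && kbit < 64);
	return ((kflags >> kbit) & 1) << ubit;
}

/* Return 1 if bit number `bit` is set in an exported flags word. */
static int flag_is_set(uint64_t uflags, int bit)
{
	assert(bit >= 0 && bit < 64);
	return (int)((uflags >> bit) & 1);
}
```

Because every exported bit is produced this way, the user-visible layout stays the same even if the internal bit positions differ between kernel builds.
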
static ssize_t kpageflags_read(struct file *file, char __user *buf,
			       size_t count, loff_t *ppos)
{
fs/proc/page.c: allow inspection of last section and fix end detection
If max_pfn does not fall onto a section boundary, it is possible to
inspect PFNs up to max_pfn, and PFNs above max_pfn, however, max_pfn
itself can't be inspected. We can have a valid (and online) memmap at and
above max_pfn if max_pfn is not aligned to a section boundary. The whole
early section has a memmap and is marked online. Being able to inspect
the state of these PFNs is valuable for debugging, especially because
max_pfn can change on memory hotplug and expose these memmaps.
Also, querying page flags via "./page-types -r -a 0x144001,"
(tools/vm/page-types.c) inside a x86-64 guest with 4160MB under QEMU
results in an (almost) endless loop in user space, because the end is not
detected properly when starting after max_pfn.
Instead, let's allow to inspect all pages in the highest section and
return 0 directly if we try to access pages above that section.
While at it, check the count before adjusting it, to avoid masking user
errors.
Link: http://lkml.kernel.org/r/20191211163201.17179-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Bob Picco <bob.picco@oracle.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
	const unsigned long max_dump_pfn = get_max_dump_pfn();
	u64 __user *out = (u64 __user *)buf;
	struct page *ppage;
	unsigned long src = *ppos;
	unsigned long pfn;
	ssize_t ret = 0;

	pfn = src / KPMSIZE;
	if (src & KPMMASK || count & KPMMASK)
		return -EINVAL;
	if (src >= max_dump_pfn * KPMSIZE)
		return 0;
	count = min_t(unsigned long, count, (max_dump_pfn * KPMSIZE) - src);

	while (count > 0) {
		/*
		 * TODO: ZONE_DEVICE support requires identifying
		 * memmaps that were actually initialized.
		 */
		ppage = pfn_to_online_page(pfn);

		if (put_user(stable_page_flags(ppage), out)) {
			ret = -EFAULT;
			break;
		}

		pfn++;
		out++;
		count -= KPMSIZE;

		cond_resched();
	}

	*ppos += (char __user *)out - buf;
	if (!ret)
		ret = (char __user *)out - buf;
	return ret;
}

static const struct proc_ops kpageflags_proc_ops = {
	.proc_lseek = mem_lseek,
	.proc_read = kpageflags_read,
};

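The checks at the top of kpageflags_read() (alignment first, then the end test, then the clamp) can be modelled outside the kernel. The sketch below is a user-space approximation, not the kernel code: kpm_clamp() is a hypothetical name, and since the definitions of KPMSIZE and KPMMASK are not shown in this section, it assumes 8-byte entries and a mask of KPMSIZE - 1.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define KPMSIZE sizeof(uint64_t)	/* assumed entry size */
#define KPMMASK (KPMSIZE - 1)		/* assumed alignment mask */

/*
 * Model of the checks done before the copy loop.
 * Returns -EINVAL for a misaligned offset or count, 0 when the offset is
 * at or past the last dumpable entry, otherwise the number of bytes the
 * loop would attempt to copy.
 */
static long kpm_clamp(uint64_t src, uint64_t count, uint64_t max_dump_pfn)
{
	uint64_t end;

	/* Keep the multiplication below from overflowing. */
	assert(max_dump_pfn <= UINT64_MAX / KPMSIZE);
	end = max_dump_pfn * KPMSIZE;

	/* Alignment is checked before count is adjusted, so a bad count
	 * is reported even when the request would have been truncated. */
	if ((src & KPMMASK) || (count & KPMMASK))
		return -EINVAL;
	if (src >= end)
		return 0;
	if (count > end - src)
		count = end - src;
	return (long)count;
}
```

With 100 dumpable PFNs, kpm_clamp(792, 64, 100) yields 8 (only the last entry remains) and kpm_clamp(800, 8, 100) yields 0, which is what lets a reader detect the end of the file.
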
#ifdef CONFIG_MEMCG
static ssize_t kpagecgroup_read(struct file *file, char __user *buf,
				size_t count, loff_t *ppos)
{
	const unsigned long max_dump_pfn = get_max_dump_pfn();
	u64 __user *out = (u64 __user *)buf;
	struct page *ppage;
	unsigned long src = *ppos;
	unsigned long pfn;
	ssize_t ret = 0;
	u64 ino;

	pfn = src / KPMSIZE;
	if (src & KPMMASK || count & KPMMASK)
		return -EINVAL;
	if (src >= max_dump_pfn * KPMSIZE)
		return 0;
	count = min_t(unsigned long, count, (max_dump_pfn * KPMSIZE) - src);

	while (count > 0) {
		/*
		 * TODO: ZONE_DEVICE support requires identifying
		 * memmaps that were actually initialized.
		 */
		ppage = pfn_to_online_page(pfn);

		if (ppage)
			ino = page_cgroup_ino(ppage);
		else
			ino = 0;

		if (put_user(ino, out)) {
			ret = -EFAULT;
			break;
		}

		pfn++;
		out++;
		count -= KPMSIZE;

		cond_resched();
	}

	*ppos += (char __user *)out - buf;
	if (!ret)
		ret = (char __user *)out - buf;
	return ret;
}

static const struct proc_ops kpagecgroup_proc_ops = {
	.proc_lseek = mem_lseek,
	.proc_read = kpagecgroup_read,
};
#endif /* CONFIG_MEMCG */

static int __init proc_page_init(void)
{
	proc_create("kpagecount", S_IRUSR, NULL, &kpagecount_proc_ops);
	proc_create("kpageflags", S_IRUSR, NULL, &kpageflags_proc_ops);
#ifdef CONFIG_MEMCG
	proc_create("kpagecgroup", S_IRUSR, NULL, &kpagecgroup_proc_ops);
#endif
	return 0;
}
fs_initcall(proc_page_init);
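
The read handlers above write one u64 per PFN and reject offsets that are not a multiple of the entry size, so user space can treat each of these files as an array indexed by PFN. A minimal sketch of reading one entry follows. It assumes 8-byte entries in native byte order, the helper names kpm_offset() and kpm_read_entry() are hypothetical, and because the files are created with S_IRUSR the caller would normally need to be the owner (typically root) to open them.

```c
#define _POSIX_C_SOURCE 200809L	/* for pread() */
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define KPM_ENTRY_SIZE sizeof(uint64_t)	/* assumed entry size */

/* Byte offset of the entry that describes a given PFN. */
static off_t kpm_offset(uint64_t pfn)
{
	return (off_t)(pfn * KPM_ENTRY_SIZE);
}

/*
 * Read the entry for one PFN from an already opened file descriptor,
 * for example one obtained with open("/proc/kpageflags", O_RDONLY).
 * Returns 0 on success, -1 on error or when fewer than 8 bytes are
 * returned (which includes reading at or past the end).
 */
static int kpm_read_entry(int fd, uint64_t pfn, uint64_t *entry)
{
	ssize_t n;

	assert(entry != NULL);
	n = pread(fd, entry, sizeof(*entry), kpm_offset(pfn));
	return n == (ssize_t)sizeof(*entry) ? 0 : -1;
}
```

Using pread() keeps the file position untouched, so the same descriptor can be reused for arbitrary PFNs without an explicit lseek().
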