Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - a few misc bits

 - ocfs2

 - most(?) of MM

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (125 commits)
  thp: fix comments of __pmd_trans_huge_lock()
  cgroup: remove unnecessary 0 check from css_from_id()
  cgroup: fix idr leak for the first cgroup root
  mm: memcontrol: fix documentation for compound parameter
  mm: memcontrol: remove BUG_ON in uncharge_list
  mm: fix build warnings in <linux/compaction.h>
  mm, thp: convert from optimistic swapin collapsing to conservative
  mm, thp: fix comment inconsistency for swapin readahead functions
  thp: update Documentation/{vm/transhuge,filesystems/proc}.txt
  shmem: split huge pages beyond i_size under memory pressure
  thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE
  khugepaged: add support of collapse for tmpfs/shmem pages
  shmem: make shmem_inode_info::lock irq-safe
  khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page()
  thp: extract khugepaged from mm/huge_memory.c
  shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings
  shmem: add huge pages support
  shmem: get_unmapped_area align huge page
  shmem: prepare huge= mount option and sysfs knob
  mm, rmap: account shmem thp pages
  ...
commit 0e06f5c0de
@@ -59,23 +59,23 @@ num_devices parameter is optional and tells zram how many devices should be
pre-created. Default: 1.

2) Set max number of compression streams
Regardless of the value passed to this attribute, ZRAM will always
allocate multiple compression streams - one per online CPU - thus
allowing several concurrent compression operations. The number of
allocated compression streams goes down when some of the CPUs
become offline. There is no single-compression-stream mode anymore,
unless you are running a UP system or have only 1 CPU online.

To find out how many streams are currently available:
	cat /sys/block/zram0/max_comp_streams

3) Select compression algorithm
Using the comp_algorithm device attribute one can see the available and
currently selected (shown in square brackets) compression algorithms,
and change the selected compression algorithm (once the device is
initialised there is no way to change the compression algorithm).

Examples:
	#show supported compression algorithms
	cat /sys/block/zram0/comp_algorithm
	lzo [lz4]
@@ -83,17 +83,27 @@ pre-created. Default: 1.
	#select lzo compression algorithm
	echo lzo > /sys/block/zram0/comp_algorithm

For the time being, the `comp_algorithm' content does not necessarily
show every compression algorithm supported by the kernel. We keep this
list primarily to simplify device configuration and one can configure
a new device with a compression algorithm that is not listed in
`comp_algorithm'. The thing is that, internally, ZRAM uses Crypto API
and, if some of the algorithms were built as modules, it's impossible
to list all of them using, for instance, /proc/crypto or any other
method. This, however, has an advantage of permitting the usage of
custom crypto compression modules (implementing S/W or H/W compression).
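Because the comp_algorithm list can omit algorithms that are only available as modules, whether a given name is usable is ultimately a Crypto API question. A minimal sketch of that check, assuming the standard crypto compression API (the helper below is illustrative only and is not part of zram):

	#include <linux/crypto.h>
	#include <linux/err.h>

	/* Illustrative only: an algorithm not shown in comp_algorithm can
	 * still be configured as long as the Crypto API can instantiate it
	 * (possibly by loading a module). */
	static bool zram_example_alg_usable(const char *name)
	{
		struct crypto_comp *tfm = crypto_alloc_comp(name, 0, 0);

		if (IS_ERR(tfm))
			return false;	/* not built in, no module could provide it */

		crypto_free_comp(tfm);
		return true;
	}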
4) Set Disksize
Set disk size by writing the value to sysfs node 'disksize'.
The value can be either in bytes or you can use mem suffixes.
Examples:
	# Initialize /dev/zram0 with 50MB disksize
	echo $((50*1024*1024)) > /sys/block/zram0/disksize

	# Using mem suffixes
	echo 256K > /sys/block/zram0/disksize
	echo 512M > /sys/block/zram0/disksize
	echo 1G > /sys/block/zram0/disksize

Note:
There is little point creating a zram of greater than twice the size of memory
@@ -101,20 +111,20 @@ since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
size of the disk when not in use so a huge zram is wasteful.

5) Set memory limit: Optional
Set memory limit by writing the value to sysfs node 'mem_limit'.
The value can be either in bytes or you can use mem suffixes.
In addition, you can change the value at runtime.
Examples:
	# limit /dev/zram0 with 50MB memory
	echo $((50*1024*1024)) > /sys/block/zram0/mem_limit

	# Using mem suffixes
	echo 256K > /sys/block/zram0/mem_limit
	echo 512M > /sys/block/zram0/mem_limit
	echo 1G > /sys/block/zram0/mem_limit

	# To disable memory limit
	echo 0 > /sys/block/zram0/mem_limit

6) Activate:
	mkswap /dev/zram0
@@ -195,7 +195,9 @@ prototypes:
	int (*releasepage) (struct page *, int);
	void (*freepage)(struct page *);
	int (*direct_IO)(struct kiocb *, struct iov_iter *iter);
	bool (*isolate_page) (struct page *, isolate_mode_t);
	int (*migratepage)(struct address_space *, struct page *, struct page *);
	void (*putback_page) (struct page *);
	int (*launder_page)(struct page *);
	int (*is_partially_uptodate)(struct page *, unsigned long, unsigned long);
	int (*error_remove_page)(struct address_space *, struct page *);
@@ -219,7 +221,9 @@ invalidatepage:		yes
releasepage:		yes
freepage:		yes
direct_IO:
isolate_page:		yes
migratepage:		yes (both)
putback_page:		yes
launder_page:		yes
is_partially_uptodate:	yes
error_remove_page:	yes
@@ -544,13 +548,13 @@ subsequent truncate), and then return with VM_FAULT_LOCKED, and the page
locked. The VM will unlock the page.

	->map_pages() is called when VM asks to map easy accessible pages.
-Filesystem should find and map pages associated with offsets from "pgoff"
-till "max_pgoff". ->map_pages() is called with page table locked and must
+Filesystem should find and map pages associated with offsets from "start_pgoff"
+till "end_pgoff". ->map_pages() is called with page table locked and must
not block. If it's not possible to reach a page without blocking,
filesystem should skip it. Filesystem should use do_set_pte() to setup
-page table entry. Pointer to entry associated with offset "pgoff" is
-passed in "pte" field in vm_fault structure. Pointers to entries for other
-offsets should be calculated relative to "pte".
+page table entry. Pointer to entry associated with the page is passed in
+"pte" field in fault_env structure. Pointers to entries for other offsets
+should be calculated relative to "pte".

	->page_mkwrite() is called when a previously read-only pte is
about to become writeable. The filesystem again must ensure that there are
@@ -49,6 +49,7 @@ These block devices may be used for inspiration:
- axonram: Axon DDR2 device driver
- brd: RAM backed block device driver
- dcssblk: s390 dcss block device driver
- pmem: NVDIMM persistent memory driver


Implementation Tips for Filesystem Writers
@@ -75,8 +76,9 @@ calls to get_block() (for example by a page-fault racing with a read()
or a write()) work correctly.

These filesystems may be used for inspiration:
-- ext2: the second extended filesystem, see Documentation/filesystems/ext2.txt
-- ext4: the fourth extended filesystem, see Documentation/filesystems/ext4.txt
+- ext2: see Documentation/filesystems/ext2.txt
+- ext4: see Documentation/filesystems/ext4.txt
+- xfs: see Documentation/filesystems/xfs.txt


Handling Media Errors
@@ -436,6 +436,7 @@ Private_Dirty:         0 kB
Referenced: 892 kB
Anonymous: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
@@ -464,6 +465,8 @@ accessed.
a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE
and a page is modified, the file page is replaced by a private anonymous copy.
"AnonHugePages" shows the amount of memory backed by transparent hugepages.
"ShmemPmdMapped" shows the amount of shared (shmem/tmpfs) memory backed by
huge pages.
"Shared_Hugetlb" and "Private_Hugetlb" show the amounts of memory backed by
hugetlbfs pages which are *not* counted in the "RSS" or "PSS" fields for
historical reasons. And these are not included in the
{Shared,Private}_{Clean,Dirty} fields.
@@ -868,6 +871,9 @@ VmallocTotal:   112216 kB
VmallocUsed: 428 kB
VmallocChunk: 111088 kB
AnonHugePages: 49152 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB


MemTotal: Total usable ram (i.e. physical ram minus a few reserved
	  bits and the kernel binary code)
@@ -912,6 +918,9 @@ MemAvailable: An estimate of how much memory is available for starting new
AnonHugePages: Non-file backed huge pages mapped into userspace page tables
Mapped: files which have been mmapped, such as libraries
Shmem: Total memory used by shared memory (shmem) and tmpfs
ShmemHugePages: Memory used by shared memory (shmem) and tmpfs allocated
		with huge pages
ShmemPmdMapped: Shared memory mapped into userspace with huge pages
Slab: in-kernel data structures cache
SReclaimable: Part of Slab, that might be reclaimed, such as caches
SUnreclaim: Part of Slab, that cannot be reclaimed on memory pressure
@@ -592,9 +592,14 @@ struct address_space_operations {
	int (*releasepage) (struct page *, int);
	void (*freepage)(struct page *);
	ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter);
	/* isolate a page for migration */
	bool (*isolate_page) (struct page *, isolate_mode_t);
	/* migrate the contents of a page to the specified target */
	int (*migratepage) (struct page *, struct page *);
	/* put migration-failed page back to right list */
	void (*putback_page) (struct page *);
	int (*launder_page) (struct page *);

	int (*is_partially_uptodate) (struct page *, unsigned long,
					unsigned long);
	void (*is_dirty_writeback) (struct page *, bool *, bool *);
@@ -747,6 +752,10 @@ struct address_space_operations {
	and transfer data directly between the storage and the
	application's address space.

  isolate_page: Called by the VM when isolating a movable non-lru page.
	If page is successfully isolated, VM marks the page as PG_isolated
	via __SetPageIsolated.

  migrate_page: This is used to compact the physical memory usage.
	If the VM wants to relocate a page (maybe off a memory card
	that is signalling imminent failure) it will pass a new page
@@ -754,6 +763,8 @@ struct address_space_operations {
	transfer any private data across and update any references
	that it has to the page.

  putback_page: Called by the VM when isolated page's migration fails.

  launder_page: Called before freeing a page - it writes back the dirty page. To
	prevent redirtying the page, it is kept locked during the whole
	operation.
@@ -142,5 +142,111 @@ Steps:
20. The new page is moved to the LRU and can be scanned by the swapper
    etc again.

C. Non-LRU page migration
-------------------------

Although the original page migration aimed at reducing the latency of memory
access for NUMA, compaction, which wants to create high-order pages, is also
a main customer.

The current problem of the implementation is that it is designed to migrate
only *LRU* pages. However, there are potential non-LRU pages which can be
migrated in drivers, for example, zsmalloc and virtio-balloon pages.

For virtio-balloon pages, some parts of the migration code path have been
hooked up and virtio-balloon specific functions added to intercept the
migration logic. This is too specific to one driver, so other drivers that
want to make their pages movable would have to add their own specific hooks
in the migration path.

To overcome the problem, the VM supports non-LRU page migration, which
provides generic functions for non-LRU movable pages without driver-specific
hooks in the migration path.

If a driver wants to make its own pages movable, it should define three
functions which are function pointers of struct address_space_operations.

1. bool (*isolate_page) (struct page *page, isolate_mode_t mode);

What the VM expects of a driver's isolate_page function is to return *true*
if the driver isolates the page successfully. On returning true, the VM marks
the page as PG_isolated so concurrent isolation on several CPUs skips the
page. If a driver cannot isolate the page, it should return *false*.

Once a page is successfully isolated, the VM uses the page.lru fields, so the
driver shouldn't expect to preserve the values in those fields.

2. int (*migratepage) (struct address_space *mapping,
		struct page *newpage, struct page *oldpage, enum migrate_mode);

After isolation, the VM calls the driver's migratepage with the isolated page.
The job of migratepage is to move the content of the old page to the new page
and set up the fields of struct page newpage. Keep in mind that you should
indicate to the VM that the old page is no longer movable via
__ClearPageMovable() under page_lock if you migrated the old page successfully
and return MIGRATEPAGE_SUCCESS. If the driver cannot migrate the page at the
moment, it can return -EAGAIN. On -EAGAIN, the VM will retry page migration
after a short time because the VM interprets -EAGAIN as "temporary migration
failure". On returning any error other than -EAGAIN, the VM will give up the
page migration without retrying.

The driver shouldn't touch the page.lru field the VM uses in these functions.

3. void (*putback_page)(struct page *);

If migration fails on the isolated page, the VM should return the isolated
page to the driver, so the VM calls the driver's putback_page with the
migration-failed page. In this function, the driver should put the isolated
page back into its own data structure.
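Putting the three callbacks together, a driver-side skeleton would look roughly like the following. This is a minimal sketch: the "mydrv" names, locking and internal bookkeeping are assumptions for illustration, not code from this commit.

	#include <linux/fs.h>
	#include <linux/migrate.h>
	#include <linux/pagemap.h>

	/* Hedged sketch of a driver wiring up the callbacks described above. */
	static bool mydrv_isolate_page(struct page *page, isolate_mode_t mode)
	{
		/* Detach the page from the driver's internal lists under the
		 * driver's own lock; returning true reports successful isolation. */
		return true;
	}

	static int mydrv_migratepage(struct address_space *mapping,
				     struct page *newpage, struct page *page,
				     enum migrate_mode mode)
	{
		/* Copy the contents and re-point driver references to newpage,
		 * then tell the VM the old page is no longer movable. */
		__ClearPageMovable(page);
		return MIGRATEPAGE_SUCCESS;	/* or -EAGAIN on temporary failure */
	}

	static void mydrv_putback_page(struct page *page)
	{
		/* Migration failed: put the page back on the driver's own lists. */
	}

	static const struct address_space_operations mydrv_aops = {
		.isolate_page	= mydrv_isolate_page,
		.migratepage	= mydrv_migratepage,
		.putback_page	= mydrv_putback_page,
	};

	/* When allocating a page the driver wants the VM to treat as movable
	 * (mapping->a_ops is assumed to point at mydrv_aops): */
	static void mydrv_mark_movable(struct page *page,
				       struct address_space *mapping)
	{
		lock_page(page);
		__SetPageMovable(page, mapping);
		unlock_page(page);
	}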
4. Non-LRU movable page flags

There are two page flags for supporting non-LRU movable pages.

* PG_movable

A driver should use the function below to make a page movable, under page_lock:

	void __SetPageMovable(struct page *page, struct address_space *mapping)

It takes an address_space argument for registering the migration family of
functions which will be called by the VM. Strictly speaking, PG_movable is not
a real flag of struct page. Rather, the VM reuses the lower bits of
page->mapping to represent it:

	#define PAGE_MAPPING_MOVABLE 0x2
	page->mapping = page->mapping | PAGE_MAPPING_MOVABLE;

so a driver shouldn't access page->mapping directly. Instead, the driver should
use page_mapping(), which masks off the low two bits of page->mapping under the
page lock, so it can get the right struct address_space.

For testing a non-LRU movable page, the VM supports the __PageMovable()
function. However, it doesn't guarantee to identify a non-LRU movable page
because the page->mapping field is unified with other variables in struct page.
As well, if the driver releases the page after isolation by the VM,
page->mapping doesn't have a stable value although it has PAGE_MAPPING_MOVABLE
set (look at __ClearPageMovable). But __PageMovable() is cheap for telling
whether a page is LRU or non-LRU movable once the page has been isolated,
because LRU pages can never have PAGE_MAPPING_MOVABLE in page->mapping. It is
also good for just peeking to test for non-LRU movable pages before the more
expensive check with lock_page during pfn scanning to select a victim.

For guaranteeing a non-LRU movable page, the VM provides the PageMovable()
function. Unlike __PageMovable(), PageMovable() validates page->mapping and
mapping->a_ops->isolate_page under lock_page. The lock_page prevents sudden
destruction of page->mapping.

A driver using __SetPageMovable() should clear the flag via __ClearPageMovable()
under page_lock before releasing the page.

* PG_isolated

To prevent concurrent isolation among several CPUs, the VM marks an isolated
page as PG_isolated under lock_page. So if a CPU encounters a PG_isolated
non-LRU movable page, it can skip it. The driver doesn't need to manipulate
the flag because the VM will set/clear it automatically. Keep in mind that if
the driver sees a PG_isolated page, it means the page has been isolated by the
VM, so it shouldn't touch the page.lru field.
PG_isolated is an alias of the PG_reclaim flag, so the driver shouldn't use
the flag for its own purpose.
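As a rough illustration of how the two flags are meant to be used by a scanner (for example during compaction pfn scanning), the pattern described above looks like this; the surrounding function and its error handling are hypothetical:

	#include <linux/migrate.h>
	#include <linux/mm.h>
	#include <linux/page-flags.h>
	#include <linux/pagemap.h>

	/* Hedged sketch: a cheap lockless peek with __PageMovable(), then the
	 * locked re-check with PageMovable() before asking the driver to
	 * isolate. */
	static bool try_isolate_movable(struct page *page, isolate_mode_t mode)
	{
		bool isolated = false;

		if (!__PageMovable(page))	/* cheap peek; may be stale */
			return false;

		if (!trylock_page(page))	/* PageMovable() needs the page lock */
			return false;

		if (PageMovable(page) &&
		    page_mapping(page)->a_ops->isolate_page(page, mode))
			isolated = true;	/* VM then marks the page PG_isolated */

		unlock_page(page);
		return isolated;
	}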
Christoph Lameter, May 8, 2006.
Minchan Kim, Mar 28, 2016.
@@ -9,8 +9,8 @@ using huge pages for the backing of virtual memory with huge pages
that supports the automatic promotion and demotion of page sizes and
without the shortcomings of hugetlbfs.

-Currently it only works for anonymous memory mappings but in the
-future it can expand over the pagecache layer starting with tmpfs.
+Currently it only works for anonymous memory mappings and tmpfs/shmem.
+But in the future it can expand to other filesystems.

The reason applications are running faster is because of two
factors. The first factor is almost completely irrelevant and it's not
@@ -57,10 +57,6 @@ miss is going to run faster.
  feature that applies to all dynamic high order allocations in the
  kernel)

-- this initial support only offers the feature in the anonymous memory
-  regions but it'd be ideal to move it to tmpfs and the pagecache
-  later
-
Transparent Hugepage Support maximizes the usefulness of free memory
if compared to the reservation approach of hugetlbfs by allowing all
unused memory to be used as cache or other movable (or even unmovable
@@ -94,21 +90,21 @@ madvise(MADV_HUGEPAGE) on their critical mmapped regions.

== sysfs ==

-Transparent Hugepage Support can be entirely disabled (mostly for
-debugging purposes) or only enabled inside MADV_HUGEPAGE regions (to
-avoid the risk of consuming more memory resources) or enabled system
-wide. This can be achieved with one of:
+Transparent Hugepage Support for anonymous memory can be entirely disabled
+(mostly for debugging purposes) or only enabled inside MADV_HUGEPAGE
+regions (to avoid the risk of consuming more memory resources) or enabled
+system wide. This can be achieved with one of:

echo always >/sys/kernel/mm/transparent_hugepage/enabled
echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
echo never >/sys/kernel/mm/transparent_hugepage/enabled

It's also possible to limit defrag efforts in the VM to generate
-hugepages in case they're not immediately free to madvise regions or
-to never try to defrag memory and simply fallback to regular pages
-unless hugepages are immediately available. Clearly if we spend CPU
-time to defrag memory, we would expect to gain even more by the fact
-we use hugepages later instead of regular pages. This isn't always
+anonymous hugepages in case they're not immediately free to madvise
+regions or to never try to defrag memory and simply fallback to regular
+pages unless hugepages are immediately available. Clearly if we spend CPU
+time to defrag memory, we would expect to gain even more by the fact we
+use hugepages later instead of regular pages. This isn't always
guaranteed, but it may be more likely in case the allocation is for a
MADV_HUGEPAGE region.
|
||||
|
||||
|
@@ -133,9 +129,9 @@ that are have used madvise(MADV_HUGEPAGE). This is the default behaviour.

"never" should be self-explanatory.

-By default kernel tries to use huge zero page on read page fault.
-It's possible to disable huge zero page by writing 0 or enable it
-back by writing 1:
+By default kernel tries to use huge zero page on read page fault to
+anonymous mapping. It's possible to disable huge zero page by writing 0
+or enable it back by writing 1:

echo 0 >/sys/kernel/mm/transparent_hugepage/use_zero_page
echo 1 >/sys/kernel/mm/transparent_hugepage/use_zero_page
@@ -204,21 +200,67 @@ Support by passing the parameter "transparent_hugepage=always" or
"transparent_hugepage=madvise" or "transparent_hugepage=never"
(without "") to the kernel command line.

== Hugepages in tmpfs/shmem ==

You can control hugepage allocation policy in tmpfs with mount option
"huge=". It can have following values:

  - "always":
    Attempt to allocate huge pages every time we need a new page;

  - "never":
    Do not allocate huge pages;

  - "within_size":
    Only allocate huge page if it will be fully within i_size.
    Also respect fadvise()/madvise() hints;

  - "advise":
    Only allocate huge pages if requested with fadvise()/madvise();

The default policy is "never".

"mount -o remount,huge= /mountpoint" works fine after mount: remounting
huge=never will not attempt to break up huge pages at all, just stop more
from being allocated.

There's also sysfs knob to control hugepage allocation policy for internal
shmem mount: /sys/kernel/mm/transparent_hugepage/shmem_enabled. The mount
is used for SysV SHM, memfds, shared anonymous mmaps (of /dev/zero or
MAP_ANONYMOUS), GPU drivers' DRM objects, Ashmem.

In addition to policies listed above, shmem_enabled allows two further
values:

  - "deny":
    For use in emergencies, to force the huge option off from
    all mounts;
  - "force":
    Force the huge option on for all - very useful for testing;

== Need of application restart ==

-The transparent_hugepage/enabled values only affect future
-behavior. So to make them effective you need to restart any
-application that could have been using hugepages. This also applies to
-the regions registered in khugepaged.
+The transparent_hugepage/enabled values and tmpfs mount option only affect
+future behavior. So to make them effective you need to restart any
+application that could have been using hugepages. This also applies to the
+regions registered in khugepaged.

== Monitoring usage ==

-The number of transparent huge pages currently used by the system is
-available by reading the AnonHugePages field in /proc/meminfo. To
-identify what applications are using transparent huge pages, it is
-necessary to read /proc/PID/smaps and count the AnonHugePages fields
-for each mapping. Note that reading the smaps file is expensive and
-reading it frequently will incur overhead.
+The number of anonymous transparent huge pages currently used by the
+system is available by reading the AnonHugePages field in /proc/meminfo.
+To identify what applications are using anonymous transparent huge pages,
+it is necessary to read /proc/PID/smaps and count the AnonHugePages fields
+for each mapping.
+
+The number of file transparent huge pages mapped to userspace is available
+by reading ShmemPmdMapped and ShmemHugePages fields in /proc/meminfo.
+To identify what applications are mapping file transparent huge pages, it
+is necessary to read /proc/PID/smaps and count the FileHugeMapped fields
+for each mapping.
+
+Note that reading the smaps file is expensive and reading it
+frequently will incur overhead.

There are a number of counters in /proc/vmstat that may be used to
monitor how successfully the system is providing huge pages for use.
@@ -238,6 +280,12 @@ thp_collapse_alloc_failed is incremented if khugepaged found a range
	of pages that should be collapsed into one huge page but failed
	the allocation.

thp_file_alloc is incremented every time a file huge page is successfully
	allocated.

thp_file_mapped is incremented every time a file huge page is mapped into
	user address space.

thp_split_page is incremented every time a huge page is split into base
	pages. This can happen for a variety of reasons but a common
	reason is that a huge page is old and is being reclaimed.
@@ -403,19 +451,27 @@ pages:
    on relevant sub-page of the compound page.

  - map/unmap of the whole compound page accounted in compound_mapcount
-    (stored in first tail page).
+    (stored in first tail page). For file huge pages, we also increment
+    ->_mapcount of all sub-pages in order to have race-free detection of
+    last unmap of subpages.

-PageDoubleMap() indicates that ->_mapcount in all subpages is offset up by one.
-This additional reference is required to get race-free detection of unmap of
-subpages when we have them mapped with both PMDs and PTEs.
+PageDoubleMap() indicates that the page is *possibly* mapped with PTEs.
+
+For anonymous pages PageDoubleMap() also indicates ->_mapcount in all
+subpages is offset up by one. This additional reference is required to
+get race-free detection of unmap of subpages when we have them mapped with
+both PMDs and PTEs.

This is optimization required to lower overhead of per-subpage mapcount
tracking. The alternative is alter ->_mapcount in all subpages on each
map/unmap of the whole compound page.

-We set PG_double_map when a PMD of the page got split for the first time,
-but still have PMD mapping. The additional references go away with last
-compound_mapcount.
+For anonymous pages, we set PG_double_map when a PMD of the page got split
+for the first time, but still have PMD mapping. The additional references
+go away with last compound_mapcount.
+
+File pages get PG_double_map set on first map of the page with PTE and
+goes away when the page gets evicted from page cache.

split_huge_page internally has to distribute the refcounts in the head
page to the tail pages before clearing all PG_head/tail bits from the page
@@ -427,7 +483,7 @@ sum of mapcount of all sub-pages plus one (split_huge_page caller must
have reference for head page).

split_huge_page uses migration entries to stabilize page->_refcount and
-page->_mapcount.
+page->_mapcount of anonymous pages. File pages just got unmapped.

We safe against physical memory scanners too: the only legitimate way
scanner can get reference to a page is get_page_unless_zero().
@@ -461,6 +461,27 @@ unevictable LRU is enabled, the work of compaction is mostly handled by
the page migration code and the same work flow as described in MIGRATING
MLOCKED PAGES will apply.

MLOCKING TRANSPARENT HUGE PAGES
-------------------------------

A transparent huge page is represented by a single entry on an LRU list.
Therefore, we can only make unevictable an entire compound page, not
individual subpages.

If a user tries to mlock() part of a huge page, we want the rest of the
page to be reclaimable.

We cannot just split the page on partial mlock() as split_huge_page() can
fail and a new intermittent failure mode for the syscall is undesirable.

We handle this by keeping PTE-mapped huge pages on normal LRU lists: the
PMD on the border of a VM_LOCKED VMA will be split into a PTE table.

This way the huge page is accessible for vmscan. Under memory pressure the
page will be split, subpages which belong to VM_LOCKED VMAs will be moved
to the unevictable LRU and the rest can be reclaimed.

See also the comment in follow_trans_huge_pmd().

mmap(MAP_LOCKED) SYSTEM CALL HANDLING
-------------------------------------

Makefile | 69
@@ -647,41 +647,28 @@ ifneq ($(CONFIG_FRAME_WARN),0)
KBUILD_CFLAGS += $(call cc-option,-Wframe-larger-than=${CONFIG_FRAME_WARN})
endif

-# Handle stack protector mode.
-#
-# Since kbuild can potentially perform two passes (first with the old
-# .config values and then with updated .config values), we cannot error out
-# if a desired compiler option is unsupported. If we were to error, kbuild
-# could never get to the second pass and actually notice that we changed
-# the option to something that was supported.
-#
-# Additionally, we don't want to fallback and/or silently change which compiler
-# flags will be used, since that leads to producing kernels with different
-# security feature characteristics depending on the compiler used. ("But I
-# selected CC_STACKPROTECTOR_STRONG! Why did it build with _REGULAR?!")
-#
-# The middle ground is to warn here so that the failed option is obvious, but
-# to let the build fail with bad compiler flags so that we can't produce a
-# kernel when there is a CONFIG and compiler mismatch.
-#
+# This selects the stack protector compiler flag. Testing it is delayed
+# until after .config has been reprocessed, in the prepare-compiler-check
+# target.
ifdef CONFIG_CC_STACKPROTECTOR_REGULAR
  stackp-flag := -fstack-protector
-  ifeq ($(call cc-option, $(stackp-flag)),)
-    $(warning Cannot use CONFIG_CC_STACKPROTECTOR_REGULAR: \
-             -fstack-protector not supported by compiler)
-  endif
+  stackp-name := REGULAR
else
ifdef CONFIG_CC_STACKPROTECTOR_STRONG
  stackp-flag := -fstack-protector-strong
-  ifeq ($(call cc-option, $(stackp-flag)),)
-    $(warning Cannot use CONFIG_CC_STACKPROTECTOR_STRONG: \
-              -fstack-protector-strong not supported by compiler)
-  endif
+  stackp-name := STRONG
else
  # Force off for distro compilers that enable stack protector by default.
  stackp-flag := $(call cc-option, -fno-stack-protector)
endif
endif
+# Find arch-specific stack protector compiler sanity-checking script.
+ifdef CONFIG_CC_STACKPROTECTOR
+  stackp-path := $(srctree)/scripts/gcc-$(ARCH)_$(BITS)-has-stack-protector.sh
+  ifneq ($(wildcard $(stackp-path)),)
+    stackp-check := $(stackp-path)
+  endif
+endif
KBUILD_CFLAGS += $(stackp-flag)

ifdef CONFIG_KCOV
|
||||
|
@@ -1017,8 +1004,10 @@ ifneq ($(KBUILD_SRC),)
	fi;
endif

-# prepare2 creates a makefile if using a separate output directory
-prepare2: prepare3 outputmakefile asm-generic
+# prepare2 creates a makefile if using a separate output directory.
+# From this point forward, .config has been reprocessed, so any rules
+# that need to depend on updated CONFIG_* values can be checked here.
+prepare2: prepare3 prepare-compiler-check outputmakefile asm-generic

prepare1: prepare2 $(version_h) include/generated/utsrelease.h \
                   include/config/auto.conf

@@ -1049,6 +1038,32 @@ endif
PHONY += prepare-objtool
prepare-objtool: $(objtool_target)

+# Check for CONFIG flags that require compiler support. Abort the build
+# after .config has been processed, but before the kernel build starts.
+#
+# For security-sensitive CONFIG options, we don't want to fallback and/or
+# silently change which compiler flags will be used, since that leads to
+# producing kernels with different security feature characteristics
+# depending on the compiler used. (For example, "But I selected
+# CC_STACKPROTECTOR_STRONG! Why did it build with _REGULAR?!")
+PHONY += prepare-compiler-check
+prepare-compiler-check: FORCE
+# Make sure compiler supports requested stack protector flag.
+ifdef stackp-name
+  ifeq ($(call cc-option, $(stackp-flag)),)
+	@echo Cannot use CONFIG_CC_STACKPROTECTOR_$(stackp-name): \
+		  $(stackp-flag) not supported by compiler >&2 && exit 1
+  endif
+endif
+# Make sure compiler does not have buggy stack-protector support.
+ifdef stackp-check
+  ifneq ($(shell $(CONFIG_SHELL) $(stackp-check) $(CC) $(KBUILD_CPPFLAGS) $(biarch)),y)
+	@echo Cannot use CONFIG_CC_STACKPROTECTOR_$(stackp-name): \
+		  $(stackp-flag) available but compiler is broken >&2 && exit 1
+  endif
+endif
+	@:
+
# Generate some files
# ---------------------------------------------------------------------------
|
||||
|
||||
|
|
|
@@ -147,7 +147,7 @@ retry:
	/* If for any reason at all we couldn't handle the fault,
	   make sure we exit gracefully rather than endlessly redo
	   the fault.  */
-	fault = handle_mm_fault(mm, vma, address, flags);
+	fault = handle_mm_fault(vma, address, flags);

	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
		return;
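The same one-line conversion repeats in the architecture fault handlers below: the mm_struct argument is dropped because it can be derived from the VMA (vma->vm_mm). For reference, the calling convention the updated callers rely on is, in sketch form:

	/* Sketch of the signature as used by the callers in this series;
	 * the authoritative declaration lives in include/linux/mm.h. */
	int handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
			    unsigned int flags);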
|
||||
|
|
|
@@ -137,7 +137,7 @@ good_area:
	 * make sure we exit gracefully rather than endlessly redo
	 * the fault.
	 */
-	fault = handle_mm_fault(mm, vma, address, flags);
+	fault = handle_mm_fault(vma, address, flags);

	/* If Pagefault was interrupted by SIGKILL, exit page fault "early" */
	if (unlikely(fatal_signal_pending(current))) {

@@ -57,7 +57,7 @@ static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
extern pgd_t *pgd_alloc(struct mm_struct *mm);
extern void pgd_free(struct mm_struct *mm, pgd_t *pgd);

-#define PGALLOC_GFP	(GFP_KERNEL | __GFP_NOTRACK | __GFP_REPEAT | __GFP_ZERO)
+#define PGALLOC_GFP	(GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO)

static inline void clean_pte_table(pte_t *pte)
{

@@ -209,17 +209,38 @@ tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
		tlb_flush(tlb);
}

-static inline int __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
+static inline bool __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
{
+	if (tlb->nr == tlb->max)
+		return true;
	tlb->pages[tlb->nr++] = page;
-	VM_BUG_ON(tlb->nr > tlb->max);
-	return tlb->max - tlb->nr;
+	return false;
}

static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
{
-	if (!__tlb_remove_page(tlb, page))
+	if (__tlb_remove_page(tlb, page)) {
		tlb_flush_mmu(tlb);
+		__tlb_remove_page(tlb, page);
+	}
}

+static inline bool __tlb_remove_page_size(struct mmu_gather *tlb,
+					  struct page *page, int page_size)
+{
+	return __tlb_remove_page(tlb, page);
+}
+
+static inline bool __tlb_remove_pte_page(struct mmu_gather *tlb,
+					 struct page *page)
+{
+	return __tlb_remove_page(tlb, page);
+}
+
+static inline void tlb_remove_page_size(struct mmu_gather *tlb,
+					struct page *page, int page_size)
+{
+	return tlb_remove_page(tlb, page);
+}
+
static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,

@@ -243,7 +243,7 @@ good_area:
		goto out;
	}

-	return handle_mm_fault(mm, vma, addr & PAGE_MASK, flags);
+	return handle_mm_fault(vma, addr & PAGE_MASK, flags);

check_stack:
	/* Don't allow expansion below FIRST_USER_ADDRESS */

@@ -23,7 +23,7 @@
#define __pgd_alloc()	kmalloc(PTRS_PER_PGD * sizeof(pgd_t), GFP_KERNEL)
#define __pgd_free(pgd)	kfree(pgd)
#else
-#define __pgd_alloc()	(pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_REPEAT, 2)
+#define __pgd_alloc()	(pgd_t *)__get_free_pages(GFP_KERNEL, 2)
#define __pgd_free(pgd)	free_pages((unsigned long)pgd, 2)
#endif
|
||||
|
||||
|
|
|
@@ -233,7 +233,7 @@ good_area:
		goto out;
	}

-	return handle_mm_fault(mm, vma, addr & PAGE_MASK, mm_flags);
+	return handle_mm_fault(vma, addr & PAGE_MASK, mm_flags);

check_stack:
	if (vma->vm_flags & VM_GROWSDOWN && !expand_stack(vma, addr))

@@ -134,7 +134,7 @@ good_area:
	 * sure we exit gracefully rather than endlessly redo the
	 * fault.
	 */
-	fault = handle_mm_fault(mm, vma, address, flags);
+	fault = handle_mm_fault(vma, address, flags);

	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
		return;

@@ -168,7 +168,7 @@ retry:
	 * the fault.
	 */

-	fault = handle_mm_fault(mm, vma, address, flags);
+	fault = handle_mm_fault(vma, address, flags);

	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
		return;

@@ -164,7 +164,7 @@ asmlinkage void do_page_fault(int datammu, unsigned long esr0, unsigned long ear
	 * make sure we exit gracefully rather than endlessly redo
	 * the fault.
	 */
-	fault = handle_mm_fault(mm, vma, ear0, flags);
+	fault = handle_mm_fault(vma, ear0, flags);
	if (unlikely(fault & VM_FAULT_ERROR)) {
		if (fault & VM_FAULT_OOM)
			goto out_of_memory;

@@ -101,7 +101,7 @@ good_area:
		break;
	}

-	fault = handle_mm_fault(mm, vma, address, flags);
+	fault = handle_mm_fault(vma, address, flags);

	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
		return;
|
||||
|
|
|
@@ -205,17 +205,18 @@ tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end)
 * must be delayed until after the TLB has been flushed (see comments at the beginning of
 * this file).
 */
-static inline int __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
+static inline bool __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
{
+	if (tlb->nr == tlb->max)
+		return true;
+
	tlb->need_flush = 1;

	if (!tlb->nr && tlb->pages == tlb->local)
		__tlb_alloc_page(tlb);

	tlb->pages[tlb->nr++] = page;
-	VM_BUG_ON(tlb->nr > tlb->max);
-
-	return tlb->max - tlb->nr;
+	return false;
}

static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)

@@ -235,8 +236,28 @@ static inline void tlb_flush_mmu(struct mmu_gather *tlb)

static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
{
-	if (!__tlb_remove_page(tlb, page))
+	if (__tlb_remove_page(tlb, page)) {
		tlb_flush_mmu(tlb);
+		__tlb_remove_page(tlb, page);
+	}
}

+static inline bool __tlb_remove_page_size(struct mmu_gather *tlb,
+					  struct page *page, int page_size)
+{
+	return __tlb_remove_page(tlb, page);
+}
+
+static inline bool __tlb_remove_pte_page(struct mmu_gather *tlb,
+					 struct page *page)
+{
+	return __tlb_remove_page(tlb, page);
+}
+
+static inline void tlb_remove_page_size(struct mmu_gather *tlb,
+					struct page *page, int page_size)
+{
+	return tlb_remove_page(tlb, page);
+}
|
||||
|
||||
/*
|
||||
|
|
|
@@ -159,7 +159,7 @@ retry:
	 * sure we exit gracefully rather than endlessly redo the
	 * fault.
	 */
-	fault = handle_mm_fault(mm, vma, address, flags);
+	fault = handle_mm_fault(vma, address, flags);

	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
		return;

@@ -41,6 +41,9 @@ EXPORT_SYMBOL(cpu_data);
EXPORT_SYMBOL(smp_flush_tlb_page);
#endif

+extern int __ucmpdi2(unsigned long long a, unsigned long long b);
+EXPORT_SYMBOL(__ucmpdi2);
+
/* compiler generated symbol */
extern void __ashldi3(void);
extern void __ashrdi3(void);

@@ -3,5 +3,5 @@
#

lib-y  := checksum.o ashxdi3.o memset.o memcpy.o \
-	  delay.o strlen.o usercopy.o csum_partial_copy.o
+	  delay.o strlen.o usercopy.o csum_partial_copy.o \
+	  ucmpdi2.o

@@ -0,0 +1,23 @@
#ifndef __ASM_LIBGCC_H
#define __ASM_LIBGCC_H

#include <asm/byteorder.h>

#ifdef __BIG_ENDIAN
struct DWstruct {
	int high, low;
};
#elif defined(__LITTLE_ENDIAN)
struct DWstruct {
	int low, high;
};
#else
#error I feel sick.
#endif

typedef union {
	struct DWstruct s;
	long long ll;
} DWunion;

#endif /* __ASM_LIBGCC_H */

@@ -0,0 +1,17 @@
#include "libgcc.h"

int __ucmpdi2(unsigned long long a, unsigned long long b)
{
	const DWunion au = {.ll = a};
	const DWunion bu = {.ll = b};

	if ((unsigned int)au.s.high < (unsigned int)bu.s.high)
		return 0;
	else if ((unsigned int)au.s.high > (unsigned int)bu.s.high)
		return 2;
	if ((unsigned int)au.s.low < (unsigned int)bu.s.low)
		return 0;
	else if ((unsigned int)au.s.low > (unsigned int)bu.s.low)
		return 2;
	return 1;
}
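For context, __ucmpdi2() is a libgcc-style helper: it returns 0, 1 or 2 for less-than, equal and greater-than. A hedged illustration of why the export is needed (the example function below is hypothetical):

	/* Hypothetical caller: with a 32-bit m32r compiler, a 64-bit unsigned
	 * comparison like this may be lowered into a call to __ucmpdi2(),
	 * with "a < b" becoming a test for a return value of 0 (and "a > b"
	 * a test for 2). */
	int u64_less_than(unsigned long long a, unsigned long long b)
	{
		return a < b;
	}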
|
|
@@ -196,7 +196,7 @@ good_area:
	 */
	addr = (address & PAGE_MASK);
	set_thread_fault_code(error_code);
-	fault = handle_mm_fault(mm, vma, addr, flags);
+	fault = handle_mm_fault(vma, addr, flags);
	if (unlikely(fault & VM_FAULT_ERROR)) {
		if (fault & VM_FAULT_OOM)
			goto out_of_memory;

@@ -136,7 +136,7 @@ good_area:
	 * the fault.
	 */

-	fault = handle_mm_fault(mm, vma, address, flags);
+	fault = handle_mm_fault(vma, address, flags);
	pr_debug("handle_mm_fault returns %d\n", fault);

	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))

@@ -133,7 +133,7 @@ good_area:
	 * make sure we exit gracefully rather than endlessly redo
	 * the fault.
	 */
-	fault = handle_mm_fault(mm, vma, address, flags);
+	fault = handle_mm_fault(vma, address, flags);

	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
		return 0;

@@ -216,7 +216,7 @@ good_area:
	 * make sure we exit gracefully rather than endlessly redo
	 * the fault.
	 */
-	fault = handle_mm_fault(mm, vma, address, flags);
+	fault = handle_mm_fault(vma, address, flags);

	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
		return;

@@ -153,7 +153,7 @@ good_area:
	 * make sure we exit gracefully rather than endlessly redo
	 * the fault.
	 */
-	fault = handle_mm_fault(mm, vma, address, flags);
+	fault = handle_mm_fault(vma, address, flags);

	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
		return;

@@ -254,7 +254,7 @@ good_area:
	 * make sure we exit gracefully rather than endlessly redo
	 * the fault.
	 */
-	fault = handle_mm_fault(mm, vma, address, flags);
+	fault = handle_mm_fault(vma, address, flags);

	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
		return;

@@ -131,7 +131,7 @@ good_area:
	 * make sure we exit gracefully rather than endlessly redo
	 * the fault.
	 */
-	fault = handle_mm_fault(mm, vma, address, flags);
+	fault = handle_mm_fault(vma, address, flags);

	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
		return;

@@ -163,7 +163,7 @@ good_area:
	 * the fault.
	 */

-	fault = handle_mm_fault(mm, vma, address, flags);
+	fault = handle_mm_fault(vma, address, flags);

	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
		return;

@@ -239,7 +239,7 @@ good_area:
	 * fault.
	 */

-	fault = handle_mm_fault(mm, vma, address, flags);
+	fault = handle_mm_fault(vma, address, flags);

	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
		return;
|
||||
|
|
|
@@ -71,10 +71,8 @@ pte_t *__find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
static inline pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
					       bool *is_thp, unsigned *shift)
{
-	if (!arch_irqs_disabled()) {
-		pr_info("%s called with irq enabled\n", __func__);
-		dump_stack();
-	}
+	VM_WARN(!arch_irqs_disabled(),
+		"%s called with irq enabled\n", __func__);
	return __find_linux_pte_or_hugepte(pgdir, ea, is_thp, shift);
}

@@ -75,7 +75,7 @@ int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
	}

	ret = 0;
-	*flt = handle_mm_fault(mm, vma, ea, is_write ? FAULT_FLAG_WRITE : 0);
+	*flt = handle_mm_fault(vma, ea, is_write ? FAULT_FLAG_WRITE : 0);
	if (unlikely(*flt & VM_FAULT_ERROR)) {
		if (*flt & VM_FAULT_OOM) {
			ret = -ENOMEM;

@@ -429,7 +429,7 @@ good_area:
	 * make sure we exit gracefully rather than endlessly redo
	 * the fault.
	 */
-	fault = handle_mm_fault(mm, vma, address, flags);
+	fault = handle_mm_fault(vma, address, flags);
	if (unlikely(fault & (VM_FAULT_RETRY|VM_FAULT_ERROR))) {
		if (fault & VM_FAULT_SIGSEGV)
			goto bad_area;
|
||||
|
|
|
@@ -87,10 +87,10 @@ static inline void tlb_finish_mmu(struct mmu_gather *tlb,
 * tlb_ptep_clear_flush. In both flush modes the tlb for a page cache page
 * has already been freed, so just do free_page_and_swap_cache.
 */
-static inline int __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
+static inline bool __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
{
	free_page_and_swap_cache(page);
-	return 1; /* avoid calling tlb_flush_mmu */
+	return false; /* avoid calling tlb_flush_mmu */
}

static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)

@@ -98,6 +98,24 @@ static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
	free_page_and_swap_cache(page);
}

+static inline bool __tlb_remove_page_size(struct mmu_gather *tlb,
+					  struct page *page, int page_size)
+{
+	return __tlb_remove_page(tlb, page);
+}
+
+static inline bool __tlb_remove_pte_page(struct mmu_gather *tlb,
+					 struct page *page)
+{
+	return __tlb_remove_page(tlb, page);
+}
+
+static inline void tlb_remove_page_size(struct mmu_gather *tlb,
+					struct page *page, int page_size)
+{
+	return tlb_remove_page(tlb, page);
+}
+
/*
 * pte_free_tlb frees a pte table and clears the CRSTE for the
 * page table from the tlb.

@@ -456,7 +456,7 @@ retry:
	 * make sure we exit gracefully rather than endlessly redo
	 * the fault.
	 */
-	fault = handle_mm_fault(mm, vma, address, flags);
+	fault = handle_mm_fault(vma, address, flags);
	/* No reason to continue if interrupted by SIGKILL. */
	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) {
		fault = VM_FAULT_SIGNAL;
|
||||
|
|
|
@@ -111,7 +111,7 @@ good_area:
	 * make sure we exit gracefully rather than endlessly redo
	 * the fault.
	 */
-	fault = handle_mm_fault(mm, vma, address, flags);
+	fault = handle_mm_fault(vma, address, flags);
	if (unlikely(fault & VM_FAULT_ERROR)) {
		if (fault & VM_FAULT_OOM)
			goto out_of_memory;

@@ -101,7 +101,7 @@ static inline void tlb_flush_mmu(struct mmu_gather *tlb)
static inline int __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
{
	free_page_and_swap_cache(page);
-	return 1; /* avoid calling tlb_flush_mmu */
+	return false; /* avoid calling tlb_flush_mmu */
}

static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)

@@ -109,6 +109,24 @@ static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
	__tlb_remove_page(tlb, page);
}

+static inline bool __tlb_remove_page_size(struct mmu_gather *tlb,
+					  struct page *page, int page_size)
+{
+	return __tlb_remove_page(tlb, page);
+}
+
+static inline bool __tlb_remove_pte_page(struct mmu_gather *tlb,
+					 struct page *page)
+{
+	return __tlb_remove_page(tlb, page);
+}
+
+static inline void tlb_remove_page_size(struct mmu_gather *tlb,
+					struct page *page, int page_size)
+{
+	return tlb_remove_page(tlb, page);
+}
+
#define pte_free_tlb(tlb, ptep, addr)	pte_free((tlb)->mm, ptep)
#define pmd_free_tlb(tlb, pmdp, addr)	pmd_free((tlb)->mm, pmdp)
#define pud_free_tlb(tlb, pudp, addr)	pud_free((tlb)->mm, pudp)

@@ -487,7 +487,7 @@ good_area:
	 * make sure we exit gracefully rather than endlessly redo
	 * the fault.
	 */
-	fault = handle_mm_fault(mm, vma, address, flags);
+	fault = handle_mm_fault(vma, address, flags);

	if (unlikely(fault & (VM_FAULT_RETRY | VM_FAULT_ERROR)))
		if (mm_fault_error(regs, error_code, address, fault))
|
||||
|
|
|
@@ -241,7 +241,7 @@ good_area:
	 * make sure we exit gracefully rather than endlessly redo
	 * the fault.
	 */
-	fault = handle_mm_fault(mm, vma, address, flags);
+	fault = handle_mm_fault(vma, address, flags);

	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
		return;

@@ -411,7 +411,7 @@ good_area:
		if (!(vma->vm_flags & (VM_READ | VM_EXEC)))
			goto bad_area;
	}
-	switch (handle_mm_fault(mm, vma, address, flags)) {
+	switch (handle_mm_fault(vma, address, flags)) {
	case VM_FAULT_SIGBUS:
	case VM_FAULT_OOM:
		goto do_sigbus;

@@ -436,7 +436,7 @@ good_area:
			goto bad_area;
	}

-	fault = handle_mm_fault(mm, vma, address, flags);
+	fault = handle_mm_fault(vma, address, flags);

	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
		goto exit_exception;

@@ -434,7 +434,7 @@ good_area:
	 * make sure we exit gracefully rather than endlessly redo
	 * the fault.
	 */
-	fault = handle_mm_fault(mm, vma, address, flags);
+	fault = handle_mm_fault(vma, address, flags);

	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
		return 0;
|
||||
|
|
|
@@ -102,7 +102,7 @@ static inline int __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
{
	tlb->need_flush = 1;
	free_page_and_swap_cache(page);
-	return 1; /* avoid calling tlb_flush_mmu */
+	return false; /* avoid calling tlb_flush_mmu */
}

static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)

@@ -110,6 +110,24 @@ static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
	__tlb_remove_page(tlb, page);
}

+static inline bool __tlb_remove_page_size(struct mmu_gather *tlb,
+					  struct page *page, int page_size)
+{
+	return __tlb_remove_page(tlb, page);
+}
+
+static inline bool __tlb_remove_pte_page(struct mmu_gather *tlb,
+					 struct page *page)
+{
+	return __tlb_remove_page(tlb, page);
+}
+
+static inline void tlb_remove_page_size(struct mmu_gather *tlb,
+					struct page *page, int page_size)
+{
+	return tlb_remove_page(tlb, page);
+}
+
/**
 * tlb_remove_tlb_entry - remember a pte unmapping for later tlb invalidation.
 *

@@ -73,7 +73,7 @@ good_area:
	do {
		int fault;

-		fault = handle_mm_fault(mm, vma, address, flags);
+		fault = handle_mm_fault(vma, address, flags);

		if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
			goto out_nosemaphore;

@@ -194,7 +194,7 @@ good_area:
	 * If for any reason at all we couldn't handle the fault, make
	 * sure we exit gracefully rather than endlessly redo the fault.
	 */
-	fault = handle_mm_fault(mm, vma, addr & PAGE_MASK, flags);
+	fault = handle_mm_fault(vma, addr & PAGE_MASK, flags);
	return fault;
|
||||
|
||||
check_stack:
|
||||
|
|
|
@@ -126,14 +126,6 @@ else
        KBUILD_CFLAGS += $(call cc-option,-maccumulate-outgoing-args)
endif

-# Make sure compiler does not have buggy stack-protector support.
-ifdef CONFIG_CC_STACKPROTECTOR
-	cc_has_sp := $(srctree)/scripts/gcc-x86_$(BITS)-has-stack-protector.sh
-	ifneq ($(shell $(CONFIG_SHELL) $(cc_has_sp) $(CC) $(KBUILD_CPPFLAGS) $(biarch)),y)
-		$(warning stack-protector enabled but compiler support broken)
-	endif
-endif
-
ifdef CONFIG_X86_X32
	x32_ld_ok := $(call try-run,\
	            /bin/echo -e '1: .quad 1b' | \
|
||||
|
|
|
@@ -81,7 +81,11 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd,
static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
{
	struct page *page;
-	page = alloc_pages(GFP_KERNEL | __GFP_ZERO, 0);
+	gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO;
+
+	if (mm == &init_mm)
+		gfp &= ~__GFP_ACCOUNT;
+	page = alloc_pages(gfp, 0);
	if (!page)
		return NULL;
	if (!pgtable_pmd_page_ctor(page)) {

@@ -125,7 +129,11 @@ static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgd, pud_t *pud)

static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
{
-	return (pud_t *)get_zeroed_page(GFP_KERNEL);
+	gfp_t gfp = GFP_KERNEL_ACCOUNT;
+
+	if (mm == &init_mm)
+		gfp &= ~__GFP_ACCOUNT;
+	return (pud_t *)get_zeroed_page(gfp);
}

static inline void pud_free(struct mm_struct *mm, pud_t *pud)
|
||||
|
|
|
@ -1353,7 +1353,7 @@ good_area:
|
|||
* the fault. Since we never set FAULT_FLAG_RETRY_NOWAIT, if
|
||||
* we get VM_FAULT_RETRY back, the mmap_sem has been unlocked.
|
||||
*/
|
||||
fault = handle_mm_fault(mm, vma, address, flags);
|
||||
fault = handle_mm_fault(vma, address, flags);
|
||||
major |= fault & VM_FAULT_MAJOR;
|
||||
|
||||
/*
|
||||
|
|
|
@ -6,7 +6,7 @@
|
|||
#include <asm/fixmap.h>
|
||||
#include <asm/mtrr.h>
|
||||
|
||||
#define PGALLOC_GFP GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO
|
||||
#define PGALLOC_GFP (GFP_KERNEL_ACCOUNT | __GFP_NOTRACK | __GFP_ZERO)
|
||||
|
||||
#ifdef CONFIG_HIGHPTE
|
||||
#define PGALLOC_USER_GFP __GFP_HIGHMEM
|
||||
|
@ -18,7 +18,7 @@ gfp_t __userpte_alloc_gfp = PGALLOC_GFP | PGALLOC_USER_GFP;
|
|||
|
||||
pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
|
||||
{
|
||||
return (pte_t *)__get_free_page(PGALLOC_GFP);
|
||||
return (pte_t *)__get_free_page(PGALLOC_GFP & ~__GFP_ACCOUNT);
|
||||
}
|
||||
|
||||
pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long address)
|
||||
|
@ -207,9 +207,13 @@ static int preallocate_pmds(struct mm_struct *mm, pmd_t *pmds[])
|
|||
{
|
||||
int i;
|
||||
bool failed = false;
|
||||
gfp_t gfp = PGALLOC_GFP;
|
||||
|
||||
if (mm == &init_mm)
|
||||
gfp &= ~__GFP_ACCOUNT;
|
||||
|
||||
for(i = 0; i < PREALLOCATED_PMDS; i++) {
|
||||
pmd_t *pmd = (pmd_t *)__get_free_page(PGALLOC_GFP);
|
||||
pmd_t *pmd = (pmd_t *)__get_free_page(gfp);
|
||||
if (!pmd)
|
||||
failed = true;
|
||||
if (pmd && !pgtable_pmd_page_ctor(virt_to_page(pmd))) {
|
||||
|
|
|
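The pgalloc hunks above all apply one pattern: page-table pages for user mms are charged to the memory cgroup via GFP_KERNEL_ACCOUNT, while allocations on behalf of init_mm drop __GFP_ACCOUNT so kernel page tables stay uncharged. A hedged restatement of that pattern in isolation; demo_pgtable_alloc is not an interface added by these patches:

	#include <linux/gfp.h>
	#include <linux/mm_types.h>

	#define DEMO_PGALLOC_GFP	(GFP_KERNEL_ACCOUNT | __GFP_ZERO)

	/* Allocate one zeroed page-table page; charge it to the current
	 * memcg only when it backs a user mm. */
	static unsigned long demo_pgtable_alloc(struct mm_struct *mm)
	{
		gfp_t gfp = DEMO_PGALLOC_GFP;

		if (mm == &init_mm)
			gfp &= ~__GFP_ACCOUNT;

		return __get_free_page(gfp);
	}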
@ -110,7 +110,7 @@ good_area:
|
|||
* make sure we exit gracefully rather than endlessly redo
|
||||
* the fault.
|
||||
*/
|
||||
fault = handle_mm_fault(mm, vma, address, flags);
|
||||
fault = handle_mm_fault(vma, address, flags);
|
||||
|
||||
if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
|
||||
return;
|
||||
|
|
|
@ -391,6 +391,7 @@ static ssize_t show_valid_zones(struct device *dev,
|
|||
unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
|
||||
struct page *first_page;
|
||||
struct zone *zone;
|
||||
int zone_shift = 0;
|
||||
|
||||
start_pfn = section_nr_to_pfn(mem->start_section_nr);
|
||||
end_pfn = start_pfn + nr_pages;
|
||||
|
@ -402,21 +403,26 @@ static ssize_t show_valid_zones(struct device *dev,
|
|||
|
||||
zone = page_zone(first_page);
|
||||
|
||||
if (zone_idx(zone) == ZONE_MOVABLE - 1) {
|
||||
/*The mem block is the last memoryblock of this zone.*/
|
||||
if (end_pfn == zone_end_pfn(zone))
|
||||
return sprintf(buf, "%s %s\n",
|
||||
zone->name, (zone + 1)->name);
|
||||
/* MMOP_ONLINE_KEEP */
|
||||
sprintf(buf, "%s", zone->name);
|
||||
|
||||
/* MMOP_ONLINE_KERNEL */
|
||||
zone_shift = zone_can_shift(start_pfn, nr_pages, ZONE_NORMAL);
|
||||
if (zone_shift) {
|
||||
strcat(buf, " ");
|
||||
strcat(buf, (zone + zone_shift)->name);
|
||||
}
|
||||
|
||||
if (zone_idx(zone) == ZONE_MOVABLE) {
|
||||
/*The mem block is the first memoryblock of ZONE_MOVABLE.*/
|
||||
if (start_pfn == zone->zone_start_pfn)
|
||||
return sprintf(buf, "%s %s\n",
|
||||
zone->name, (zone - 1)->name);
|
||||
/* MMOP_ONLINE_MOVABLE */
|
||||
zone_shift = zone_can_shift(start_pfn, nr_pages, ZONE_MOVABLE);
|
||||
if (zone_shift) {
|
||||
strcat(buf, " ");
|
||||
strcat(buf, (zone + zone_shift)->name);
|
||||
}
|
||||
|
||||
return sprintf(buf, "%s\n", zone->name);
|
||||
strcat(buf, "\n");
|
||||
|
||||
return strlen(buf);
|
||||
}
|
||||
static DEVICE_ATTR(valid_zones, 0444, show_valid_zones, NULL);
|
||||
#endif
|
||||
|
|
|
@ -113,6 +113,8 @@ static ssize_t node_read_meminfo(struct device *dev,
|
|||
"Node %d SUnreclaim: %8lu kB\n"
|
||||
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
|
||||
"Node %d AnonHugePages: %8lu kB\n"
|
||||
"Node %d ShmemHugePages: %8lu kB\n"
|
||||
"Node %d ShmemPmdMapped: %8lu kB\n"
|
||||
#endif
|
||||
,
|
||||
nid, K(node_page_state(nid, NR_FILE_DIRTY)),
|
||||
|
@ -131,10 +133,13 @@ static ssize_t node_read_meminfo(struct device *dev,
|
|||
node_page_state(nid, NR_SLAB_UNRECLAIMABLE)),
|
||||
nid, K(node_page_state(nid, NR_SLAB_RECLAIMABLE)),
|
||||
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
|
||||
nid, K(node_page_state(nid, NR_SLAB_UNRECLAIMABLE))
|
||||
, nid,
|
||||
K(node_page_state(nid, NR_ANON_TRANSPARENT_HUGEPAGES) *
|
||||
HPAGE_PMD_NR));
|
||||
nid, K(node_page_state(nid, NR_SLAB_UNRECLAIMABLE)),
|
||||
nid, K(node_page_state(nid, NR_ANON_THPS) *
|
||||
HPAGE_PMD_NR),
|
||||
nid, K(node_page_state(nid, NR_SHMEM_THPS) *
|
||||
HPAGE_PMD_NR),
|
||||
nid, K(node_page_state(nid, NR_SHMEM_PMDMAPPED) *
|
||||
HPAGE_PMD_NR));
|
||||
#else
|
||||
nid, K(node_page_state(nid, NR_SLAB_UNRECLAIMABLE)));
|
||||
#endif
|
||||
|
|
|
@ -1,8 +1,7 @@
|
|||
config ZRAM
|
||||
tristate "Compressed RAM block device support"
|
||||
depends on BLOCK && SYSFS && ZSMALLOC
|
||||
select LZO_COMPRESS
|
||||
select LZO_DECOMPRESS
|
||||
depends on BLOCK && SYSFS && ZSMALLOC && CRYPTO
|
||||
select CRYPTO_LZO
|
||||
default n
|
||||
help
|
||||
Creates virtual block devices called /dev/zramX (X = 0, 1, ...).
|
||||
|
@ -14,13 +13,3 @@ config ZRAM
|
|||
disks and maybe many more.
|
||||
|
||||
See zram.txt for more information.
|
||||
|
||||
config ZRAM_LZ4_COMPRESS
|
||||
bool "Enable LZ4 algorithm support"
|
||||
depends on ZRAM
|
||||
select LZ4_COMPRESS
|
||||
select LZ4_DECOMPRESS
|
||||
default n
|
||||
help
|
||||
This option enables LZ4 compression algorithm support. Compression
|
||||
algorithm can be changed using `comp_algorithm' device attribute.
|
|
@ -1,5 +1,3 @@
|
|||
zram-y := zcomp_lzo.o zcomp.o zram_drv.o
|
||||
|
||||
zram-$(CONFIG_ZRAM_LZ4_COMPRESS) += zcomp_lz4.o
|
||||
zram-y := zcomp.o zram_drv.o
|
||||
|
||||
obj-$(CONFIG_ZRAM) += zram.o
|
||||
|
|
|
@ -14,108 +14,150 @@
|
|||
#include <linux/wait.h>
|
||||
#include <linux/sched.h>
|
||||
#include <linux/cpu.h>
|
||||
#include <linux/crypto.h>
|
||||
|
||||
#include "zcomp.h"
|
||||
#include "zcomp_lzo.h"
|
||||
#ifdef CONFIG_ZRAM_LZ4_COMPRESS
|
||||
#include "zcomp_lz4.h"
|
||||
#endif
|
||||
|
||||
static struct zcomp_backend *backends[] = {
|
||||
&zcomp_lzo,
|
||||
#ifdef CONFIG_ZRAM_LZ4_COMPRESS
|
||||
&zcomp_lz4,
|
||||
static const char * const backends[] = {
|
||||
"lzo",
|
||||
#if IS_ENABLED(CONFIG_CRYPTO_LZ4)
|
||||
"lz4",
|
||||
#endif
|
||||
#if IS_ENABLED(CONFIG_CRYPTO_DEFLATE)
|
||||
"deflate",
|
||||
#endif
|
||||
#if IS_ENABLED(CONFIG_CRYPTO_LZ4HC)
|
||||
"lz4hc",
|
||||
#endif
|
||||
#if IS_ENABLED(CONFIG_CRYPTO_842)
|
||||
"842",
|
||||
#endif
|
||||
NULL
|
||||
};
|
||||
|
||||
static struct zcomp_backend *find_backend(const char *compress)
|
||||
static void zcomp_strm_free(struct zcomp_strm *zstrm)
|
||||
{
|
||||
int i = 0;
|
||||
while (backends[i]) {
|
||||
if (sysfs_streq(compress, backends[i]->name))
|
||||
break;
|
||||
i++;
|
||||
}
|
||||
return backends[i];
|
||||
}
|
||||
|
||||
static void zcomp_strm_free(struct zcomp *comp, struct zcomp_strm *zstrm)
|
||||
{
|
||||
if (zstrm->private)
|
||||
comp->backend->destroy(zstrm->private);
|
||||
if (!IS_ERR_OR_NULL(zstrm->tfm))
|
||||
crypto_free_comp(zstrm->tfm);
|
||||
free_pages((unsigned long)zstrm->buffer, 1);
|
||||
kfree(zstrm);
|
||||
}
|
||||
|
||||
/*
|
||||
* allocate new zcomp_strm structure with ->private initialized by
|
||||
* allocate new zcomp_strm structure with ->tfm initialized by
|
||||
* backend, return NULL on error
|
||||
*/
|
||||
static struct zcomp_strm *zcomp_strm_alloc(struct zcomp *comp, gfp_t flags)
|
||||
static struct zcomp_strm *zcomp_strm_alloc(struct zcomp *comp)
|
||||
{
|
||||
struct zcomp_strm *zstrm = kmalloc(sizeof(*zstrm), flags);
|
||||
struct zcomp_strm *zstrm = kmalloc(sizeof(*zstrm), GFP_KERNEL);
|
||||
if (!zstrm)
|
||||
return NULL;
|
||||
|
||||
zstrm->private = comp->backend->create(flags);
|
||||
zstrm->tfm = crypto_alloc_comp(comp->name, 0, 0);
|
||||
/*
|
||||
* allocate 2 pages. 1 for compressed data, plus 1 extra for the
|
||||
* case when compressed size is larger than the original one
|
||||
*/
|
||||
zstrm->buffer = (void *)__get_free_pages(flags | __GFP_ZERO, 1);
|
||||
if (!zstrm->private || !zstrm->buffer) {
|
||||
zcomp_strm_free(comp, zstrm);
|
||||
zstrm->buffer = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 1);
|
||||
if (IS_ERR_OR_NULL(zstrm->tfm) || !zstrm->buffer) {
|
||||
zcomp_strm_free(zstrm);
|
||||
zstrm = NULL;
|
||||
}
|
||||
return zstrm;
|
||||
}
|
||||
|
||||
/* show available compressors */
|
||||
ssize_t zcomp_available_show(const char *comp, char *buf)
|
||||
bool zcomp_available_algorithm(const char *comp)
|
||||
{
|
||||
ssize_t sz = 0;
|
||||
int i = 0;
|
||||
|
||||
while (backends[i]) {
|
||||
if (!strcmp(comp, backends[i]->name))
|
||||
sz += scnprintf(buf + sz, PAGE_SIZE - sz - 2,
|
||||
"[%s] ", backends[i]->name);
|
||||
else
|
||||
sz += scnprintf(buf + sz, PAGE_SIZE - sz - 2,
|
||||
"%s ", backends[i]->name);
|
||||
if (sysfs_streq(comp, backends[i]))
|
||||
return true;
|
||||
i++;
|
||||
}
|
||||
|
||||
/*
|
||||
* Crypto does not ignore a trailing new line symbol,
|
||||
* so make sure you don't supply a string containing
|
||||
* one.
|
||||
* This also means that we permit zcomp initialisation
|
||||
* with any compressing algorithm known to crypto api.
|
||||
*/
|
||||
return crypto_has_comp(comp, 0, 0) == 1;
|
||||
}
|
||||
|
||||
/* show available compressors */
|
||||
ssize_t zcomp_available_show(const char *comp, char *buf)
|
||||
{
|
||||
bool known_algorithm = false;
|
||||
ssize_t sz = 0;
|
||||
int i = 0;
|
||||
|
||||
for (; backends[i]; i++) {
|
||||
if (!strcmp(comp, backends[i])) {
|
||||
known_algorithm = true;
|
||||
sz += scnprintf(buf + sz, PAGE_SIZE - sz - 2,
|
||||
"[%s] ", backends[i]);
|
||||
} else {
|
||||
sz += scnprintf(buf + sz, PAGE_SIZE - sz - 2,
|
||||
"%s ", backends[i]);
|
||||
}
|
||||
}
|
||||
|
||||
/*
|
||||
* Out-of-tree module known to crypto api or a missing
|
||||
* entry in `backends'.
|
||||
*/
|
||||
if (!known_algorithm && crypto_has_comp(comp, 0, 0) == 1)
|
||||
sz += scnprintf(buf + sz, PAGE_SIZE - sz - 2,
|
||||
"[%s] ", comp);
|
||||
|
||||
sz += scnprintf(buf + sz, PAGE_SIZE - sz, "\n");
|
||||
return sz;
|
||||
}
|
||||
|
||||
bool zcomp_available_algorithm(const char *comp)
|
||||
{
|
||||
return find_backend(comp) != NULL;
|
||||
}
|
||||
|
||||
struct zcomp_strm *zcomp_strm_find(struct zcomp *comp)
|
||||
struct zcomp_strm *zcomp_stream_get(struct zcomp *comp)
|
||||
{
|
||||
return *get_cpu_ptr(comp->stream);
|
||||
}
|
||||
|
||||
void zcomp_strm_release(struct zcomp *comp, struct zcomp_strm *zstrm)
|
||||
void zcomp_stream_put(struct zcomp *comp)
|
||||
{
|
||||
put_cpu_ptr(comp->stream);
|
||||
}
|
||||
|
||||
int zcomp_compress(struct zcomp *comp, struct zcomp_strm *zstrm,
|
||||
const unsigned char *src, size_t *dst_len)
|
||||
int zcomp_compress(struct zcomp_strm *zstrm,
|
||||
const void *src, unsigned int *dst_len)
|
||||
{
|
||||
return comp->backend->compress(src, zstrm->buffer, dst_len,
|
||||
zstrm->private);
|
||||
/*
|
||||
* Our dst memory (zstrm->buffer) is always `2 * PAGE_SIZE' sized
|
||||
* because sometimes we can end up having bigger compressed data
|
||||
* due to various reasons: for example compression algorithms tend
|
||||
* to add some padding to the compressed buffer. Speaking of padding,
|
||||
* comp algorithm `842' pads the compressed length to multiple of 8
|
||||
* and returns -ENOSPC when the dst memory is not big enough, which
|
||||
* is not something that ZRAM wants to see. We can handle the
|
||||
* `compressed_size > PAGE_SIZE' case easily in ZRAM, but when we
|
||||
* receive -ERRNO from the compressing backend we can't help it
|
||||
* anymore. To make `842' happy we need to tell the exact size of
|
||||
* the dst buffer, zram_drv will take care of the fact that
|
||||
* compressed buffer is too big.
|
||||
*/
|
||||
*dst_len = PAGE_SIZE * 2;
|
||||
|
||||
return crypto_comp_compress(zstrm->tfm,
|
||||
src, PAGE_SIZE,
|
||||
zstrm->buffer, dst_len);
|
||||
}
|
||||
|
||||
int zcomp_decompress(struct zcomp *comp, const unsigned char *src,
|
||||
size_t src_len, unsigned char *dst)
|
||||
int zcomp_decompress(struct zcomp_strm *zstrm,
|
||||
const void *src, unsigned int src_len, void *dst)
|
||||
{
|
||||
return comp->backend->decompress(src, src_len, dst);
|
||||
unsigned int dst_len = PAGE_SIZE;
|
||||
|
||||
return crypto_comp_decompress(zstrm->tfm,
|
||||
src, src_len,
|
||||
dst, &dst_len);
|
||||
}
|
||||
|
||||
static int __zcomp_cpu_notifier(struct zcomp *comp,
|
||||
|
@ -127,7 +169,7 @@ static int __zcomp_cpu_notifier(struct zcomp *comp,
|
|||
case CPU_UP_PREPARE:
|
||||
if (WARN_ON(*per_cpu_ptr(comp->stream, cpu)))
|
||||
break;
|
||||
zstrm = zcomp_strm_alloc(comp, GFP_KERNEL);
|
||||
zstrm = zcomp_strm_alloc(comp);
|
||||
if (IS_ERR_OR_NULL(zstrm)) {
|
||||
pr_err("Can't allocate a compression stream\n");
|
||||
return NOTIFY_BAD;
|
||||
|
@ -138,7 +180,7 @@ static int __zcomp_cpu_notifier(struct zcomp *comp,
|
|||
case CPU_UP_CANCELED:
|
||||
zstrm = *per_cpu_ptr(comp->stream, cpu);
|
||||
if (!IS_ERR_OR_NULL(zstrm))
|
||||
zcomp_strm_free(comp, zstrm);
|
||||
zcomp_strm_free(zstrm);
|
||||
*per_cpu_ptr(comp->stream, cpu) = NULL;
|
||||
break;
|
||||
default:
|
||||
|
@ -209,18 +251,16 @@ void zcomp_destroy(struct zcomp *comp)
|
|||
struct zcomp *zcomp_create(const char *compress)
|
||||
{
|
||||
struct zcomp *comp;
|
||||
struct zcomp_backend *backend;
|
||||
int error;
|
||||
|
||||
backend = find_backend(compress);
|
||||
if (!backend)
|
||||
if (!zcomp_available_algorithm(compress))
|
||||
return ERR_PTR(-EINVAL);
|
||||
|
||||
comp = kzalloc(sizeof(struct zcomp), GFP_KERNEL);
|
||||
if (!comp)
|
||||
return ERR_PTR(-ENOMEM);
|
||||
|
||||
comp->backend = backend;
|
||||
comp->name = compress;
|
||||
error = zcomp_init(comp);
|
||||
if (error) {
|
||||
kfree(comp);
|
||||
|
|
|
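With the crypto conversion, a zram-side caller no longer threads the struct zcomp through the compress helpers; it takes the per-CPU stream, works against zstrm->buffer, and releases the stream. A hedged sketch of the compress side; demo_compress_page is a hypothetical helper, src is one PAGE_SIZE buffer, and dst is assumed to have room for 2 * PAGE_SIZE bytes per the buffer comment above:

	#include <linux/string.h>
	#include "zcomp.h"

	/* Compress one page through the new stream API; returns 0 or a
	 * negative errno from the crypto backend, with the compressed
	 * length in *clen. */
	static int demo_compress_page(struct zcomp *comp, const void *src,
				      void *dst, unsigned int *clen)
	{
		struct zcomp_strm *zstrm = zcomp_stream_get(comp);
		int ret;

		ret = zcomp_compress(zstrm, src, clen);
		if (!ret)
			/* copy out while we still own the per-CPU stream */
			memcpy(dst, zstrm->buffer, *clen);
		zcomp_stream_put(comp);
		return ret;
	}

The decompress side is symmetric: zcomp_stream_get(), zcomp_decompress(zstrm, src, src_len, dst), zcomp_stream_put(), as the zram_drv.c hunks further down show.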
@ -13,33 +13,15 @@
|
|||
struct zcomp_strm {
|
||||
/* compression/decompression buffer */
|
||||
void *buffer;
|
||||
/*
|
||||
* The private data of the compression stream, only compression
|
||||
* stream backend can touch this (e.g. compression algorithm
|
||||
* working memory)
|
||||
*/
|
||||
void *private;
|
||||
};
|
||||
|
||||
/* static compression backend */
|
||||
struct zcomp_backend {
|
||||
int (*compress)(const unsigned char *src, unsigned char *dst,
|
||||
size_t *dst_len, void *private);
|
||||
|
||||
int (*decompress)(const unsigned char *src, size_t src_len,
|
||||
unsigned char *dst);
|
||||
|
||||
void *(*create)(gfp_t flags);
|
||||
void (*destroy)(void *private);
|
||||
|
||||
const char *name;
|
||||
struct crypto_comp *tfm;
|
||||
};
|
||||
|
||||
/* dynamic per-device compression frontend */
|
||||
struct zcomp {
|
||||
struct zcomp_strm * __percpu *stream;
|
||||
struct zcomp_backend *backend;
|
||||
struct notifier_block notifier;
|
||||
|
||||
const char *name;
|
||||
};
|
||||
|
||||
ssize_t zcomp_available_show(const char *comp, char *buf);
|
||||
|
@ -48,14 +30,14 @@ bool zcomp_available_algorithm(const char *comp);
|
|||
struct zcomp *zcomp_create(const char *comp);
|
||||
void zcomp_destroy(struct zcomp *comp);
|
||||
|
||||
struct zcomp_strm *zcomp_strm_find(struct zcomp *comp);
|
||||
void zcomp_strm_release(struct zcomp *comp, struct zcomp_strm *zstrm);
|
||||
struct zcomp_strm *zcomp_stream_get(struct zcomp *comp);
|
||||
void zcomp_stream_put(struct zcomp *comp);
|
||||
|
||||
int zcomp_compress(struct zcomp *comp, struct zcomp_strm *zstrm,
|
||||
const unsigned char *src, size_t *dst_len);
|
||||
int zcomp_compress(struct zcomp_strm *zstrm,
|
||||
const void *src, unsigned int *dst_len);
|
||||
|
||||
int zcomp_decompress(struct zcomp *comp, const unsigned char *src,
|
||||
size_t src_len, unsigned char *dst);
|
||||
int zcomp_decompress(struct zcomp_strm *zstrm,
|
||||
const void *src, unsigned int src_len, void *dst);
|
||||
|
||||
bool zcomp_set_max_streams(struct zcomp *comp, int num_strm);
|
||||
#endif /* _ZCOMP_H_ */
|
||||
|
|
|
@ -1,56 +0,0 @@
|
|||
/*
|
||||
* Copyright (C) 2014 Sergey Senozhatsky.
|
||||
*
|
||||
* This program is free software; you can redistribute it and/or
|
||||
* modify it under the terms of the GNU General Public License
|
||||
* as published by the Free Software Foundation; either version
|
||||
* 2 of the License, or (at your option) any later version.
|
||||
*/
|
||||
|
||||
#include <linux/kernel.h>
|
||||
#include <linux/slab.h>
|
||||
#include <linux/lz4.h>
|
||||
#include <linux/vmalloc.h>
|
||||
#include <linux/mm.h>
|
||||
|
||||
#include "zcomp_lz4.h"
|
||||
|
||||
static void *zcomp_lz4_create(gfp_t flags)
|
||||
{
|
||||
void *ret;
|
||||
|
||||
ret = kmalloc(LZ4_MEM_COMPRESS, flags);
|
||||
if (!ret)
|
||||
ret = __vmalloc(LZ4_MEM_COMPRESS,
|
||||
flags | __GFP_HIGHMEM,
|
||||
PAGE_KERNEL);
|
||||
return ret;
|
||||
}
|
||||
|
||||
static void zcomp_lz4_destroy(void *private)
|
||||
{
|
||||
kvfree(private);
|
||||
}
|
||||
|
||||
static int zcomp_lz4_compress(const unsigned char *src, unsigned char *dst,
|
||||
size_t *dst_len, void *private)
|
||||
{
|
||||
/* return : Success if return 0 */
|
||||
return lz4_compress(src, PAGE_SIZE, dst, dst_len, private);
|
||||
}
|
||||
|
||||
static int zcomp_lz4_decompress(const unsigned char *src, size_t src_len,
|
||||
unsigned char *dst)
|
||||
{
|
||||
size_t dst_len = PAGE_SIZE;
|
||||
/* return : Success if return 0 */
|
||||
return lz4_decompress_unknownoutputsize(src, src_len, dst, &dst_len);
|
||||
}
|
||||
|
||||
struct zcomp_backend zcomp_lz4 = {
|
||||
.compress = zcomp_lz4_compress,
|
||||
.decompress = zcomp_lz4_decompress,
|
||||
.create = zcomp_lz4_create,
|
||||
.destroy = zcomp_lz4_destroy,
|
||||
.name = "lz4",
|
||||
};
|
|
@ -1,17 +0,0 @@
|
|||
/*
|
||||
* Copyright (C) 2014 Sergey Senozhatsky.
|
||||
*
|
||||
* This program is free software; you can redistribute it and/or
|
||||
* modify it under the terms of the GNU General Public License
|
||||
* as published by the Free Software Foundation; either version
|
||||
* 2 of the License, or (at your option) any later version.
|
||||
*/
|
||||
|
||||
#ifndef _ZCOMP_LZ4_H_
|
||||
#define _ZCOMP_LZ4_H_
|
||||
|
||||
#include "zcomp.h"
|
||||
|
||||
extern struct zcomp_backend zcomp_lz4;
|
||||
|
||||
#endif /* _ZCOMP_LZ4_H_ */
|
|
@ -1,56 +0,0 @@
|
|||
/*
|
||||
* Copyright (C) 2014 Sergey Senozhatsky.
|
||||
*
|
||||
* This program is free software; you can redistribute it and/or
|
||||
* modify it under the terms of the GNU General Public License
|
||||
* as published by the Free Software Foundation; either version
|
||||
* 2 of the License, or (at your option) any later version.
|
||||
*/
|
||||
|
||||
#include <linux/kernel.h>
|
||||
#include <linux/slab.h>
|
||||
#include <linux/lzo.h>
|
||||
#include <linux/vmalloc.h>
|
||||
#include <linux/mm.h>
|
||||
|
||||
#include "zcomp_lzo.h"
|
||||
|
||||
static void *lzo_create(gfp_t flags)
|
||||
{
|
||||
void *ret;
|
||||
|
||||
ret = kmalloc(LZO1X_MEM_COMPRESS, flags);
|
||||
if (!ret)
|
||||
ret = __vmalloc(LZO1X_MEM_COMPRESS,
|
||||
flags | __GFP_HIGHMEM,
|
||||
PAGE_KERNEL);
|
||||
return ret;
|
||||
}
|
||||
|
||||
static void lzo_destroy(void *private)
|
||||
{
|
||||
kvfree(private);
|
||||
}
|
||||
|
||||
static int lzo_compress(const unsigned char *src, unsigned char *dst,
|
||||
size_t *dst_len, void *private)
|
||||
{
|
||||
int ret = lzo1x_1_compress(src, PAGE_SIZE, dst, dst_len, private);
|
||||
return ret == LZO_E_OK ? 0 : ret;
|
||||
}
|
||||
|
||||
static int lzo_decompress(const unsigned char *src, size_t src_len,
|
||||
unsigned char *dst)
|
||||
{
|
||||
size_t dst_len = PAGE_SIZE;
|
||||
int ret = lzo1x_decompress_safe(src, src_len, dst, &dst_len);
|
||||
return ret == LZO_E_OK ? 0 : ret;
|
||||
}
|
||||
|
||||
struct zcomp_backend zcomp_lzo = {
|
||||
.compress = lzo_compress,
|
||||
.decompress = lzo_decompress,
|
||||
.create = lzo_create,
|
||||
.destroy = lzo_destroy,
|
||||
.name = "lzo",
|
||||
};
|
|
@ -1,17 +0,0 @@
|
|||
/*
|
||||
* Copyright (C) 2014 Sergey Senozhatsky.
|
||||
*
|
||||
* This program is free software; you can redistribute it and/or
|
||||
* modify it under the terms of the GNU General Public License
|
||||
* as published by the Free Software Foundation; either version
|
||||
* 2 of the License, or (at your option) any later version.
|
||||
*/
|
||||
|
||||
#ifndef _ZCOMP_LZO_H_
|
||||
#define _ZCOMP_LZO_H_
|
||||
|
||||
#include "zcomp.h"
|
||||
|
||||
extern struct zcomp_backend zcomp_lzo;
|
||||
|
||||
#endif /* _ZCOMP_LZO_H_ */
|
|
@ -342,9 +342,16 @@ static ssize_t comp_algorithm_store(struct device *dev,
|
|||
struct device_attribute *attr, const char *buf, size_t len)
|
||||
{
|
||||
struct zram *zram = dev_to_zram(dev);
|
||||
char compressor[CRYPTO_MAX_ALG_NAME];
|
||||
size_t sz;
|
||||
|
||||
if (!zcomp_available_algorithm(buf))
|
||||
strlcpy(compressor, buf, sizeof(compressor));
|
||||
/* ignore trailing newline */
|
||||
sz = strlen(compressor);
|
||||
if (sz > 0 && compressor[sz - 1] == '\n')
|
||||
compressor[sz - 1] = 0x00;
|
||||
|
||||
if (!zcomp_available_algorithm(compressor))
|
||||
return -EINVAL;
|
||||
|
||||
down_write(&zram->init_lock);
|
||||
|
@ -353,13 +360,8 @@ static ssize_t comp_algorithm_store(struct device *dev,
|
|||
pr_info("Can't change algorithm for initialized device\n");
|
||||
return -EBUSY;
|
||||
}
|
||||
strlcpy(zram->compressor, buf, sizeof(zram->compressor));
|
||||
|
||||
/* ignore trailing newline */
|
||||
sz = strlen(zram->compressor);
|
||||
if (sz > 0 && zram->compressor[sz - 1] == '\n')
|
||||
zram->compressor[sz - 1] = 0x00;
|
||||
|
||||
strlcpy(zram->compressor, compressor, sizeof(compressor));
|
||||
up_write(&zram->init_lock);
|
||||
return len;
|
||||
}
|
||||
|
@ -563,7 +565,7 @@ static int zram_decompress_page(struct zram *zram, char *mem, u32 index)
|
|||
unsigned char *cmem;
|
||||
struct zram_meta *meta = zram->meta;
|
||||
unsigned long handle;
|
||||
size_t size;
|
||||
unsigned int size;
|
||||
|
||||
bit_spin_lock(ZRAM_ACCESS, &meta->table[index].value);
|
||||
handle = meta->table[index].handle;
|
||||
|
@ -576,10 +578,14 @@ static int zram_decompress_page(struct zram *zram, char *mem, u32 index)
|
|||
}
|
||||
|
||||
cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_RO);
|
||||
if (size == PAGE_SIZE)
|
||||
if (size == PAGE_SIZE) {
|
||||
copy_page(mem, cmem);
|
||||
else
|
||||
ret = zcomp_decompress(zram->comp, cmem, size, mem);
|
||||
} else {
|
||||
struct zcomp_strm *zstrm = zcomp_stream_get(zram->comp);
|
||||
|
||||
ret = zcomp_decompress(zstrm, cmem, size, mem);
|
||||
zcomp_stream_put(zram->comp);
|
||||
}
|
||||
zs_unmap_object(meta->mem_pool, handle);
|
||||
bit_spin_unlock(ZRAM_ACCESS, &meta->table[index].value);
|
||||
|
||||
|
@ -646,7 +652,7 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
|
|||
int offset)
|
||||
{
|
||||
int ret = 0;
|
||||
size_t clen;
|
||||
unsigned int clen;
|
||||
unsigned long handle = 0;
|
||||
struct page *page;
|
||||
unsigned char *user_mem, *cmem, *src, *uncmem = NULL;
|
||||
|
@ -695,8 +701,8 @@ compress_again:
|
|||
goto out;
|
||||
}
|
||||
|
||||
zstrm = zcomp_strm_find(zram->comp);
|
||||
ret = zcomp_compress(zram->comp, zstrm, uncmem, &clen);
|
||||
zstrm = zcomp_stream_get(zram->comp);
|
||||
ret = zcomp_compress(zstrm, uncmem, &clen);
|
||||
if (!is_partial_io(bvec)) {
|
||||
kunmap_atomic(user_mem);
|
||||
user_mem = NULL;
|
||||
|
@ -732,19 +738,21 @@ compress_again:
|
|||
handle = zs_malloc(meta->mem_pool, clen,
|
||||
__GFP_KSWAPD_RECLAIM |
|
||||
__GFP_NOWARN |
|
||||
__GFP_HIGHMEM);
|
||||
__GFP_HIGHMEM |
|
||||
__GFP_MOVABLE);
|
||||
if (!handle) {
|
||||
zcomp_strm_release(zram->comp, zstrm);
|
||||
zcomp_stream_put(zram->comp);
|
||||
zstrm = NULL;
|
||||
|
||||
atomic64_inc(&zram->stats.writestall);
|
||||
|
||||
handle = zs_malloc(meta->mem_pool, clen,
|
||||
GFP_NOIO | __GFP_HIGHMEM);
|
||||
GFP_NOIO | __GFP_HIGHMEM |
|
||||
__GFP_MOVABLE);
|
||||
if (handle)
|
||||
goto compress_again;
|
||||
|
||||
pr_err("Error allocating memory for compressed page: %u, size=%zu\n",
|
||||
pr_err("Error allocating memory for compressed page: %u, size=%u\n",
|
||||
index, clen);
|
||||
ret = -ENOMEM;
|
||||
goto out;
|
||||
|
@ -769,7 +777,7 @@ compress_again:
|
|||
memcpy(cmem, src, clen);
|
||||
}
|
||||
|
||||
zcomp_strm_release(zram->comp, zstrm);
|
||||
zcomp_stream_put(zram->comp);
|
||||
zstrm = NULL;
|
||||
zs_unmap_object(meta->mem_pool, handle);
|
||||
|
||||
|
@ -789,7 +797,7 @@ compress_again:
|
|||
atomic64_inc(&zram->stats.pages_stored);
|
||||
out:
|
||||
if (zstrm)
|
||||
zcomp_strm_release(zram->comp, zstrm);
|
||||
zcomp_stream_put(zram->comp);
|
||||
if (is_partial_io(bvec))
|
||||
kfree(uncmem);
|
||||
return ret;
|
||||
|
|
|
@ -15,8 +15,9 @@
|
|||
#ifndef _ZRAM_DRV_H_
|
||||
#define _ZRAM_DRV_H_
|
||||
|
||||
#include <linux/spinlock.h>
|
||||
#include <linux/rwsem.h>
|
||||
#include <linux/zsmalloc.h>
|
||||
#include <linux/crypto.h>
|
||||
|
||||
#include "zcomp.h"
|
||||
|
||||
|
@ -113,7 +114,7 @@ struct zram {
|
|||
* we can store in a disk.
|
||||
*/
|
||||
u64 disksize; /* bytes */
|
||||
char compressor[10];
|
||||
char compressor[CRYPTO_MAX_ALG_NAME];
|
||||
/*
|
||||
* zram is claimed so open request will be failed
|
||||
*/
|
||||
|
|
|
@ -22,6 +22,7 @@
|
|||
#include <linux/device.h>
|
||||
#include <linux/highmem.h>
|
||||
#include <linux/backing-dev.h>
|
||||
#include <linux/shmem_fs.h>
|
||||
#include <linux/splice.h>
|
||||
#include <linux/pfn.h>
|
||||
#include <linux/export.h>
|
||||
|
@ -657,6 +658,28 @@ static int mmap_zero(struct file *file, struct vm_area_struct *vma)
|
|||
return 0;
|
||||
}
|
||||
|
||||
static unsigned long get_unmapped_area_zero(struct file *file,
|
||||
unsigned long addr, unsigned long len,
|
||||
unsigned long pgoff, unsigned long flags)
|
||||
{
|
||||
#ifdef CONFIG_MMU
|
||||
if (flags & MAP_SHARED) {
|
||||
/*
|
||||
* mmap_zero() will call shmem_zero_setup() to create a file,
|
||||
* so use shmem's get_unmapped_area in case it can be huge;
|
||||
* and pass NULL for file as in mmap.c's get_unmapped_area(),
|
||||
* so as not to confuse shmem with our handle on "/dev/zero".
|
||||
*/
|
||||
return shmem_get_unmapped_area(NULL, addr, len, pgoff, flags);
|
||||
}
|
||||
|
||||
/* Otherwise flags & MAP_PRIVATE: with no shmem object beneath it */
|
||||
return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);
|
||||
#else
|
||||
return -ENOSYS;
|
||||
#endif
|
||||
}
|
||||
|
||||
static ssize_t write_full(struct file *file, const char __user *buf,
|
||||
size_t count, loff_t *ppos)
|
||||
{
|
||||
|
@ -764,6 +787,7 @@ static const struct file_operations zero_fops = {
|
|||
.read_iter = read_iter_zero,
|
||||
.write_iter = write_iter_zero,
|
||||
.mmap = mmap_zero,
|
||||
.get_unmapped_area = get_unmapped_area_zero,
|
||||
#ifndef CONFIG_MMU
|
||||
.mmap_capabilities = zero_mmap_capabilities,
|
||||
#endif
|
||||
|
|
|
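get_unmapped_area_zero() only changes behaviour for MAP_SHARED mappings of /dev/zero, which are backed by a shmem object and can now be given a huge-page-friendly placement. A small userspace sketch of such a mapping; nothing here is new API, it just exercises the path the hunk above adds:

	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(void)
	{
		size_t len = 4UL << 20;		/* two PMD-sized chunks on x86-64 */
		int fd = open("/dev/zero", O_RDWR);
		if (fd < 0)
			return 1;

		/* MAP_SHARED goes through shmem_zero_setup(), so the kernel may
		 * now pick a PMD-aligned address and, with huge tmpfs enabled,
		 * back it with huge pages. */
		char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
		if (p == MAP_FAILED)
			return 1;

		p[0] = 1;			/* fault something in */
		printf("mapped at %p\n", (void *)p);
		munmap(p, len);
		close(fd);
		return 0;
	}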
@ -538,8 +538,7 @@ static void do_fault(struct work_struct *work)
|
|||
if (access_error(vma, fault))
|
||||
goto out;
|
||||
|
||||
ret = handle_mm_fault(mm, vma, address, flags);
|
||||
|
||||
ret = handle_mm_fault(vma, address, flags);
|
||||
out:
|
||||
up_read(&mm->mmap_sem);
|
||||
|
||||
|
|
|
@ -583,7 +583,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
|
|||
if (access_error(vma, req))
|
||||
goto invalid;
|
||||
|
||||
ret = handle_mm_fault(svm->mm, vma, address,
|
||||
ret = handle_mm_fault(vma, address,
|
||||
req->wr_req ? FAULT_FLAG_WRITE : 0);
|
||||
if (ret & VM_FAULT_ERROR)
|
||||
goto invalid;
|
||||
|
|
|
@ -363,6 +363,7 @@ static void moom_callback(struct work_struct *ignored)
|
|||
struct oom_control oc = {
|
||||
.zonelist = node_zonelist(first_memory_node, gfp_mask),
|
||||
.nodemask = NULL,
|
||||
.memcg = NULL,
|
||||
.gfp_mask = gfp_mask,
|
||||
.order = -1,
|
||||
};
|
||||
|
|
|
@ -1496,7 +1496,6 @@ int fb_parse_edid(unsigned char *edid, struct fb_var_screeninfo *var)
|
|||
}
|
||||
void fb_edid_to_monspecs(unsigned char *edid, struct fb_monspecs *specs)
|
||||
{
|
||||
specs = NULL;
|
||||
}
|
||||
void fb_edid_add_monspecs(unsigned char *edid, struct fb_monspecs *specs)
|
||||
{
|
||||
|
|
|
@ -30,6 +30,7 @@
|
|||
#include <linux/oom.h>
|
||||
#include <linux/wait.h>
|
||||
#include <linux/mm.h>
|
||||
#include <linux/mount.h>
|
||||
|
||||
/*
|
||||
* Balloon device works in 4K page units. So each page is pointed to by
|
||||
|
@ -45,6 +46,10 @@ static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
|
|||
module_param(oom_pages, int, S_IRUSR | S_IWUSR);
|
||||
MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
|
||||
|
||||
#ifdef CONFIG_BALLOON_COMPACTION
|
||||
static struct vfsmount *balloon_mnt;
|
||||
#endif
|
||||
|
||||
struct virtio_balloon {
|
||||
struct virtio_device *vdev;
|
||||
struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
|
||||
|
@ -490,6 +495,24 @@ static int virtballoon_migratepage(struct balloon_dev_info *vb_dev_info,
|
|||
|
||||
return MIGRATEPAGE_SUCCESS;
|
||||
}
|
||||
|
||||
static struct dentry *balloon_mount(struct file_system_type *fs_type,
|
||||
int flags, const char *dev_name, void *data)
|
||||
{
|
||||
static const struct dentry_operations ops = {
|
||||
.d_dname = simple_dname,
|
||||
};
|
||||
|
||||
return mount_pseudo(fs_type, "balloon-kvm:", NULL, &ops,
|
||||
BALLOON_KVM_MAGIC);
|
||||
}
|
||||
|
||||
static struct file_system_type balloon_fs = {
|
||||
.name = "balloon-kvm",
|
||||
.mount = balloon_mount,
|
||||
.kill_sb = kill_anon_super,
|
||||
};
|
||||
|
||||
#endif /* CONFIG_BALLOON_COMPACTION */
|
||||
|
||||
static int virtballoon_probe(struct virtio_device *vdev)
|
||||
|
@ -519,9 +542,6 @@ static int virtballoon_probe(struct virtio_device *vdev)
|
|||
vb->vdev = vdev;
|
||||
|
||||
balloon_devinfo_init(&vb->vb_dev_info);
|
||||
#ifdef CONFIG_BALLOON_COMPACTION
|
||||
vb->vb_dev_info.migratepage = virtballoon_migratepage;
|
||||
#endif
|
||||
|
||||
err = init_vqs(vb);
|
||||
if (err)
|
||||
|
@ -531,13 +551,33 @@ static int virtballoon_probe(struct virtio_device *vdev)
|
|||
vb->nb.priority = VIRTBALLOON_OOM_NOTIFY_PRIORITY;
|
||||
err = register_oom_notifier(&vb->nb);
|
||||
if (err < 0)
|
||||
goto out_oom_notify;
|
||||
goto out_del_vqs;
|
||||
|
||||
#ifdef CONFIG_BALLOON_COMPACTION
|
||||
balloon_mnt = kern_mount(&balloon_fs);
|
||||
if (IS_ERR(balloon_mnt)) {
|
||||
err = PTR_ERR(balloon_mnt);
|
||||
unregister_oom_notifier(&vb->nb);
|
||||
goto out_del_vqs;
|
||||
}
|
||||
|
||||
vb->vb_dev_info.migratepage = virtballoon_migratepage;
|
||||
vb->vb_dev_info.inode = alloc_anon_inode(balloon_mnt->mnt_sb);
|
||||
if (IS_ERR(vb->vb_dev_info.inode)) {
|
||||
err = PTR_ERR(vb->vb_dev_info.inode);
|
||||
kern_unmount(balloon_mnt);
|
||||
unregister_oom_notifier(&vb->nb);
|
||||
vb->vb_dev_info.inode = NULL;
|
||||
goto out_del_vqs;
|
||||
}
|
||||
vb->vb_dev_info.inode->i_mapping->a_ops = &balloon_aops;
|
||||
#endif
|
||||
|
||||
virtio_device_ready(vdev);
|
||||
|
||||
return 0;
|
||||
|
||||
out_oom_notify:
|
||||
out_del_vqs:
|
||||
vdev->config->del_vqs(vdev);
|
||||
out_free_vb:
|
||||
kfree(vb);
|
||||
|
@ -571,6 +611,8 @@ static void virtballoon_remove(struct virtio_device *vdev)
|
|||
cancel_work_sync(&vb->update_balloon_stats_work);
|
||||
|
||||
remove_common(vb);
|
||||
if (vb->vb_dev_info.inode)
|
||||
iput(vb->vb_dev_info.inode);
|
||||
kfree(vb);
|
||||
}
|
||||
|
||||
|
|
|
@ -195,7 +195,7 @@ static void selfballoon_process(struct work_struct *work)
|
|||
MB2PAGES(selfballoon_reserved_mb);
|
||||
#ifdef CONFIG_FRONTSWAP
|
||||
/* allow space for frontswap pages to be repatriated */
|
||||
if (frontswap_selfshrinking && frontswap_enabled)
|
||||
if (frontswap_selfshrinking)
|
||||
goal_pages += frontswap_curr_pages();
|
||||
#endif
|
||||
if (cur_pages > goal_pages)
|
||||
|
@ -230,7 +230,7 @@ static void selfballoon_process(struct work_struct *work)
|
|||
reset_timer = true;
|
||||
}
|
||||
#ifdef CONFIG_FRONTSWAP
|
||||
if (frontswap_selfshrinking && frontswap_enabled) {
|
||||
if (frontswap_selfshrinking) {
|
||||
frontswap_selfshrink();
|
||||
reset_timer = true;
|
||||
}
|
||||
|
|
|
@ -4178,7 +4178,8 @@ int extent_readpages(struct extent_io_tree *tree,
|
|||
prefetchw(&page->flags);
|
||||
list_del(&page->lru);
|
||||
if (add_to_page_cache_lru(page, mapping,
|
||||
page->index, GFP_NOFS)) {
|
||||
page->index,
|
||||
readahead_gfp_mask(mapping))) {
|
||||
put_page(page);
|
||||
continue;
|
||||
}
|
||||
|
|
|
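The btrfs hunk above and the cifs, ext4, f2fs, mpage and orangefs hunks below make the same substitution: readahead page-cache insertions now use readahead_gfp_mask(mapping), which layers __GFP_NORETRY and __GFP_NOWARN (and __GFP_COLD) on the mapping's gfp mask so speculative readahead allocations fail fast under memory pressure. A hedged sketch of the pattern in a generic ->readpages() loop; demo_readpages and demo_readpage are placeholders, not functions from this series:

	#include <linux/fs.h>
	#include <linux/list.h>
	#include <linux/mm.h>
	#include <linux/pagemap.h>

	/* Stand-in for the filesystem's per-page reader; a real one would
	 * submit I/O, mark the page uptodate and then unlock it. */
	static int demo_readpage(struct file *file, struct page *page)
	{
		unlock_page(page);
		return 0;
	}

	static int demo_readpages(struct file *file, struct address_space *mapping,
				  struct list_head *pages, unsigned nr_pages)
	{
		gfp_t gfp = readahead_gfp_mask(mapping);

		while (!list_empty(pages)) {
			struct page *page = list_entry(pages->prev, struct page, lru);

			list_del(&page->lru);
			if (add_to_page_cache_lru(page, mapping, page->index, gfp)) {
				put_page(page);	/* readahead is best effort; skip */
				continue;
			}
			demo_readpage(file, page);
		}
		return 0;
	}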
@ -3366,7 +3366,7 @@ readpages_get_pages(struct address_space *mapping, struct list_head *page_list,
|
|||
struct page *page, *tpage;
|
||||
unsigned int expected_index;
|
||||
int rc;
|
||||
gfp_t gfp = mapping_gfp_constraint(mapping, GFP_KERNEL);
|
||||
gfp_t gfp = readahead_gfp_mask(mapping);
|
||||
|
||||
INIT_LIST_HEAD(tmplist);
|
||||
|
||||
|
|
fs/dax.c
|
@ -819,16 +819,16 @@ static int dax_insert_mapping(struct address_space *mapping,
|
|||
}
|
||||
|
||||
/**
|
||||
* __dax_fault - handle a page fault on a DAX file
|
||||
* dax_fault - handle a page fault on a DAX file
|
||||
* @vma: The virtual memory area where the fault occurred
|
||||
* @vmf: The description of the fault
|
||||
* @get_block: The filesystem method used to translate file offsets to blocks
|
||||
*
|
||||
* When a page fault occurs, filesystems may call this helper in their
|
||||
* fault handler for DAX files. __dax_fault() assumes the caller has done all
|
||||
* fault handler for DAX files. dax_fault() assumes the caller has done all
|
||||
* the necessary locking for the page fault to proceed successfully.
|
||||
*/
|
||||
int __dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
|
||||
int dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
|
||||
get_block_t get_block)
|
||||
{
|
||||
struct file *file = vma->vm_file;
|
||||
|
@ -913,33 +913,6 @@ int __dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
|
|||
return VM_FAULT_SIGBUS | major;
|
||||
return VM_FAULT_NOPAGE | major;
|
||||
}
|
||||
EXPORT_SYMBOL(__dax_fault);
|
||||
|
||||
/**
|
||||
* dax_fault - handle a page fault on a DAX file
|
||||
* @vma: The virtual memory area where the fault occurred
|
||||
* @vmf: The description of the fault
|
||||
* @get_block: The filesystem method used to translate file offsets to blocks
|
||||
*
|
||||
* When a page fault occurs, filesystems may call this helper in their
|
||||
* fault handler for DAX files.
|
||||
*/
|
||||
int dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
|
||||
get_block_t get_block)
|
||||
{
|
||||
int result;
|
||||
struct super_block *sb = file_inode(vma->vm_file)->i_sb;
|
||||
|
||||
if (vmf->flags & FAULT_FLAG_WRITE) {
|
||||
sb_start_pagefault(sb);
|
||||
file_update_time(vma->vm_file);
|
||||
}
|
||||
result = __dax_fault(vma, vmf, get_block);
|
||||
if (vmf->flags & FAULT_FLAG_WRITE)
|
||||
sb_end_pagefault(sb);
|
||||
|
||||
return result;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(dax_fault);
|
||||
|
||||
#if defined(CONFIG_TRANSPARENT_HUGEPAGE)
|
||||
|
@ -967,7 +940,16 @@ static void __dax_dbg(struct buffer_head *bh, unsigned long address,
|
|||
|
||||
#define dax_pmd_dbg(bh, address, reason) __dax_dbg(bh, address, reason, "dax_pmd")
|
||||
|
||||
int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
|
||||
/**
|
||||
* dax_pmd_fault - handle a PMD fault on a DAX file
|
||||
* @vma: The virtual memory area where the fault occurred
|
||||
* @vmf: The description of the fault
|
||||
* @get_block: The filesystem method used to translate file offsets to blocks
|
||||
*
|
||||
* When a page fault occurs, filesystems may call this helper in their
|
||||
* pmd_fault handler for DAX files.
|
||||
*/
|
||||
int dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
|
||||
pmd_t *pmd, unsigned int flags, get_block_t get_block)
|
||||
{
|
||||
struct file *file = vma->vm_file;
|
||||
|
@ -1119,7 +1101,7 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
|
|||
*
|
||||
* The PMD path doesn't have an equivalent to
|
||||
* dax_pfn_mkwrite(), though, so for a read followed by a
|
||||
* write we traverse all the way through __dax_pmd_fault()
|
||||
* write we traverse all the way through dax_pmd_fault()
|
||||
* twice. This means we can just skip inserting a radix tree
|
||||
* entry completely on the initial read and just wait until
|
||||
* the write to insert a dirty entry.
|
||||
|
@ -1148,33 +1130,6 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
|
|||
result = VM_FAULT_FALLBACK;
|
||||
goto out;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(__dax_pmd_fault);
|
||||
|
||||
/**
|
||||
* dax_pmd_fault - handle a PMD fault on a DAX file
|
||||
* @vma: The virtual memory area where the fault occurred
|
||||
* @vmf: The description of the fault
|
||||
* @get_block: The filesystem method used to translate file offsets to blocks
|
||||
*
|
||||
* When a page fault occurs, filesystems may call this helper in their
|
||||
* pmd_fault handler for DAX files.
|
||||
*/
|
||||
int dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
|
||||
pmd_t *pmd, unsigned int flags, get_block_t get_block)
|
||||
{
|
||||
int result;
|
||||
struct super_block *sb = file_inode(vma->vm_file)->i_sb;
|
||||
|
||||
if (flags & FAULT_FLAG_WRITE) {
|
||||
sb_start_pagefault(sb);
|
||||
file_update_time(vma->vm_file);
|
||||
}
|
||||
result = __dax_pmd_fault(vma, address, pmd, flags, get_block);
|
||||
if (flags & FAULT_FLAG_WRITE)
|
||||
sb_end_pagefault(sb);
|
||||
|
||||
return result;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(dax_pmd_fault);
|
||||
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
|
||||
|
||||
|
|
|
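fs/dax.c above collapses the __dax_fault()/dax_fault() and __dax_pmd_fault()/dax_pmd_fault() pairs into single exported entry points that no longer call sb_start_pagefault() themselves; the ext2 and ext4 hunks below move that bookkeeping into the callers. A hedged sketch of a filesystem ->fault handler after the change, with any filesystem-private locking (such as ext2's dax_sem) elided; demo_dax_fault is hypothetical and demo_get_block stands for the filesystem's own get_block_t, assumed to be defined elsewhere:

	#include <linux/buffer_head.h>
	#include <linux/dax.h>
	#include <linux/fs.h>
	#include <linux/mm.h>

	/* Provided by the filesystem; only the declaration is needed here. */
	int demo_get_block(struct inode *inode, sector_t iblock,
			   struct buffer_head *bh_result, int create);

	static int demo_dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
	{
		struct super_block *sb = file_inode(vma->vm_file)->i_sb;
		int ret;

		if (vmf->flags & FAULT_FLAG_WRITE) {
			/* was done inside dax_fault() before this series */
			sb_start_pagefault(sb);
			file_update_time(vma->vm_file);
		}

		ret = dax_fault(vma, vmf, demo_get_block);

		if (vmf->flags & FAULT_FLAG_WRITE)
			sb_end_pagefault(sb);
		return ret;
	}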
@ -51,7 +51,7 @@ static int ext2_dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
|
|||
}
|
||||
down_read(&ei->dax_sem);
|
||||
|
||||
ret = __dax_fault(vma, vmf, ext2_get_block);
|
||||
ret = dax_fault(vma, vmf, ext2_get_block);
|
||||
|
||||
up_read(&ei->dax_sem);
|
||||
if (vmf->flags & FAULT_FLAG_WRITE)
|
||||
|
@ -72,7 +72,7 @@ static int ext2_dax_pmd_fault(struct vm_area_struct *vma, unsigned long addr,
|
|||
}
|
||||
down_read(&ei->dax_sem);
|
||||
|
||||
ret = __dax_pmd_fault(vma, addr, pmd, flags, ext2_get_block);
|
||||
ret = dax_pmd_fault(vma, addr, pmd, flags, ext2_get_block);
|
||||
|
||||
up_read(&ei->dax_sem);
|
||||
if (flags & FAULT_FLAG_WRITE)
|
||||
|
|
|
@ -202,7 +202,7 @@ static int ext4_dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
|
|||
if (IS_ERR(handle))
|
||||
result = VM_FAULT_SIGBUS;
|
||||
else
|
||||
result = __dax_fault(vma, vmf, ext4_dax_get_block);
|
||||
result = dax_fault(vma, vmf, ext4_dax_get_block);
|
||||
|
||||
if (write) {
|
||||
if (!IS_ERR(handle))
|
||||
|
@ -237,7 +237,7 @@ static int ext4_dax_pmd_fault(struct vm_area_struct *vma, unsigned long addr,
|
|||
if (IS_ERR(handle))
|
||||
result = VM_FAULT_SIGBUS;
|
||||
else
|
||||
result = __dax_pmd_fault(vma, addr, pmd, flags,
|
||||
result = dax_pmd_fault(vma, addr, pmd, flags,
|
||||
ext4_dax_get_block);
|
||||
|
||||
if (write) {
|
||||
|
|
|
@ -130,7 +130,7 @@ int ext4_mpage_readpages(struct address_space *mapping,
|
|||
page = list_entry(pages->prev, struct page, lru);
|
||||
list_del(&page->lru);
|
||||
if (add_to_page_cache_lru(page, mapping, page->index,
|
||||
mapping_gfp_constraint(mapping, GFP_KERNEL)))
|
||||
readahead_gfp_mask(mapping)))
|
||||
goto next_page;
|
||||
}
|
||||
|
||||
|
|
|
@ -1002,7 +1002,8 @@ static int f2fs_mpage_readpages(struct address_space *mapping,
|
|||
page = list_entry(pages->prev, struct page, lru);
|
||||
list_del(&page->lru);
|
||||
if (add_to_page_cache_lru(page, mapping,
|
||||
page->index, GFP_KERNEL))
|
||||
page->index,
|
||||
readahead_gfp_mask(mapping)))
|
||||
goto next_page;
|
||||
}
|
||||
|
||||
|
|
|
@ -980,6 +980,42 @@ void inode_io_list_del(struct inode *inode)
|
|||
spin_unlock(&wb->list_lock);
|
||||
}
|
||||
|
||||
/*
|
||||
* mark an inode as under writeback on the sb
|
||||
*/
|
||||
void sb_mark_inode_writeback(struct inode *inode)
|
||||
{
|
||||
struct super_block *sb = inode->i_sb;
|
||||
unsigned long flags;
|
||||
|
||||
if (list_empty(&inode->i_wb_list)) {
|
||||
spin_lock_irqsave(&sb->s_inode_wblist_lock, flags);
|
||||
if (list_empty(&inode->i_wb_list)) {
|
||||
list_add_tail(&inode->i_wb_list, &sb->s_inodes_wb);
|
||||
trace_sb_mark_inode_writeback(inode);
|
||||
}
|
||||
spin_unlock_irqrestore(&sb->s_inode_wblist_lock, flags);
|
||||
}
|
||||
}
|
||||
|
||||
/*
|
||||
* clear an inode as under writeback on the sb
|
||||
*/
|
||||
void sb_clear_inode_writeback(struct inode *inode)
|
||||
{
|
||||
struct super_block *sb = inode->i_sb;
|
||||
unsigned long flags;
|
||||
|
||||
if (!list_empty(&inode->i_wb_list)) {
|
||||
spin_lock_irqsave(&sb->s_inode_wblist_lock, flags);
|
||||
if (!list_empty(&inode->i_wb_list)) {
|
||||
list_del_init(&inode->i_wb_list);
|
||||
trace_sb_clear_inode_writeback(inode);
|
||||
}
|
||||
spin_unlock_irqrestore(&sb->s_inode_wblist_lock, flags);
|
||||
}
|
||||
}
|
||||
|
||||
/*
|
||||
* Redirty an inode: set its when-it-was dirtied timestamp and move it to the
|
||||
* furthest end of its superblock's dirty-inode list.
|
||||
|
@ -2154,7 +2190,7 @@ EXPORT_SYMBOL(__mark_inode_dirty);
|
|||
*/
|
||||
static void wait_sb_inodes(struct super_block *sb)
|
||||
{
|
||||
struct inode *inode, *old_inode = NULL;
|
||||
LIST_HEAD(sync_list);
|
||||
|
||||
/*
|
||||
* We need to be protected against the filesystem going from
|
||||
|
@ -2163,38 +2199,60 @@ static void wait_sb_inodes(struct super_block *sb)
|
|||
WARN_ON(!rwsem_is_locked(&sb->s_umount));
|
||||
|
||||
mutex_lock(&sb->s_sync_lock);
|
||||
spin_lock(&sb->s_inode_list_lock);
|
||||
|
||||
/*
|
||||
* Data integrity sync. Must wait for all pages under writeback,
|
||||
* because there may have been pages dirtied before our sync
|
||||
* call, but which had writeout started before we write it out.
|
||||
* In which case, the inode may not be on the dirty list, but
|
||||
* we still have to wait for that writeout.
|
||||
* Splice the writeback list onto a temporary list to avoid waiting on
|
||||
* inodes that have started writeback after this point.
|
||||
*
|
||||
* Use rcu_read_lock() to keep the inodes around until we have a
|
||||
* reference. s_inode_wblist_lock protects sb->s_inodes_wb as well as
|
||||
* the local list because inodes can be dropped from either by writeback
|
||||
* completion.
|
||||
*/
|
||||
list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
|
||||
rcu_read_lock();
|
||||
spin_lock_irq(&sb->s_inode_wblist_lock);
|
||||
list_splice_init(&sb->s_inodes_wb, &sync_list);
|
||||
|
||||
/*
|
||||
* Data integrity sync. Must wait for all pages under writeback, because
|
||||
* there may have been pages dirtied before our sync call, but which had
|
||||
* writeout started before we write it out. In which case, the inode
|
||||
* may not be on the dirty list, but we still have to wait for that
|
||||
* writeout.
|
||||
*/
|
||||
while (!list_empty(&sync_list)) {
|
||||
struct inode *inode = list_first_entry(&sync_list, struct inode,
|
||||
i_wb_list);
|
||||
struct address_space *mapping = inode->i_mapping;
|
||||
|
||||
/*
|
||||
* Move each inode back to the wb list before we drop the lock
|
||||
* to preserve consistency between i_wb_list and the mapping
|
||||
* writeback tag. Writeback completion is responsible to remove
|
||||
* the inode from either list once the writeback tag is cleared.
|
||||
*/
|
||||
list_move_tail(&inode->i_wb_list, &sb->s_inodes_wb);
|
||||
|
||||
/*
|
||||
* The mapping can appear untagged while still on-list since we
|
||||
* do not have the mapping lock. Skip it here, wb completion
|
||||
* will remove it.
|
||||
*/
|
||||
if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK))
|
||||
continue;
|
||||
|
||||
spin_unlock_irq(&sb->s_inode_wblist_lock);
|
||||
|
||||
spin_lock(&inode->i_lock);
|
||||
if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) ||
|
||||
(mapping->nrpages == 0)) {
|
||||
if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) {
|
||||
spin_unlock(&inode->i_lock);
|
||||
|
||||
spin_lock_irq(&sb->s_inode_wblist_lock);
|
||||
continue;
|
||||
}
|
||||
__iget(inode);
|
||||
spin_unlock(&inode->i_lock);
|
||||
spin_unlock(&sb->s_inode_list_lock);
|
||||
|
||||
/*
|
||||
* We hold a reference to 'inode' so it couldn't have been
|
||||
* removed from s_inodes list while we dropped the
|
||||
* s_inode_list_lock. We cannot iput the inode now as we can
|
||||
* be holding the last reference and we cannot iput it under
|
||||
* s_inode_list_lock. So we keep the reference and iput it
|
||||
* later.
|
||||
*/
|
||||
iput(old_inode);
|
||||
old_inode = inode;
|
||||
rcu_read_unlock();
|
||||
|
||||
/*
|
||||
* We keep the error status of individual mapping so that
|
||||
|
@ -2205,10 +2263,13 @@ static void wait_sb_inodes(struct super_block *sb)
|
|||
|
||||
cond_resched();
|
||||
|
||||
spin_lock(&sb->s_inode_list_lock);
|
||||
iput(inode);
|
||||
|
||||
rcu_read_lock();
|
||||
spin_lock_irq(&sb->s_inode_wblist_lock);
|
||||
}
|
||||
spin_unlock(&sb->s_inode_list_lock);
|
||||
iput(old_inode);
|
||||
spin_unlock_irq(&sb->s_inode_wblist_lock);
|
||||
rcu_read_unlock();
|
||||
mutex_unlock(&sb->s_sync_lock);
|
||||
}
|
||||
|
||||
|
|
|
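The new sb->s_inodes_wb list lets wait_sb_inodes() walk only inodes that actually have writeback in flight instead of every inode on the superblock. The pairing, as this series reads, is that writeback-tag accounting calls sb_mark_inode_writeback() when a mapping gains its first page under writeback and sb_clear_inode_writeback() when the last one completes; a trivial sketch of that pairing with made-up demo_* hook names (the real call sites live in the page-writeback code, which is not shown in this excerpt):

	#include <linux/fs.h>
	#include <linux/writeback.h>

	static void demo_first_page_under_writeback(struct inode *inode)
	{
		sb_mark_inode_writeback(inode);		/* put inode on sb->s_inodes_wb */
	}

	static void demo_last_writeback_completed(struct inode *inode)
	{
		sb_clear_inode_writeback(inode);	/* drop it once the tag is gone */
	}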
@ -365,6 +365,7 @@ void inode_init_once(struct inode *inode)
|
|||
INIT_HLIST_NODE(&inode->i_hash);
|
||||
INIT_LIST_HEAD(&inode->i_devices);
|
||||
INIT_LIST_HEAD(&inode->i_io_list);
|
||||
INIT_LIST_HEAD(&inode->i_wb_list);
|
||||
INIT_LIST_HEAD(&inode->i_lru);
|
||||
address_space_init_once(&inode->i_data);
|
||||
i_size_ordered_init(inode);
|
||||
|
@ -507,6 +508,7 @@ void clear_inode(struct inode *inode)
|
|||
BUG_ON(!list_empty(&inode->i_data.private_list));
|
||||
BUG_ON(!(inode->i_state & I_FREEING));
|
||||
BUG_ON(inode->i_state & I_CLEAR);
|
||||
BUG_ON(!list_empty(&inode->i_wb_list));
|
||||
/* don't need i_lock here, no concurrent mods to i_state */
|
||||
inode->i_state = I_FREEING | I_CLEAR;
|
||||
}
|
||||
|
|
|
@ -72,6 +72,8 @@ mpage_alloc(struct block_device *bdev,
|
|||
{
|
||||
struct bio *bio;
|
||||
|
||||
/* Restrict the given (page cache) mask for slab allocations */
|
||||
gfp_flags &= GFP_KERNEL;
|
||||
bio = bio_alloc(gfp_flags, nr_vecs);
|
||||
|
||||
if (bio == NULL && (current->flags & PF_MEMALLOC)) {
|
||||
|
@ -363,7 +365,7 @@ mpage_readpages(struct address_space *mapping, struct list_head *pages,
|
|||
sector_t last_block_in_bio = 0;
|
||||
struct buffer_head map_bh;
|
||||
unsigned long first_logical_block = 0;
|
||||
gfp_t gfp = mapping_gfp_constraint(mapping, GFP_KERNEL);
|
||||
gfp_t gfp = readahead_gfp_mask(mapping);
|
||||
|
||||
map_bh.b_state = 0;
|
||||
map_bh.b_size = 0;
|
||||
|
|
|
@ -1618,16 +1618,12 @@ static void o2net_start_connect(struct work_struct *work)
|
|||
|
||||
/* watch for racing with tearing a node down */
|
||||
node = o2nm_get_node_by_num(o2net_num_from_nn(nn));
|
||||
if (node == NULL) {
|
||||
ret = 0;
|
||||
if (node == NULL)
|
||||
goto out;
|
||||
}
|
||||
|
||||
mynode = o2nm_get_node_by_num(o2nm_this_node());
|
||||
if (mynode == NULL) {
|
||||
ret = 0;
|
||||
if (mynode == NULL)
|
||||
goto out;
|
||||
}
|
||||
|
||||
spin_lock(&nn->nn_lock);
|
||||
/*
|
||||
|
|
|
@ -347,26 +347,6 @@ static struct dentry *dlm_debugfs_root;
|
|||
#define DLM_DEBUGFS_PURGE_LIST "purge_list"
|
||||
|
||||
/* begin - utils funcs */
|
||||
static void dlm_debug_free(struct kref *kref)
|
||||
{
|
||||
struct dlm_debug_ctxt *dc;
|
||||
|
||||
dc = container_of(kref, struct dlm_debug_ctxt, debug_refcnt);
|
||||
|
||||
kfree(dc);
|
||||
}
|
||||
|
||||
static void dlm_debug_put(struct dlm_debug_ctxt *dc)
|
||||
{
|
||||
if (dc)
|
||||
kref_put(&dc->debug_refcnt, dlm_debug_free);
|
||||
}
|
||||
|
||||
static void dlm_debug_get(struct dlm_debug_ctxt *dc)
|
||||
{
|
||||
kref_get(&dc->debug_refcnt);
|
||||
}
|
||||
|
||||
static int debug_release(struct inode *inode, struct file *file)
|
||||
{
|
||||
free_page((unsigned long)file->private_data);
|
||||
|
@ -932,11 +912,9 @@ int dlm_debug_init(struct dlm_ctxt *dlm)
|
|||
goto bail;
|
||||
}
|
||||
|
||||
dlm_debug_get(dc);
|
||||
return 0;
|
||||
|
||||
bail:
|
||||
dlm_debug_shutdown(dlm);
|
||||
return -ENOMEM;
|
||||
}
|
||||
|
||||
|
@ -949,7 +927,8 @@ void dlm_debug_shutdown(struct dlm_ctxt *dlm)
|
|||
debugfs_remove(dc->debug_mle_dentry);
|
||||
debugfs_remove(dc->debug_lockres_dentry);
|
||||
debugfs_remove(dc->debug_state_dentry);
|
||||
dlm_debug_put(dc);
|
||||
kfree(dc);
|
||||
dc = NULL;
|
||||
}
|
||||
}
|
||||
|
||||
|
@ -969,7 +948,6 @@ int dlm_create_debugfs_subroot(struct dlm_ctxt *dlm)
|
|||
mlog_errno(-ENOMEM);
|
||||
goto bail;
|
||||
}
|
||||
kref_init(&dlm->dlm_debug_ctxt->debug_refcnt);
|
||||
|
||||
return 0;
|
||||
bail:
|
||||
|
|
|
@ -30,7 +30,6 @@ void dlm_print_one_mle(struct dlm_master_list_entry *mle);
|
|||
#ifdef CONFIG_DEBUG_FS
|
||||
|
||||
struct dlm_debug_ctxt {
|
||||
struct kref debug_refcnt;
|
||||
struct dentry *debug_state_dentry;
|
||||
struct dentry *debug_lockres_dentry;
|
||||
struct dentry *debug_mle_dentry;
|
||||
|
|
|
@ -1635,7 +1635,6 @@ int ocfs2_create_new_inode_locks(struct inode *inode)
|
|||
int ret;
|
||||
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
|
||||
|
||||
BUG_ON(!inode);
|
||||
BUG_ON(!ocfs2_inode_is_new(inode));
|
||||
|
||||
mlog(0, "Inode %llu\n", (unsigned long long)OCFS2_I(inode)->ip_blkno);
|
||||
|
@ -1665,10 +1664,8 @@ int ocfs2_create_new_inode_locks(struct inode *inode)
|
|||
}
|
||||
|
||||
ret = ocfs2_create_new_lock(osb, &OCFS2_I(inode)->ip_open_lockres, 0, 0);
|
||||
if (ret) {
|
||||
if (ret)
|
||||
mlog_errno(ret);
|
||||
goto bail;
|
||||
}
|
||||
|
||||
bail:
|
||||
return ret;
|
||||
|
@ -1680,8 +1677,6 @@ int ocfs2_rw_lock(struct inode *inode, int write)
|
|||
struct ocfs2_lock_res *lockres;
|
||||
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
|
||||
|
||||
BUG_ON(!inode);
|
||||
|
||||
mlog(0, "inode %llu take %s RW lock\n",
|
||||
(unsigned long long)OCFS2_I(inode)->ip_blkno,
|
||||
write ? "EXMODE" : "PRMODE");
|
||||
|
@ -1724,8 +1719,6 @@ int ocfs2_open_lock(struct inode *inode)
|
|||
struct ocfs2_lock_res *lockres;
|
||||
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
|
||||
|
||||
BUG_ON(!inode);
|
||||
|
||||
mlog(0, "inode %llu take PRMODE open lock\n",
|
||||
(unsigned long long)OCFS2_I(inode)->ip_blkno);
|
||||
|
||||
|
@ -1749,8 +1742,6 @@ int ocfs2_try_open_lock(struct inode *inode, int write)
|
|||
struct ocfs2_lock_res *lockres;
|
||||
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
|
||||
|
||||
BUG_ON(!inode);
|
||||
|
||||
mlog(0, "inode %llu try to take %s open lock\n",
|
||||
(unsigned long long)OCFS2_I(inode)->ip_blkno,
|
||||
write ? "EXMODE" : "PRMODE");
|
||||
|
@ -2328,8 +2319,6 @@ int ocfs2_inode_lock_full_nested(struct inode *inode,
|
|||
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
|
||||
struct buffer_head *local_bh = NULL;
|
||||
|
||||
BUG_ON(!inode);
|
||||
|
||||
mlog(0, "inode %llu, take %s META lock\n",
|
||||
(unsigned long long)OCFS2_I(inode)->ip_blkno,
|
||||
ex ? "EXMODE" : "PRMODE");
|
||||
|
|
|
@ -145,22 +145,15 @@ int ocfs2_drop_inode(struct inode *inode);
|
|||
struct inode *ocfs2_ilookup(struct super_block *sb, u64 feoff);
|
||||
struct inode *ocfs2_iget(struct ocfs2_super *osb, u64 feoff, unsigned flags,
|
||||
int sysfile_type);
|
||||
int ocfs2_inode_init_private(struct inode *inode);
|
||||
int ocfs2_inode_revalidate(struct dentry *dentry);
|
||||
void ocfs2_populate_inode(struct inode *inode, struct ocfs2_dinode *fe,
|
||||
int create_ino);
|
||||
void ocfs2_read_inode(struct inode *inode);
|
||||
void ocfs2_read_inode2(struct inode *inode, void *opaque);
|
||||
ssize_t ocfs2_rw_direct(int rw, struct file *filp, char *buf,
|
||||
size_t size, loff_t *offp);
|
||||
void ocfs2_sync_blockdev(struct super_block *sb);
|
||||
void ocfs2_refresh_inode(struct inode *inode,
|
||||
struct ocfs2_dinode *fe);
|
||||
int ocfs2_mark_inode_dirty(handle_t *handle,
|
||||
struct inode *inode,
|
||||
struct buffer_head *bh);
|
||||
struct buffer_head *ocfs2_bread(struct inode *inode,
|
||||
int block, int *err, int reada);
|
||||
|
||||
void ocfs2_set_inode_flags(struct inode *inode);
|
||||
void ocfs2_get_inode_flags(struct ocfs2_inode_info *oi);
|
||||
|
|
|
@ -1159,10 +1159,8 @@ static int ocfs2_force_read_journal(struct inode *inode)
|
|||
int status = 0;
|
||||
int i;
|
||||
u64 v_blkno, p_blkno, p_blocks, num_blocks;
|
||||
#define CONCURRENT_JOURNAL_FILL 32ULL
|
||||
struct buffer_head *bhs[CONCURRENT_JOURNAL_FILL];
|
||||
|
||||
memset(bhs, 0, sizeof(struct buffer_head *) * CONCURRENT_JOURNAL_FILL);
|
||||
struct buffer_head *bh = NULL;
|
||||
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
|
||||
|
||||
num_blocks = ocfs2_blocks_for_bytes(inode->i_sb, i_size_read(inode));
|
||||
v_blkno = 0;
|
||||
|
@ -1174,29 +1172,32 @@ static int ocfs2_force_read_journal(struct inode *inode)
|
|||
goto bail;
|
||||
}
|
||||
|
||||
if (p_blocks > CONCURRENT_JOURNAL_FILL)
|
||||
p_blocks = CONCURRENT_JOURNAL_FILL;
|
||||
for (i = 0; i < p_blocks; i++, p_blkno++) {
|
||||
bh = __find_get_block(osb->sb->s_bdev, p_blkno,
|
||||
osb->sb->s_blocksize);
|
||||
/* block not cached. */
|
||||
if (!bh)
|
||||
continue;
|
||||
|
||||
/* We are reading journal data which should not
|
||||
* be put in the uptodate cache */
|
||||
status = ocfs2_read_blocks_sync(OCFS2_SB(inode->i_sb),
|
||||
p_blkno, p_blocks, bhs);
|
||||
if (status < 0) {
|
||||
mlog_errno(status);
|
||||
goto bail;
|
||||
}
|
||||
brelse(bh);
|
||||
bh = NULL;
|
||||
/* We are reading journal data which should not
|
||||
* be put in the uptodate cache.
|
||||
*/
|
||||
status = ocfs2_read_blocks_sync(osb, p_blkno, 1, &bh);
|
||||
if (status < 0) {
|
||||
mlog_errno(status);
|
||||
goto bail;
|
||||
}
|
||||
|
||||
for(i = 0; i < p_blocks; i++) {
|
||||
brelse(bhs[i]);
|
||||
bhs[i] = NULL;
|
||||
brelse(bh);
|
||||
bh = NULL;
|
||||
}
|
||||
|
||||
v_blkno += p_blocks;
|
||||
}
|
||||
|
||||
bail:
|
||||
for(i = 0; i < CONCURRENT_JOURNAL_FILL; i++)
|
||||
brelse(bhs[i]);
|
||||
return status;
|
||||
}
|
||||
|
||||
|
|
|
@ -735,8 +735,6 @@ static void __exit ocfs2_stack_glue_exit(void)
|
|||
{
|
||||
memset(&locking_max_version, 0,
|
||||
sizeof(struct ocfs2_protocol_version));
|
||||
locking_max_version.pv_major = 0;
|
||||
locking_max_version.pv_minor = 0;
|
||||
ocfs2_sysfs_exit();
|
||||
if (ocfs2_table_header)
|
||||
unregister_sysctl_table(ocfs2_table_header);
|
||||
|
|
|
@ -2072,7 +2072,6 @@ static int ocfs2_initialize_super(struct super_block *sb,
|
|||
osb->osb_dx_seed[3] = le32_to_cpu(di->id2.i_super.s_uuid_hash);
|
||||
|
||||
osb->sb = sb;
|
||||
/* Save off for ocfs2_rw_direct */
|
||||
osb->s_sectsize_bits = blksize_bits(sector_size);
|
||||
BUG_ON(!osb->s_sectsize_bits);
|
||||
|
||||
|
|
|
@ -80,7 +80,7 @@ static int orangefs_readpages(struct file *file,
|
|||
if (!add_to_page_cache(page,
|
||||
mapping,
|
||||
page->index,
|
||||
GFP_KERNEL)) {
|
||||
readahead_gfp_mask(mapping))) {
|
||||
ret = read_one_page(page);
|
||||
gossip_debug(GOSSIP_INODE_DEBUG,
|
||||
"failure adding page to cache, read_one_page returned: %d\n",
|
||||
|
|
fs/pipe.c
|
@@ -21,6 +21,7 @@
#include <linux/audit.h>
#include <linux/syscalls.h>
#include <linux/fcntl.h>
#include <linux/memcontrol.h>

#include <asm/uaccess.h>
#include <asm/ioctls.h>

@@ -137,6 +138,22 @@ static void anon_pipe_buf_release(struct pipe_inode_info *pipe,
        put_page(page);
}

static int anon_pipe_buf_steal(struct pipe_inode_info *pipe,
                               struct pipe_buffer *buf)
{
        struct page *page = buf->page;

        if (page_count(page) == 1) {
                if (memcg_kmem_enabled()) {
                        memcg_kmem_uncharge(page, 0);
                        __ClearPageKmemcg(page);
                }
                __SetPageLocked(page);
                return 0;
        }
        return 1;
}

/**
 * generic_pipe_buf_steal - attempt to take ownership of a &pipe_buffer
 * @pipe: the pipe that the buffer belongs to

@@ -219,7 +236,7 @@ static const struct pipe_buf_operations anon_pipe_buf_ops = {
        .can_merge = 1,
        .confirm = generic_pipe_buf_confirm,
        .release = anon_pipe_buf_release,
        .steal = generic_pipe_buf_steal,
        .steal = anon_pipe_buf_steal,
        .get = generic_pipe_buf_get,
};

@@ -227,7 +244,7 @@ static const struct pipe_buf_operations packet_pipe_buf_ops = {
        .can_merge = 0,
        .confirm = generic_pipe_buf_confirm,
        .release = anon_pipe_buf_release,
        .steal = generic_pipe_buf_steal,
        .steal = anon_pipe_buf_steal,
        .get = generic_pipe_buf_get,
};

@@ -405,7 +422,7 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from)
                        int copied;

                        if (!page) {
                                page = alloc_page(GFP_HIGHUSER);
                                page = alloc_page(GFP_HIGHUSER | __GFP_ACCOUNT);
                                if (unlikely(!page)) {
                                        ret = ret ? : -ENOMEM;
                                        break;

@@ -611,7 +628,7 @@ struct pipe_inode_info *alloc_pipe_info(void)
{
        struct pipe_inode_info *pipe;

        pipe = kzalloc(sizeof(struct pipe_inode_info), GFP_KERNEL);
        pipe = kzalloc(sizeof(struct pipe_inode_info), GFP_KERNEL_ACCOUNT);
        if (pipe) {
                unsigned long pipe_bufs = PIPE_DEF_BUFFERS;
                struct user_struct *user = get_current_user();

@@ -619,7 +636,9 @@ struct pipe_inode_info *alloc_pipe_info(void)
                if (!too_many_pipe_buffers_hard(user)) {
                        if (too_many_pipe_buffers_soft(user))
                                pipe_bufs = 1;
                        pipe->bufs = kzalloc(sizeof(struct pipe_buffer) * pipe_bufs, GFP_KERNEL);
                        pipe->bufs = kcalloc(pipe_bufs,
                                             sizeof(struct pipe_buffer),
                                             GFP_KERNEL_ACCOUNT);
                }

                if (pipe->bufs) {

@@ -1010,7 +1029,8 @@ static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long nr_pages)
        if (nr_pages < pipe->nrbufs)
                return -EBUSY;

        bufs = kcalloc(nr_pages, sizeof(*bufs), GFP_KERNEL | __GFP_NOWARN);
        bufs = kcalloc(nr_pages, sizeof(*bufs),
                       GFP_KERNEL_ACCOUNT | __GFP_NOWARN);
        if (unlikely(!bufs))
                return -ENOMEM;

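Taken together, the fs/pipe.c hunks above move pipe allocations under kernel-memory cgroup accounting: the pipe_inode_info and the pipe_buffer array are allocated with GFP_KERNEL_ACCOUNT, data pages with __GFP_ACCOUNT, and the new anon_pipe_buf_steal() uncharges a page before it is handed over to the stealing side. A minimal userspace sketch (illustrative only, not part of this diff) that makes the charge visible from inside a memory cgroup:

/* Illustrative sketch: grow and fill a pipe so its buffers are charged to
 * the caller's memory cgroup (visible e.g. in the cgroup's memory usage
 * counters while the process is kept alive at the end). */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        int fds[2];
        char buf[4096];
        int sz, i;

        if (pipe(fds) < 0) {
                perror("pipe");
                return 1;
        }

        /* Ask for a 1 MiB pipe; the buffer array is now GFP_KERNEL_ACCOUNT. */
        sz = fcntl(fds[1], F_SETPIPE_SZ, 1 << 20);
        if (sz < 0) {
                perror("F_SETPIPE_SZ");
                return 1;
        }
        printf("pipe capacity: %d bytes\n", sz);

        /* Fill the pipe; each data page is allocated with __GFP_ACCOUNT. */
        memset(buf, 'x', sizeof(buf));
        for (i = 0; i < sz / (int)sizeof(buf); i++) {
                if (write(fds[1], buf, sizeof(buf)) != (ssize_t)sizeof(buf))
                        break;
        }

        pause();        /* inspect the cgroup's memory accounting, then kill */
        return 0;
}
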
@@ -105,6 +105,8 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
#endif
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
                "AnonHugePages: %8lu kB\n"
                "ShmemHugePages: %8lu kB\n"
                "ShmemPmdMapped: %8lu kB\n"
#endif
#ifdef CONFIG_CMA
                "CmaTotal: %8lu kB\n"

@@ -162,8 +164,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
                , atomic_long_read(&num_poisoned_pages) << (PAGE_SHIFT - 10)
#endif
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
                , K(global_page_state(NR_ANON_TRANSPARENT_HUGEPAGES) *
                    HPAGE_PMD_NR)
                , K(global_page_state(NR_ANON_THPS) * HPAGE_PMD_NR)
                , K(global_page_state(NR_SHMEM_THPS) * HPAGE_PMD_NR)
                , K(global_page_state(NR_SHMEM_PMDMAPPED) * HPAGE_PMD_NR)
#endif
#ifdef CONFIG_CMA
                , K(totalcma_pages)

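The two meminfo hunks above add ShmemHugePages and ShmemPmdMapped next to the existing AnonHugePages counter. As a quick check from userspace (a hedged sketch, not part of the diff), the new fields can be read straight out of /proc/meminfo:

/* Print the THP-related counters this hunk exposes in /proc/meminfo.
 * Field names match the format strings above; everything else is
 * illustrative only. */
#include <stdio.h>
#include <string.h>

int main(void)
{
        char line[256];
        FILE *f = fopen("/proc/meminfo", "r");

        if (!f) {
                perror("/proc/meminfo");
                return 1;
        }
        while (fgets(line, sizeof(line), f)) {
                if (!strncmp(line, "AnonHugePages:", 14) ||
                    !strncmp(line, "ShmemHugePages:", 15) ||
                    !strncmp(line, "ShmemPmdMapped:", 15))
                        fputs(line, stdout);
        }
        fclose(f);
        return 0;
}
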
@@ -448,6 +448,7 @@ struct mem_size_stats {
        unsigned long referenced;
        unsigned long anonymous;
        unsigned long anonymous_thp;
        unsigned long shmem_thp;
        unsigned long swap;
        unsigned long shared_hugetlb;
        unsigned long private_hugetlb;

@@ -576,7 +577,12 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
        page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
        if (IS_ERR_OR_NULL(page))
                return;
        mss->anonymous_thp += HPAGE_PMD_SIZE;
        if (PageAnon(page))
                mss->anonymous_thp += HPAGE_PMD_SIZE;
        else if (PageSwapBacked(page))
                mss->shmem_thp += HPAGE_PMD_SIZE;
        else
                VM_BUG_ON_PAGE(1, page);
        smaps_account(mss, page, true, pmd_young(*pmd), pmd_dirty(*pmd));
}
#else

@@ -770,6 +776,7 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
                   "Referenced: %8lu kB\n"
                   "Anonymous: %8lu kB\n"
                   "AnonHugePages: %8lu kB\n"
                   "ShmemPmdMapped: %8lu kB\n"
                   "Shared_Hugetlb: %8lu kB\n"
                   "Private_Hugetlb: %7lu kB\n"
                   "Swap: %8lu kB\n"

@@ -787,6 +794,7 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
                   mss.referenced >> 10,
                   mss.anonymous >> 10,
                   mss.anonymous_thp >> 10,
                   mss.shmem_thp >> 10,
                   mss.shared_hugetlb >> 10,
                   mss.private_hugetlb >> 10,
                   mss.swap >> 10,

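The task_mmu.c hunks make the same split per mapping: a PMD-mapped THP found by smaps_pmd_entry() is attributed to AnonHugePages when it is anonymous and to the new ShmemPmdMapped field when it is shmem-backed, and show_smap() prints both. A rough userspace sketch (illustrative only) that faults in a shmem-backed mapping and dumps the relevant smaps lines; whether it is actually PMD-mapped depends on the shmem huge-page settings introduced elsewhere in this series:

/* Map an anonymous shared (shmem-backed) region, touch it, and print the
 * AnonHugePages / ShmemPmdMapped lines from /proc/self/smaps. */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
        size_t len = 4UL << 20;         /* 4 MiB: room for two 2 MiB PMDs */
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        char line[256];
        FILE *f;

        if (p == MAP_FAILED) {
                perror("mmap");
                return 1;
        }
        memset(p, 1, len);              /* fault the pages in */

        f = fopen("/proc/self/smaps", "r");
        if (!f) {
                perror("smaps");
                return 1;
        }
        while (fgets(line, sizeof(line), f)) {
                if (strstr(line, "ShmemPmdMapped") || strstr(line, "AnonHugePages"))
                        fputs(line, stdout);
        }
        fclose(f);
        return 0;
}
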
@@ -206,6 +206,8 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags)
        mutex_init(&s->s_sync_lock);
        INIT_LIST_HEAD(&s->s_inodes);
        spin_lock_init(&s->s_inode_list_lock);
        INIT_LIST_HEAD(&s->s_inodes_wb);
        spin_lock_init(&s->s_inode_wblist_lock);

        if (list_lru_init_memcg(&s->s_dentry_lru))
                goto fail;

@@ -257,10 +257,9 @@ out:
 * fatal_signal_pending()s, and the mmap_sem must be released before
 * returning it.
 */
int handle_userfault(struct vm_area_struct *vma, unsigned long address,
                     unsigned int flags, unsigned long reason)
int handle_userfault(struct fault_env *fe, unsigned long reason)
{
        struct mm_struct *mm = vma->vm_mm;
        struct mm_struct *mm = fe->vma->vm_mm;
        struct userfaultfd_ctx *ctx;
        struct userfaultfd_wait_queue uwq;
        int ret;

@@ -269,7 +268,7 @@ int handle_userfault(struct vm_area_struct *vma, unsigned long address,
        BUG_ON(!rwsem_is_locked(&mm->mmap_sem));

        ret = VM_FAULT_SIGBUS;
        ctx = vma->vm_userfaultfd_ctx.ctx;
        ctx = fe->vma->vm_userfaultfd_ctx.ctx;
        if (!ctx)
                goto out;

@@ -302,17 +301,17 @@ int handle_userfault(struct vm_area_struct *vma, unsigned long address,
         * without first stopping userland access to the memory. For
         * VM_UFFD_MISSING userfaults this is enough for now.
         */
        if (unlikely(!(flags & FAULT_FLAG_ALLOW_RETRY))) {
        if (unlikely(!(fe->flags & FAULT_FLAG_ALLOW_RETRY))) {
                /*
                 * Validate the invariant that nowait must allow retry
                 * to be sure not to return SIGBUS erroneously on
                 * nowait invocations.
                 */
                BUG_ON(flags & FAULT_FLAG_RETRY_NOWAIT);
                BUG_ON(fe->flags & FAULT_FLAG_RETRY_NOWAIT);
#ifdef CONFIG_DEBUG_VM
                if (printk_ratelimit()) {
                        printk(KERN_WARNING
                               "FAULT_FLAG_ALLOW_RETRY missing %x\n", flags);
                               "FAULT_FLAG_ALLOW_RETRY missing %x\n", fe->flags);
                        dump_stack();
                }
#endif

@@ -324,7 +323,7 @@ int handle_userfault(struct vm_area_struct *vma, unsigned long address,
         * and wait.
         */
        ret = VM_FAULT_RETRY;
        if (flags & FAULT_FLAG_RETRY_NOWAIT)
        if (fe->flags & FAULT_FLAG_RETRY_NOWAIT)
                goto out;

        /* take the reference before dropping the mmap_sem */

@@ -332,10 +331,11 @@ int handle_userfault(struct vm_area_struct *vma, unsigned long address,

        init_waitqueue_func_entry(&uwq.wq, userfaultfd_wake_function);
        uwq.wq.private = current;
        uwq.msg = userfault_msg(address, flags, reason);
        uwq.msg = userfault_msg(fe->address, fe->flags, reason);
        uwq.ctx = ctx;

        return_to_userland = (flags & (FAULT_FLAG_USER|FAULT_FLAG_KILLABLE)) ==
        return_to_userland =
                (fe->flags & (FAULT_FLAG_USER|FAULT_FLAG_KILLABLE)) ==
                (FAULT_FLAG_USER|FAULT_FLAG_KILLABLE);

        spin_lock(&ctx->fault_pending_wqh.lock);

@@ -353,7 +353,7 @@ int handle_userfault(struct vm_area_struct *vma, unsigned long address,
                          TASK_KILLABLE);
        spin_unlock(&ctx->fault_pending_wqh.lock);

        must_wait = userfaultfd_must_wait(ctx, address, flags, reason);
        must_wait = userfaultfd_must_wait(ctx, fe->address, fe->flags, reason);
        up_read(&mm->mmap_sem);

        if (likely(must_wait && !ACCESS_ONCE(ctx->released) &&

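The userfaultfd hunks above are a mechanical conversion: instead of receiving vma, address and flags as separate arguments, handle_userfault() now takes a single fault descriptor and reads the same values through it. A minimal sketch of only the fields relied on here; the real struct fault_env, introduced elsewhere in this series, carries further page-table state that is omitted:

/* Assumed minimal shape, for illustration only. */
struct fault_env {
        struct vm_area_struct *vma;     /* replaces the old vma argument */
        unsigned long address;          /* replaces the old address argument */
        unsigned int flags;             /* FAULT_FLAG_* bits, was 'flags' */
        /* ... additional fault state used by the page-fault handlers ... */
};
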
@@ -1551,7 +1551,7 @@ xfs_filemap_page_mkwrite(
        xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);

        if (IS_DAX(inode)) {
                ret = __dax_mkwrite(vma, vmf, xfs_get_blocks_dax_fault);
                ret = dax_mkwrite(vma, vmf, xfs_get_blocks_dax_fault);
        } else {
                ret = block_page_mkwrite(vma, vmf, xfs_get_blocks);
                ret = block_page_mkwrite_return(ret);

@@ -1585,7 +1585,7 @@ xfs_filemap_fault(
                 * changes to xfs_get_blocks_direct() to map unwritten extent
                 * ioend for conversion on read-only mappings.
                 */
                ret = __dax_fault(vma, vmf, xfs_get_blocks_dax_fault);
                ret = dax_fault(vma, vmf, xfs_get_blocks_dax_fault);
        } else
                ret = filemap_fault(vma, vmf);
        xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);

@@ -1622,7 +1622,7 @@ xfs_filemap_pmd_fault(
        }

        xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
        ret = __dax_pmd_fault(vma, addr, pmd, flags, xfs_get_blocks_dax_fault);
        ret = dax_pmd_fault(vma, addr, pmd, flags, xfs_get_blocks_dax_fault);
        xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);

        if (flags & FAULT_FLAG_WRITE)

@@ -107,6 +107,12 @@ struct mmu_gather {
        struct mmu_gather_batch local;
        struct page             *__pages[MMU_GATHER_BUNDLE];
        unsigned int            batch_count;
        /*
         * __tlb_adjust_range will track the new addr here,
         * that that we can adjust the range after the flush
         */
        unsigned long addr;
        int page_size;
};

#define HAVE_GENERIC_MMU_GATHER

@@ -115,23 +121,20 @@ void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned long
void tlb_flush_mmu(struct mmu_gather *tlb);
void tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start,
                    unsigned long end);
int __tlb_remove_page(struct mmu_gather *tlb, struct page *page);

/* tlb_remove_page
 *      Similar to __tlb_remove_page but will call tlb_flush_mmu() itself when
 *      required.
 */
static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
{
        if (!__tlb_remove_page(tlb, page))
                tlb_flush_mmu(tlb);
}
extern bool __tlb_remove_page_size(struct mmu_gather *tlb, struct page *page,
                                   int page_size);

static inline void __tlb_adjust_range(struct mmu_gather *tlb,
                                      unsigned long address)
{
        tlb->start = min(tlb->start, address);
        tlb->end = max(tlb->end, address + PAGE_SIZE);
        /*
         * Track the last address with which we adjusted the range. This
         * will be used later to adjust again after a mmu_flush due to
         * failed __tlb_remove_page
         */
        tlb->addr = address;
}

static inline void __tlb_reset_range(struct mmu_gather *tlb)

@@ -144,6 +147,40 @@ static inline void __tlb_reset_range(struct mmu_gather *tlb)
        }
}

static inline void tlb_remove_page_size(struct mmu_gather *tlb,
                                        struct page *page, int page_size)
{
        if (__tlb_remove_page_size(tlb, page, page_size)) {
                tlb_flush_mmu(tlb);
                tlb->page_size = page_size;
                __tlb_adjust_range(tlb, tlb->addr);
                __tlb_remove_page_size(tlb, page, page_size);
        }
}

static bool __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
{
        return __tlb_remove_page_size(tlb, page, PAGE_SIZE);
}

/* tlb_remove_page
 *      Similar to __tlb_remove_page but will call tlb_flush_mmu() itself when
 *      required.
 */
static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
{
        return tlb_remove_page_size(tlb, page, PAGE_SIZE);
}

static inline bool __tlb_remove_pte_page(struct mmu_gather *tlb, struct page *page)
{
        /* active->nr should be zero when we call this */
        VM_BUG_ON_PAGE(tlb->active->nr, page);
        tlb->page_size = PAGE_SIZE;
        __tlb_adjust_range(tlb, tlb->addr);
        return __tlb_remove_page(tlb, page);
}

/*
 * In the case of tlb vma handling, we can optimise these away in the
 * case where we're doing a full MM flush. When we're doing a munmap,

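The mmu_gather hunks above replace the old boolean __tlb_remove_page() protocol with a page-size-aware one: __tlb_remove_page_size() reports when the gather batch is full, and tlb_remove_page_size() reacts by flushing, re-adjusting the range and queueing the page again. A self-contained toy model of that flush-and-retry pattern (plain userspace C with hypothetical toy_* names, not the kernel code) is:

/* Toy model: pages are queued into a fixed batch; when the batch is full
 * the caller flushes and re-queues the page that did not fit. */
#include <stdbool.h>
#include <stdio.h>

#define BATCH   4

struct toy_gather {
        const void *batch[BATCH];
        int nr;
        int page_size;
};

/* Returns true when the page could NOT be queued and a flush is needed,
 * mirroring how __tlb_remove_page_size() is used in the hunk above. */
static bool toy_queue_page(struct toy_gather *tlb, const void *page, int page_size)
{
        if (tlb->nr == BATCH)
                return true;
        tlb->batch[tlb->nr++] = page;
        tlb->page_size = page_size;
        return false;
}

static void toy_flush(struct toy_gather *tlb)
{
        printf("flushing %d pages of size %d\n", tlb->nr, tlb->page_size);
        tlb->nr = 0;
}

/* Mirrors tlb_remove_page_size(): flush on failure, then retry. */
static void toy_remove_page(struct toy_gather *tlb, const void *page, int page_size)
{
        if (toy_queue_page(tlb, page, page_size)) {
                toy_flush(tlb);
                toy_queue_page(tlb, page, page_size);
        }
}

int main(void)
{
        struct toy_gather tlb = { .nr = 0, .page_size = 4096 };
        char pages[10];
        int i;

        for (i = 0; i < 10; i++)
                toy_remove_page(&tlb, &pages[i], i < 8 ? 4096 : 2 * 1024 * 1024);
        toy_flush(&tlb);        /* final flush, as tlb_finish_mmu() would do */
        return 0;
}
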
@@ -48,6 +48,7 @@
#include <linux/migrate.h>
#include <linux/gfp.h>
#include <linux/err.h>
#include <linux/fs.h>

/*
 * Balloon device information descriptor.

@@ -62,6 +63,7 @@ struct balloon_dev_info {
        struct list_head pages;         /* Pages enqueued & handled to Host */
        int (*migratepage)(struct balloon_dev_info *, struct page *newpage,
                        struct page *page, enum migrate_mode mode);
        struct inode *inode;
};

extern struct page *balloon_page_enqueue(struct balloon_dev_info *b_dev_info);

@@ -73,44 +75,18 @@ static inline void balloon_devinfo_init(struct balloon_dev_info *balloon)
        spin_lock_init(&balloon->pages_lock);
        INIT_LIST_HEAD(&balloon->pages);
        balloon->migratepage = NULL;
        balloon->inode = NULL;
}

#ifdef CONFIG_BALLOON_COMPACTION
extern bool balloon_page_isolate(struct page *page);
extern const struct address_space_operations balloon_aops;
extern bool balloon_page_isolate(struct page *page,
                                isolate_mode_t mode);
extern void balloon_page_putback(struct page *page);
extern int balloon_page_migrate(struct page *newpage,
extern int balloon_page_migrate(struct address_space *mapping,
                                struct page *newpage,
                                struct page *page, enum migrate_mode mode);

/*
 * __is_movable_balloon_page - helper to perform @page PageBalloon tests
 */
static inline bool __is_movable_balloon_page(struct page *page)
{
        return PageBalloon(page);
}

/*
 * balloon_page_movable - test PageBalloon to identify balloon pages
 *                        and PagePrivate to check that the page is not
 *                        isolated and can be moved by compaction/migration.
 *
 * As we might return false positives in the case of a balloon page being just
 * released under us, this need to be re-tested later, under the page lock.
 */
static inline bool balloon_page_movable(struct page *page)
{
        return PageBalloon(page) && PagePrivate(page);
}

/*
 * isolated_balloon_page - identify an isolated balloon page on private
 *                         compaction/migration page lists.
 */
static inline bool isolated_balloon_page(struct page *page)
{
        return PageBalloon(page);
}

/*
 * balloon_page_insert - insert a page into the balloon's page list and make
 *                       the page->private assignment accordingly.

@@ -124,7 +100,7 @@ static inline void balloon_page_insert(struct balloon_dev_info *balloon,
                                       struct page *page)
{
        __SetPageBalloon(page);
        SetPagePrivate(page);
        __SetPageMovable(page, balloon->inode->i_mapping);
        set_page_private(page, (unsigned long)balloon);
        list_add(&page->lru, &balloon->pages);
}

@@ -140,11 +116,14 @@ static inline void balloon_page_insert(struct balloon_dev_info *balloon,
static inline void balloon_page_delete(struct page *page)
{
        __ClearPageBalloon(page);
        __ClearPageMovable(page);
        set_page_private(page, 0);
        if (PagePrivate(page)) {
                ClearPagePrivate(page);
        /*
         * No touch page.lru field once @page has been isolated
         * because VM is using the field.
         */
        if (!PageIsolated(page))
                list_del(&page->lru);
        }
}

/*