// SPDX-License-Identifier: GPL-2.0-or-later
/*
 * zswap.c - zswap driver file
 *
 * zswap is a backend for frontswap that takes pages that are in the process
 * of being swapped out and attempts to compress and store them in a
 * RAM-based memory pool.  This can result in a significant I/O reduction on
 * the swap device and, in the case where decompressing from RAM is faster
 * than reading from the swap device, can also improve workload performance.
 *
 * Copyright (C) 2012  Seth Jennings <sjenning@linux.vnet.ibm.com>
 */

#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/module.h>
#include <linux/cpu.h>
#include <linux/highmem.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/types.h>
#include <linux/atomic.h>
#include <linux/frontswap.h>
#include <linux/rbtree.h>
#include <linux/swap.h>
#include <linux/crypto.h>
#include <linux/mempool.h>
#include <linux/zpool.h>

#include <linux/mm_types.h>
#include <linux/page-flags.h>
#include <linux/swapops.h>
#include <linux/writeback.h>
#include <linux/pagemap.h>

/*********************************
* statistics
**********************************/
/* Total bytes used by the compressed storage */
static u64 zswap_pool_total_size;
/* The number of compressed pages currently stored in zswap */
static atomic_t zswap_stored_pages = ATOMIC_INIT(0);
zswap: same-filled pages handling

Zswap is a cache which compresses the pages that are being swapped out
and stores them into a dynamically allocated RAM-based memory pool.
Experiments have shown that around 10-20% of pages stored in zswap are
same-filled pages (i.e. the contents of the page are all the same), but
these pages are handled as normal pages by compressing and allocating
memory in the pool.

This patch adds a check in zswap_frontswap_store() to identify a
same-filled page before compression of the page. If the page is a
same-filled page, set zswap_entry.length to zero, save the same-filled
value and skip the compression of the page and the allocation of memory
in zpool. In zswap_frontswap_load(), check whether zswap_entry.length is
zero for the page to be loaded. If zswap_entry.length is zero, fill the
page with the same-filled value. This saves the decompression time
during load.

On an ARM Quad Core 32-bit device with 1.5GB RAM, by launching and
relaunching different applications, out of ~64000 pages stored in zswap,
~11000 pages were same-value filled pages (including zero-filled pages)
and ~9000 pages were zero-filled pages.

An average of 17% of pages (including zero-filled pages) in zswap are
same-value filled pages and 14% of pages are zero-filled pages. An
average of 3% of pages are same-filled non-zero pages.

The below table shows the execution time profiling with the patch.

                           Baseline    With patch    % Improvement
  -----------------------------------------------------------------
  *Zswap Store Time         26.5ms      18ms          32%
   (of same value pages)
  *Zswap Load Time          25.5ms      13ms          49%
   (of same value pages)
  -----------------------------------------------------------------

On an Ubuntu PC with 2GB RAM, while executing kernel build and other test
scripts and running multimedia applications, out of 360000 pages stored
in zswap 78000 (~22%) of pages were found to be same-value filled pages
(including zero-filled pages) and 64000 (~17%) are zero-filled pages. So
an average of 5% of pages are same-filled non-zero pages.

The below table shows the execution time profiling with the patch.

                           Baseline    With patch    % Improvement
  -----------------------------------------------------------------
  *Zswap Store Time         91ms        74ms          19%
   (of same value pages)
  *Zswap Load Time          50ms        7.5ms         85%
   (of same value pages)
  -----------------------------------------------------------------

*The execution times may vary with test device used.

Dan said:
: I did test this patch out this week, and I added some instrumentation to
: check the performance impact, and tested with a small program to try to
: check the best and worst cases.
:
: When doing a lot of swap where all (or almost all) pages are same-value, I
: found this patch does save both time and space, significantly. The exact
: improvement in time and space depends on which compressor is being used,
: but roughly agrees with the numbers you listed.
:
: In the worst case situation, where all (or almost all) pages have the
: same-value *except* the final long (meaning, zswap will check each long on
: the entire page but then still have to pass the page to the compressor),
: the same-value check is around 10-15% of the total time spent in
: zswap_frontswap_store(). That's a not-insignificant amount of time, but
: it's not huge. Considering that most systems will probably be swapping
: pages that aren't similar to the worst case (although I don't have any
: data to know that), I'd say the improvement is worth the possible
: worst-case performance impact.
[srividya.dr@samsung.com: add memset_l instead of for loop]
Link: http://lkml.kernel.org/r/20171018104832epcms5p1b2232e2236258de3d03d1344dde9fce0@epcms5p1
Signed-off-by: Srividya Desireddy <srividya.dr@samsung.com>
Acked-by: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Dinakar Reddy Pathireddy <dinakar.p@samsung.com>
Cc: SHARAN ALLUR <sharan.allur@samsung.com>
Cc: RAJIB BASU <rajib.basu@samsung.com>
Cc: JUHUN KIM <juhunkim@samsung.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Timofey Titovets <nefelim4ag@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01 08:15:59 +08:00
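The check and fill described in the commit message above can be sketched in user space as follows. This is a minimal illustration, not the kernel implementation: `SKETCH_PAGE_SIZE`, `sketch_is_same_filled()` and `sketch_fill_page()` are hypothetical names, the page size is assumed to be 4096 bytes, and a plain loop stands in for the `memset_l` call that the commit message mentions.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096 /* assumed page size, for illustration only */
#define SKETCH_WORDS (SKETCH_PAGE_SIZE / sizeof(unsigned long))

/*
 * Return 1 and store the repeated word in *value when every word of the
 * page equals the first one; return 0 at the first mismatch.
 */
static int sketch_is_same_filled(const unsigned long *page, unsigned long *value)
{
	size_t pos;

	for (pos = 1; pos < SKETCH_WORDS; pos++) {
		if (page[pos] != page[0])
			return 0;
	}
	*value = page[0];
	return 1;
}

/* Rebuild a same-filled page from its saved value (no decompression). */
static void sketch_fill_page(unsigned long *page, unsigned long value)
{
	size_t pos;

	for (pos = 0; pos < SKETCH_WORDS; pos++)
		page[pos] = value;
}
```

The worst case Dan describes corresponds to a page whose last word differs: the loop walks the whole page before returning 0, and the page still has to go to the compressor.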
/* The number of same-value filled pages currently stored in zswap */
static atomic_t zswap_same_filled_pages = ATOMIC_INIT(0);

/*
 * The statistics below are not protected from concurrent access for
 * performance reasons so they may not be 100% accurate.  However,
 * they do provide useful information on roughly how many times a
 * certain event is occurring.
 */

/* Pool limit was hit (see zswap_max_pool_percent) */
static u64 zswap_pool_limit_hit;
/* Pages written back when pool limit was reached */
static u64 zswap_written_back_pages;
/* Store failed due to a reclaim failure after pool limit was reached */
static u64 zswap_reject_reclaim_fail;
/* Compressed page was too big for the allocator to (optimally) store */
static u64 zswap_reject_compress_poor;
/* Store failed because underlying allocator could not get memory */
static u64 zswap_reject_alloc_fail;
/* Store failed because the entry metadata could not be allocated (rare) */
static u64 zswap_reject_kmemcache_fail;
/* Duplicate store was encountered (rare) */
static u64 zswap_duplicate_entry;

/*********************************
* tunables
**********************************/

#define ZSWAP_PARAM_UNSET ""

/* Enable/disable zswap (disabled by default) */
static bool zswap_enabled;
zswap: disable changing params if init fails
Add zswap_init_failed bool that prevents changing any of the module
params, if init_zswap() fails, and set zswap_enabled to false. Change
'enabled' param to a callback, and check zswap_init_failed before
allowing any change to 'enabled', 'zpool', or 'compressor' params.
Any driver that is built-in to the kernel will not be unloaded if its
init function returns error, and its module params remain accessible for
users to change via sysfs. Since zswap uses param callbacks, which
assume that zswap has been initialized, changing the zswap params after
a failed initialization will result in WARNING due to the param
callbacks expecting a pool to already exist. This prevents that by
immediately exiting any of the param callbacks if initialization failed.
This was reported here:
https://marc.info/?l=linux-mm&m=147004228125528&w=4
And fixes this WARNING:
[ 429.723476] WARNING: CPU: 0 PID: 5140 at mm/zswap.c:503 __zswap_pool_current+0x56/0x60
The warning is just noise, and not serious. However, when init fails,
zswap frees all its percpu dstmem pages and its kmem cache. The kmem
cache might be serious, if kmem_cache_alloc(NULL, gfp) has problems; but
the percpu dstmem pages are definitely a problem, as they're used as a
temporary buffer for compressed pages before they are copied into place
in the zpool.
If the user does get zswap enabled after an init failure, then zswap
will likely Oops on the first page it tries to compress (or worse, start
corrupting memory).
Fixes: 90b0fc26d5db ("zswap: change zpool/compressor at runtime")
Link: http://lkml.kernel.org/r/20170124200259.16191-2-ddstreet@ieee.org
Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
Reported-by: Marcin Miroslaw <marcin@mejor.pl>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-04 05:13:09 +08:00
static int zswap_enabled_param_set(const char *,
				   const struct kernel_param *);
static struct kernel_param_ops zswap_enabled_param_ops = {
	.set = zswap_enabled_param_set,
	.get = param_get_bool,
};
module_param_cb(enabled, &zswap_enabled_param_ops, &zswap_enabled, 0644);
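The guard that the "disable changing params if init fails" annotation describes can be modelled in user space roughly as below. This is a sketch under stated assumptions: `sketch_init_failed`, `sketch_enabled` and `sketch_enabled_param_set()` are hypothetical stand-ins, the accepted strings are simplified, and the `-ENODEV` return value is an assumption chosen for illustration rather than something stated in this section.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

static bool sketch_init_failed;	/* hypothetical stand-in for zswap_init_failed */
static bool sketch_enabled;	/* hypothetical stand-in for zswap_enabled */

/*
 * Model of a 'set' callback: refuse every change once initialization
 * has failed, otherwise parse a simplified boolean string.
 */
static int sketch_enabled_param_set(const char *val)
{
	if (sketch_init_failed)
		return -ENODEV;	/* assumed error code */

	if (strcmp(val, "Y") == 0 || strcmp(val, "1") == 0)
		sketch_enabled = true;
	else if (strcmp(val, "N") == 0 || strcmp(val, "0") == 0)
		sketch_enabled = false;
	else
		return -EINVAL;

	return 0;
}
```

Checking the failure flag first means the callback returns before touching any state that a failed initialization may have freed.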

/* Crypto compressor to use */
#define ZSWAP_COMPRESSOR_DEFAULT "lzo"
static char *zswap_compressor = ZSWAP_COMPRESSOR_DEFAULT;
static int zswap_compressor_param_set(const char *,
				      const struct kernel_param *);
static struct kernel_param_ops zswap_compressor_param_ops = {
	.set = zswap_compressor_param_set,
	.get = param_get_charp,
	.free = param_free_charp,
};
module_param_cb(compressor, &zswap_compressor_param_ops,
		&zswap_compressor, 0644);

/* Compressed storage zpool to use */
#define ZSWAP_ZPOOL_DEFAULT "zbud"
static char *zswap_zpool_type = ZSWAP_ZPOOL_DEFAULT;
static int zswap_zpool_param_set(const char *, const struct kernel_param *);
static struct kernel_param_ops zswap_zpool_param_ops = {
	.set = zswap_zpool_param_set,
	.get = param_get_charp,
	.free = param_free_charp,
};
module_param_cb(zpool, &zswap_zpool_param_ops, &zswap_zpool_type, 0644);

/* The maximum percentage of memory that the compressed pool can occupy */
static unsigned int zswap_max_pool_percent = 20;
module_param_named(max_pool_percent, zswap_max_pool_percent, uint, 0644);
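As a rough illustration of how a percentage limit such as `max_pool_percent` could be applied, the sketch below compares the pool size in pages against a share of total RAM pages. The function name `sketch_pool_is_full()`, its parameters, and the exact rounding are assumptions made for this example; this section only states that the tunable is the maximum percentage of memory the compressed pool can occupy.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true when the pool, rounded up to whole pages, occupies more
 * than max_pool_percent of total_ram_pages. All parameters are
 * hypothetical inputs for this sketch.
 */
static bool sketch_pool_is_full(uint64_t pool_bytes, uint64_t total_ram_pages,
				unsigned int max_pool_percent,
				uint64_t page_size)
{
	uint64_t pool_pages = (pool_bytes + page_size - 1) / page_size;
	uint64_t limit_pages = total_ram_pages * max_pool_percent / 100;

	return pool_pages > limit_pages;
}
```

With the default of 20 and 1000 pages of RAM, the limit works out to 200 pages, so a pool of exactly 200 pages is still within the limit and one more byte pushes it over.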
/* Enable/disable handling same-value filled pages (enabled by default) */
static bool zswap_same_filled_pages_enabled = true;
module_param_named(same_filled_pages_enabled, zswap_same_filled_pages_enabled,
		   bool, 0644);

/*********************************
* data structures
**********************************/

struct zswap_pool {
	struct zpool *zpool;
	struct crypto_comp * __percpu *tfm;
	struct kref kref;
	struct list_head list;
mm/zswap: use workqueue to destroy pool
Add a work_struct to struct zswap_pool, and change __zswap_pool_empty to
use the workqueue instead of using call_rcu().
When zswap destroys a pool no longer in use, it uses call_rcu() to
perform the destruction/freeing. Since that executes in softirq
context, it must not sleep. However, actually destroying the pool
involves freeing the per-cpu compressors (which requires locking the
cpu_add_remove_lock mutex) and freeing the zpool, for which the
implementation may sleep (e.g. zsmalloc calls kmem_cache_destroy, which
locks the slab_mutex). So if either mutex is currently taken, or any
other part of the compressor or zpool implementation sleeps, it will
result in a BUG().
It's not easy to reproduce this when changing zswap's params normally.
In testing with a loaded system, this does not fail:

  $ cd /sys/module/zswap/parameters
  $ echo lz4 > compressor ; echo zsmalloc > zpool

nor does this:

  $ while true ; do
  >   echo lzo > compressor ; echo zbud > zpool
  >   sleep 1
  >   echo lz4 > compressor ; echo zsmalloc > zpool
  >   sleep 1
  > done

although it's still possible either of those might fail, depending on
whether anything else besides zswap has locked the mutexes.

However, changing a parameter with no delay immediately causes the
schedule while atomic BUG:

  $ while true ; do
  >   echo lzo > compressor ; echo lz4 > compressor
  > done

This is essentially the same as Yu Zhao's proposed patch to zsmalloc,
but moved to zswap, to cover compressor and zpool freeing.
Fixes: f1c54846ee45 ("zswap: dynamic pool creation")
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Reported-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dan Streetman <dan.streetman@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-21 07:59:54 +08:00
	struct work_struct work;
	struct hlist_node node;
	char tfm_name[CRYPTO_MAX_ALG_NAME];
};

/*
 * struct zswap_entry
 *
 * This structure contains the metadata for tracking a single compressed
 * page within zswap.
 *
 * rbnode - links the entry into red-black tree for the appropriate swap type
 * offset - the swap offset for the entry.  Index into the red-black tree.
 * refcount - the number of outstanding references to the entry. This is needed
 *            to protect against premature freeing of the entry by
 *            concurrent calls to load, invalidate, and writeback.  The lock
 *            for the zswap_tree structure that contains the entry must
 *            be held while changing the refcount.  Since the lock must
 *            be held, there is no reason to also make refcount atomic.
 * length - the length in bytes of the compressed page data.  Needed during
 *          decompression. For a same-value filled page, length is 0.
 * pool - the zswap_pool the entry's data is in
 * handle - zpool allocation handle that stores the compressed page data
 * value - the value that a same-value filled page is filled with
 */
struct zswap_entry {
	struct rb_node rbnode;
	pgoff_t offset;
	int refcount;
	unsigned int length;
	struct zswap_pool *pool;
zswap: same-filled pages handling
Zswap is a cache which compresses the pages that are being swapped out
and stores them into a dynamically allocated RAM-based memory pool.
Experiments have shown that around 10-20% of pages stored in zswap are
same-filled pages (i.e. contents of the page are all same), but these
pages are handled as normal pages by compressing and allocating memory
in the pool.
This patch adds a check in zswap_frontswap_store() to identify
same-filled page before compression of the page. If the page is a
same-filled page, set zswap_entry.length to zero, save the same-filled
value and skip the compression of the page and alloction of memory in
zpool. In zswap_frontswap_load(), check if value of zswap_entry.length
is zero corresponding to the page to be loaded. If zswap_entry.length
is zero, fill the page with same-filled value. This saves the
decompression time during load.
On a ARM Quad Core 32-bit device with 1.5GB RAM by launching and
relaunching different applications, out of ~64000 pages stored in zswap,
~11000 pages were same-value filled pages (including zero-filled pages)
and ~9000 pages were zero-filled pages.
An average of 17% of pages(including zero-filled pages) in zswap are
same-value filled pages and 14% pages are zero-filled pages. An average
of 3% of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
Baseline With patch % Improvement
-----------------------------------------------------------------
*Zswap Store Time 26.5ms 18ms 32%
(of same value pages)
*Zswap Load Time
(of same value pages) 25.5ms 13ms 49%
-----------------------------------------------------------------
On Ubuntu PC with 2GB RAM, while executing kernel build and other test
scripts and running multimedia applications, out of 360000 pages stored
in zswap 78000(~22%) of pages were found to be same-value filled pages
(including zero-filled pages) and 64000(~17%) are zero-filled pages. So
an average of %5 of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
Baseline With patch % Improvement
-----------------------------------------------------------------
*Zswap Store Time 91ms 74ms 19%
(of same value pages)
*Zswap Load Time 50ms 7.5ms 85%
(of same value pages)
-----------------------------------------------------------------
*The execution times may vary with test device used.
Dan said:
: I did test this patch out this week, and I added some instrumentation to
: check the performance impact, and tested with a small program to try to
: check the best and worst cases.
:
: When doing a lot of swap where all (or almost all) pages are same-value, I
: found this patch does save both time and space, significantly. The exact
: improvement in time and space depends on which compressor is being used,
: but roughly agrees with the numbers you listed.
:
: In the worst case situation, where all (or almost all) pages have the
: same-value *except* the final long (meaning, zswap will check each long on
: the entire page but then still have to pass the page to the compressor),
: the same-value check is around 10-15% of the total time spent in
: zswap_frontswap_store(). That's a not-insignificant amount of time, but
: it's not huge. Considering that most systems will probably be swapping
: pages that aren't similar to the worst case (although I don't have any
: data to know that), I'd say the improvement is worth the possible
: worst-case performance impact.
[srividya.dr@samsung.com: add memset_l instead of for loop]
Link: http://lkml.kernel.org/r/20171018104832epcms5p1b2232e2236258de3d03d1344dde9fce0@epcms5p1
Signed-off-by: Srividya Desireddy <srividya.dr@samsung.com>
Acked-by: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Dinakar Reddy Pathireddy <dinakar.p@samsung.com>
Cc: SHARAN ALLUR <sharan.allur@samsung.com>
Cc: RAJIB BASU <rajib.basu@samsung.com>
Cc: JUHUN KIM <juhunkim@samsung.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Timofey Titovets <nefelim4ag@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
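
The same-value check that the commit message and Dan's worst case describe can be modelled in userspace. This is only a sketch under stated assumptions: a "page" is a plain array of `unsigned long`, and the names `page_same_filled` and `page_fill` are hypothetical stand-ins, not the kernel's own helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of the same-filled check: returns true and
 * stores the repeated word in *value when every word of the page is equal.
 * On a mismatch *value is left untouched.
 */
static bool page_same_filled(const unsigned long *page, size_t nwords,
			     unsigned long *value)
{
	size_t pos;

	assert(page != NULL && value != NULL && nwords > 0);

	/* Compare every word with the first one; stop at the first mismatch. */
	for (pos = 1; pos < nwords; pos++) {
		if (page[pos] != page[0])
			return false;
	}
	*value = page[0];
	return true;
}

/* Model of the load side: rebuild the page from the saved value. */
static void page_fill(unsigned long *page, size_t nwords, unsigned long value)
{
	size_t pos;

	for (pos = 0; pos < nwords; pos++)
		page[pos] = value;
}
```

The worst case in the quoted review corresponds to a page whose words match until the last one: the loop walks the whole page and still returns false, so the page goes on to the compressor anyway.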

	union {
		unsigned long handle;
		unsigned long value;
	};
};

struct zswap_header {
	swp_entry_t swpentry;
};

/*
 * The tree lock in the zswap_tree struct protects a few things:
 * - the rbtree
 * - the refcount field of each entry in the tree
 */
struct zswap_tree {
	struct rb_root rbroot;
	spinlock_t lock;
};

static struct zswap_tree *zswap_trees[MAX_SWAPFILES];

/* RCU-protected iteration */
static LIST_HEAD(zswap_pools);
/* protects zswap_pools list modification */
static DEFINE_SPINLOCK(zswap_pools_lock);
/* pool counter to provide unique names to zpool */
static atomic_t zswap_pools_count = ATOMIC_INIT(0);

/* used by param callback function */
static bool zswap_init_started;

zswap: disable changing params if init fails
Add zswap_init_failed bool that prevents changing any of the module
params, if init_zswap() fails, and set zswap_enabled to false. Change
'enabled' param to a callback, and check zswap_init_failed before
allowing any change to 'enabled', 'zpool', or 'compressor' params.
Any driver that is built-in to the kernel will not be unloaded if its
init function returns error, and its module params remain accessible for
users to change via sysfs. Since zswap uses param callbacks, which
assume that zswap has been initialized, changing the zswap params after
a failed initialization will result in a WARNING due to the param
callbacks expecting a pool to already exist. This patch prevents that by
immediately exiting any of the param callbacks if initialization failed.
This was reported here:
https://marc.info/?l=linux-mm&m=147004228125528&w=4
And fixes this WARNING:
[ 429.723476] WARNING: CPU: 0 PID: 5140 at mm/zswap.c:503 __zswap_pool_current+0x56/0x60
The warning is just noise, and not serious. However, when init fails,
zswap frees all its percpu dstmem pages and its kmem cache. The kmem
cache might be serious, if kmem_cache_alloc(NULL, gfp) has problems; but
the percpu dstmem pages are definitely a problem, as they're used as a
temporary buffer for compressed pages before copying into place in the
zpool.
If the user does get zswap enabled after an init failure, then zswap
will likely Oops on the first page it tries to compress (or worse, start
corrupting memory).
Fixes: 90b0fc26d5db ("zswap: change zpool/compressor at runtime")
Link: http://lkml.kernel.org/r/20170124200259.16191-2-ddstreet@ieee.org
Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
Reported-by: Marcin Miroslaw <marcin@mejor.pl>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
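
The guard this commit message describes can be sketched as a userspace model. Everything here is an assumption made for illustration: `init_failed`, `compressor` and `param_set_compressor` are hypothetical names, and the error codes are this sketch's choice rather than a statement about the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the module state, for illustration only. */
static bool init_failed;
static char compressor[32] = "lzo";

/*
 * Model of a param callback: refuse any change once initialization has
 * failed, so later code never runs against resources that were freed.
 */
static int param_set_compressor(const char *val)
{
	assert(val != NULL);

	if (init_failed)
		return -ENODEV;	/* error code is an assumption of this sketch */
	if (strlen(val) >= sizeof(compressor))
		return -EINVAL;

	strcpy(compressor, val);
	return 0;
}
```

The point of the early return is that the stored parameter stays unchanged after a failed init, so nothing downstream can be re-enabled by writing to it.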

/* fatal error during init */
static bool zswap_init_failed;

/* init completed, but couldn't create the initial pool */
static bool zswap_has_pool;

/*********************************
* helpers and fwd declarations
**********************************/

#define zswap_pool_debug(msg, p)				\
	pr_debug("%s pool %s/%s\n", msg, (p)->tfm_name,		\
		 zpool_get_type((p)->zpool))

static int zswap_writeback_entry(struct zpool *pool, unsigned long handle);
static int zswap_pool_get(struct zswap_pool *pool);
static void zswap_pool_put(struct zswap_pool *pool);

static const struct zpool_ops zswap_zpool_ops = {
	.evict = zswap_writeback_entry
};

static bool zswap_is_full(void)
{
	return totalram_pages() * zswap_max_pool_percent / 100 <
			DIV_ROUND_UP(zswap_pool_total_size, PAGE_SIZE);
}

static void zswap_update_total_size(void)
{
	struct zswap_pool *pool;
	u64 total = 0;

	rcu_read_lock();

	list_for_each_entry_rcu(pool, &zswap_pools, list)
		total += zpool_get_total_size(pool->zpool);

	rcu_read_unlock();

	zswap_pool_total_size = total;
}

/*********************************
* zswap entry functions
**********************************/
static struct kmem_cache *zswap_entry_cache;

static int __init zswap_entry_cache_create(void)
{
	zswap_entry_cache = KMEM_CACHE(zswap_entry, 0);
	return zswap_entry_cache == NULL;
}

static void __init zswap_entry_cache_destroy(void)
{
	kmem_cache_destroy(zswap_entry_cache);
}

static struct zswap_entry *zswap_entry_cache_alloc(gfp_t gfp)
{
	struct zswap_entry *entry;
	entry = kmem_cache_alloc(zswap_entry_cache, gfp);
	if (!entry)
		return NULL;
	entry->refcount = 1;
	RB_CLEAR_NODE(&entry->rbnode);
	return entry;
}

static void zswap_entry_cache_free(struct zswap_entry *entry)
{
	kmem_cache_free(zswap_entry_cache, entry);
}

/*********************************
* rbtree functions
**********************************/
static struct zswap_entry *zswap_rb_search(struct rb_root *root, pgoff_t offset)
{
	struct rb_node *node = root->rb_node;
	struct zswap_entry *entry;

	while (node) {
		entry = rb_entry(node, struct zswap_entry, rbnode);
		if (entry->offset > offset)
			node = node->rb_left;
		else if (entry->offset < offset)
			node = node->rb_right;
		else
			return entry;
	}
	return NULL;
}

/*
 * In the case that an entry with the same offset is found, a pointer to
 * the existing entry is stored in dupentry and the function returns -EEXIST
 */
static int zswap_rb_insert(struct rb_root *root, struct zswap_entry *entry,
			struct zswap_entry **dupentry)
{
	struct rb_node **link = &root->rb_node, *parent = NULL;
	struct zswap_entry *myentry;

	while (*link) {
		parent = *link;
		myentry = rb_entry(parent, struct zswap_entry, rbnode);
		if (myentry->offset > entry->offset)
			link = &(*link)->rb_left;
		else if (myentry->offset < entry->offset)
			link = &(*link)->rb_right;
		else {
			*dupentry = myentry;
			return -EEXIST;
		}
	}
	rb_link_node(&entry->rbnode, parent, link);
	rb_insert_color(&entry->rbnode, root);
	return 0;
}

static void zswap_rb_erase(struct rb_root *root, struct zswap_entry *entry)
{
	if (!RB_EMPTY_NODE(&entry->rbnode)) {
		rb_erase(&entry->rbnode, root);
		RB_CLEAR_NODE(&entry->rbnode);
	}
}

/*
 * Carries out the common pattern of freeing an entry's zpool allocation,
 * freeing the entry itself, and decrementing the number of stored pages.
 */
static void zswap_free_entry(struct zswap_entry *entry)
{
	if (!entry->length)
		atomic_dec(&zswap_same_filled_pages);
	else {
		zpool_free(entry->pool->zpool, entry->handle);
		zswap_pool_put(entry->pool);
	}
	zswap_entry_cache_free(entry);
	atomic_dec(&zswap_stored_pages);
	zswap_update_total_size();
}

/* caller must hold the tree lock */
static void zswap_entry_get(struct zswap_entry *entry)
{
	entry->refcount++;
}

/* caller must hold the tree lock
* remove from the tree and free it, if nobody references the entry
*/
static void zswap_entry_put(struct zswap_tree *tree,
			struct zswap_entry *entry)
{
	int refcount = --entry->refcount;

	BUG_ON(refcount < 0);
	if (refcount == 0) {
		zswap_rb_erase(&tree->rbroot, entry);
		zswap_free_entry(entry);
	}
}

/* caller must hold the tree lock */
static struct zswap_entry *zswap_entry_find_get(struct rb_root *root,
				pgoff_t offset)
{
	struct zswap_entry *entry;

	entry = zswap_rb_search(root, offset);
	if (entry)
		zswap_entry_get(entry);

	return entry;
}

/*********************************
* per-cpu code
**********************************/
static DEFINE_PER_CPU(u8 *, zswap_dstmem);

static int zswap_dstmem_prepare(unsigned int cpu)
{
	u8 *dst;

	dst = kmalloc_node(PAGE_SIZE * 2, GFP_KERNEL, cpu_to_node(cpu));
	if (!dst)
		return -ENOMEM;

	per_cpu(zswap_dstmem, cpu) = dst;
	return 0;
}

static int zswap_dstmem_dead(unsigned int cpu)
{
	u8 *dst;

	dst = per_cpu(zswap_dstmem, cpu);
	kfree(dst);
	per_cpu(zswap_dstmem, cpu) = NULL;

	return 0;
}

static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
{
	struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
	struct crypto_comp *tfm;

	if (WARN_ON(*per_cpu_ptr(pool->tfm, cpu)))
		return 0;

	tfm = crypto_alloc_comp(pool->tfm_name, 0, 0);
	if (IS_ERR_OR_NULL(tfm)) {
		pr_err("could not alloc crypto comp %s : %ld\n",
		       pool->tfm_name, PTR_ERR(tfm));
		return -ENOMEM;
	}
	*per_cpu_ptr(pool->tfm, cpu) = tfm;
	return 0;
}

static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_node *node)
{
	struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
	struct crypto_comp *tfm;

	tfm = *per_cpu_ptr(pool->tfm, cpu);
	if (!IS_ERR_OR_NULL(tfm))
		crypto_free_comp(tfm);
	*per_cpu_ptr(pool->tfm, cpu) = NULL;
	return 0;
}

/*********************************
* pool functions
**********************************/

static struct zswap_pool *__zswap_pool_current(void)
{
	struct zswap_pool *pool;

	pool = list_first_or_null_rcu(&zswap_pools, typeof(*pool), list);
	WARN_ONCE(!pool && zswap_has_pool,
		  "%s: no page storage pool!\n", __func__);

	return pool;
}

static struct zswap_pool *zswap_pool_current(void)
{
	assert_spin_locked(&zswap_pools_lock);

	return __zswap_pool_current();
}

static struct zswap_pool *zswap_pool_current_get(void)
{
	struct zswap_pool *pool;

	rcu_read_lock();

	pool = __zswap_pool_current();
	if (!zswap_pool_get(pool))
		pool = NULL;

	rcu_read_unlock();

	return pool;
}

static struct zswap_pool *zswap_pool_last_get(void)
{
	struct zswap_pool *pool, *last = NULL;

	rcu_read_lock();

	list_for_each_entry_rcu(pool, &zswap_pools, list)
		last = pool;
	WARN_ONCE(!last && zswap_has_pool,
		  "%s: no page storage pool!\n", __func__);
	if (!zswap_pool_get(last))
		last = NULL;

	rcu_read_unlock();

	return last;
}

/* type and compressor must be null-terminated */
static struct zswap_pool *zswap_pool_find_get(char *type, char *compressor)
{
	struct zswap_pool *pool;

	assert_spin_locked(&zswap_pools_lock);

	list_for_each_entry_rcu(pool, &zswap_pools, list) {
		if (strcmp(pool->tfm_name, compressor))
			continue;
		if (strcmp(zpool_get_type(pool->zpool), type))
			continue;
		/* if we can't get it, it's about to be destroyed */
		if (!zswap_pool_get(pool))
			continue;
		return pool;
	}

	return NULL;
}

static struct zswap_pool *zswap_pool_create(char *type, char *compressor)
{
	struct zswap_pool *pool;
	char name[38]; /* 'zswap' + 32 char (max) num + \0 */
	gfp_t gfp = __GFP_NORETRY | __GFP_NOWARN | __GFP_KSWAPD_RECLAIM;
	int ret;

	if (!zswap_has_pool) {
		/* if either are unset, pool initialization failed, and we
		 * need both params to be set correctly before trying to
		 * create a pool.
		 */
		if (!strcmp(type, ZSWAP_PARAM_UNSET))
			return NULL;
		if (!strcmp(compressor, ZSWAP_PARAM_UNSET))
			return NULL;
	}

	pool = kzalloc(sizeof(*pool), GFP_KERNEL);
	if (!pool)
		return NULL;

	/* unique name for each pool specifically required by zsmalloc */
	snprintf(name, 38, "zswap%x", atomic_inc_return(&zswap_pools_count));

	pool->zpool = zpool_create_pool(type, name, gfp, &zswap_zpool_ops);
	if (!pool->zpool) {
		pr_err("%s zpool not available\n", type);
		goto error;
	}
	pr_debug("using %s zpool\n", zpool_get_type(pool->zpool));

	strlcpy(pool->tfm_name, compressor, sizeof(pool->tfm_name));
	pool->tfm = alloc_percpu(struct crypto_comp *);
	if (!pool->tfm) {
		pr_err("percpu alloc failed\n");
		goto error;
	}

	ret = cpuhp_state_add_instance(CPUHP_MM_ZSWP_POOL_PREPARE,
				       &pool->node);
	if (ret)
		goto error;
	pr_debug("using %s compressor\n", pool->tfm_name);

	/* being the current pool takes 1 ref; this func expects the
	 * caller to always add the new pool as the current pool
	 */
	kref_init(&pool->kref);
	INIT_LIST_HEAD(&pool->list);

	zswap_pool_debug("created", pool);

	return pool;

error:
	free_percpu(pool->tfm);
	if (pool->zpool)
		zpool_destroy_pool(pool->zpool);
	kfree(pool);
	return NULL;
}

static __init struct zswap_pool *__zswap_pool_create_fallback(void)
{
	bool has_comp, has_zpool;

	has_comp = crypto_has_comp(zswap_compressor, 0, 0);
	if (!has_comp && strcmp(zswap_compressor, ZSWAP_COMPRESSOR_DEFAULT)) {
		pr_err("compressor %s not available, using default %s\n",
		       zswap_compressor, ZSWAP_COMPRESSOR_DEFAULT);
		param_free_charp(&zswap_compressor);
		zswap_compressor = ZSWAP_COMPRESSOR_DEFAULT;
		has_comp = crypto_has_comp(zswap_compressor, 0, 0);
	}
	if (!has_comp) {
		pr_err("default compressor %s not available\n",
		       zswap_compressor);
		param_free_charp(&zswap_compressor);
		zswap_compressor = ZSWAP_PARAM_UNSET;
	}

	has_zpool = zpool_has_pool(zswap_zpool_type);
	if (!has_zpool && strcmp(zswap_zpool_type, ZSWAP_ZPOOL_DEFAULT)) {
		pr_err("zpool %s not available, using default %s\n",
		       zswap_zpool_type, ZSWAP_ZPOOL_DEFAULT);
		param_free_charp(&zswap_zpool_type);
		zswap_zpool_type = ZSWAP_ZPOOL_DEFAULT;
		has_zpool = zpool_has_pool(zswap_zpool_type);
	}
	if (!has_zpool) {
		pr_err("default zpool %s not available\n",
		       zswap_zpool_type);
		param_free_charp(&zswap_zpool_type);
		zswap_zpool_type = ZSWAP_PARAM_UNSET;
	}

	if (!has_comp || !has_zpool)
		return NULL;

	return zswap_pool_create(zswap_zpool_type, zswap_compressor);
}

static void zswap_pool_destroy(struct zswap_pool *pool)
{
	zswap_pool_debug("destroying", pool);

	cpuhp_state_remove_instance(CPUHP_MM_ZSWP_POOL_PREPARE, &pool->node);
	free_percpu(pool->tfm);
	zpool_destroy_pool(pool->zpool);
	kfree(pool);
}

static int __must_check zswap_pool_get(struct zswap_pool *pool)
{
	if (!pool)
		return 0;

	return kref_get_unless_zero(&pool->kref);
}

mm/zswap: use workqueue to destroy pool
Add a work_struct to struct zswap_pool, and change __zswap_pool_empty to
use the workqueue instead of using call_rcu().
When zswap destroys a pool no longer in use, it uses call_rcu() to
perform the destruction/freeing. Since that executes in softirq
context, it must not sleep. However, actually destroying the pool
involves freeing the per-cpu compressors (which requires locking the
cpu_add_remove_lock mutex) and freeing the zpool, for which the
implementation may sleep (e.g. zsmalloc calls kmem_cache_destroy, which
locks the slab_mutex). So if either mutex is currently taken, or any
other part of the compressor or zpool implementation sleeps, it will
result in a BUG().
It's not easy to reproduce this when changing zswap's params normally.
In testing with a loaded system, this does not fail:
$ cd /sys/module/zswap/parameters
$ echo lz4 > compressor ; echo zsmalloc > zpool
nor does this:
$ while true ; do
> echo lzo > compressor ; echo zbud > zpool
> sleep 1
> echo lz4 > compressor ; echo zsmalloc > zpool
> sleep 1
> done
although it's still possible either of those might fail, depending on
whether anything else besides zswap has locked the mutexes.
However, changing a parameter with no delay immediately causes the
schedule while atomic BUG:
$ while true ; do
> echo lzo > compressor ; echo lz4 > compressor
> done
This is essentially the same as Yu Zhao's proposed patch to zsmalloc,
but moved to zswap, to cover compressor and zpool freeing.
Fixes: f1c54846ee45 ("zswap: dynamic pool creation")
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Reported-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dan Streetman <dan.streetman@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
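
The idea in this commit message, dropping the last reference only queues the teardown and a separate context that is allowed to sleep runs it later, can be modelled in plain userspace C. This is a sketch under stated assumptions, not the kernel workqueue API: `struct work`, `queue_work_model`, `run_pending_work`, `struct pool_model`, `pool_release` and `pool_put` are all hypothetical names, and the "worker" is just a function the caller invokes by hand.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-threaded stand-in for a deferred-work queue. */
struct work {
	void (*fn)(struct work *w);
	struct work *next;
};

static struct work *pending_head;
static struct work **pending_tail = &pending_head;

/* Append a work item; nothing runs until run_pending_work() is called. */
static void queue_work_model(struct work *w, void (*fn)(struct work *))
{
	assert(w != NULL && fn != NULL);
	w->fn = fn;
	w->next = NULL;
	*pending_tail = w;
	pending_tail = &w->next;
}

/* Models the worker context: drain the queue, return how many items ran. */
static int run_pending_work(void)
{
	int ran = 0;

	while (pending_head) {
		struct work *w = pending_head;

		pending_head = w->next;
		if (!pending_head)
			pending_tail = &pending_head;
		w->fn(w);
		ran++;
	}
	return ran;
}

struct pool_model {
	int refcount;
	int destroyed;
	struct work work;
};

/* Runs later in the "worker"; this is where sleeping teardown would go. */
static void pool_release(struct work *w)
{
	struct pool_model *pool = (struct pool_model *)
		((char *)w - offsetof(struct pool_model, work));

	pool->destroyed = 1;
}

/* Dropping the last reference only queues the release, it does not run it. */
static void pool_put(struct pool_model *pool)
{
	assert(pool->refcount > 0);
	if (--pool->refcount == 0)
		queue_work_model(&pool->work, pool_release);
}
```

In this model the caller of `pool_put()` never executes the teardown itself, which mirrors why the change avoids sleeping in a context that must not sleep.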

static void __zswap_pool_release(struct work_struct *work)
{
	struct zswap_pool *pool = container_of(work, typeof(*pool), work);

	synchronize_rcu();

	/* nobody should have been able to get a kref... */
	WARN_ON(kref_get_unless_zero(&pool->kref));

	/* pool is now off zswap_pools list and has no references. */
	zswap_pool_destroy(pool);
}

static void __zswap_pool_empty(struct kref *kref)
{
	struct zswap_pool *pool;

	pool = container_of(kref, typeof(*pool), kref);

	spin_lock(&zswap_pools_lock);

	WARN_ON(pool == zswap_pool_current());

	list_del_rcu(&pool->list);

	INIT_WORK(&pool->work, __zswap_pool_release);
	schedule_work(&pool->work);

	spin_unlock(&zswap_pools_lock);
}

static void zswap_pool_put(struct zswap_pool *pool)
{
	kref_put(&pool->kref, __zswap_pool_empty);
}

/*********************************
* param callbacks
**********************************/

/* val must be a null-terminated string */
static int __zswap_param_set(const char *val, const struct kernel_param *kp,
			     char *type, char *compressor)
{
	struct zswap_pool *pool, *put_pool = NULL;
	char *s = strstrip((char *)val);
	int ret;

zswap: disable changing params if init fails
Add zswap_init_failed bool that prevents changing any of the module
params, if init_zswap() fails, and set zswap_enabled to false. Change
'enabled' param to a callback, and check zswap_init_failed before
allowing any change to 'enabled', 'zpool', or 'compressor' params.
Any driver that is built-in to the kernel will not be unloaded if its
init function returns error, and its module params remain accessible for
users to change via sysfs. Since zswap uses param callbacks, which
assume that zswap has been initialized, changing the zswap params after
a failed initialization will result in WARNING due to the param
callbacks expecting a pool to already exist. This prevents that by
immediately exiting any of the param callbacks if initialization failed.
This was reported here:
https://marc.info/?l=linux-mm&m=147004228125528&w=4
And fixes this WARNING:
[ 429.723476] WARNING: CPU: 0 PID: 5140 at mm/zswap.c:503 __zswap_pool_current+0x56/0x60
The warning is just noise, and not serious. However, when init fails,
zswap frees all its percpu dstmem pages and its kmem cache. The kmem
cache might be serious, if kmem_cache_alloc(NULL, gfp) has problems; but
the percpu dstmem pages are definitely a problem, as they're used as
temporary buffer for compressed pages before copying into place in the
zpool.
If the user does get zswap enabled after an init failure, then zswap
will likely Oops on the first page it tries to compress (or worse, start
corrupting memory).
Fixes: 90b0fc26d5db ("zswap: change zpool/compressor at runtime")
Link: http://lkml.kernel.org/r/20170124200259.16191-2-ddstreet@ieee.org
Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
Reported-by: Marcin Miroslaw <marcin@mejor.pl>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-04 05:13:09 +08:00
	if (zswap_init_failed) {
		pr_err("can't set param, initialization failed\n");
		return -ENODEV;
	}

	/* no change required */
	if (!strcmp(s, *(char **)kp->arg) && zswap_has_pool)
		return 0;

	/* if this is load-time (pre-init) param setting,
	 * don't create a pool; that's done during init.
	 */
	if (!zswap_init_started)
		return param_set_charp(s, kp);

	if (!type) {
		if (!zpool_has_pool(s)) {
			pr_err("zpool %s not available\n", s);
			return -ENOENT;
		}
		type = s;
	} else if (!compressor) {
		if (!crypto_has_comp(s, 0, 0)) {
			pr_err("compressor %s not available\n", s);
			return -ENOENT;
		}
		compressor = s;
	} else {
		WARN_ON(1);
		return -EINVAL;
	}

	spin_lock(&zswap_pools_lock);

	pool = zswap_pool_find_get(type, compressor);
	if (pool) {
		zswap_pool_debug("using existing", pool);
		WARN_ON(pool == zswap_pool_current());
		list_del_rcu(&pool->list);
	}

	spin_unlock(&zswap_pools_lock);

	if (!pool)
		pool = zswap_pool_create(type, compressor);

	if (pool)
		ret = param_set_charp(s, kp);
	else
		ret = -EINVAL;

	spin_lock(&zswap_pools_lock);

	if (!ret) {
		put_pool = zswap_pool_current();
		list_add_rcu(&pool->list, &zswap_pools);
		zswap_has_pool = true;
	} else if (pool) {
		/* add the possibly pre-existing pool to the end of the pools
		 * list; if it's new (and empty) then it'll be removed and
		 * destroyed by the put after we drop the lock
		 */
		list_add_tail_rcu(&pool->list, &zswap_pools);
		put_pool = pool;
	}

	spin_unlock(&zswap_pools_lock);

	if (!zswap_has_pool && !pool) {
		/* if initial pool creation failed, and this pool creation also
		 * failed, maybe both compressor and zpool params were bad.
		 * Allow changing this param, so pool creation will succeed
		 * when the other param is changed. We already verified this
		 * param is ok in the zpool_has_pool() or crypto_has_comp()
		 * checks above.
		 */
		ret = param_set_charp(s, kp);
	}

	/* drop the ref from either the old current pool,
	 * or the new pool we failed to add
	 */
	if (put_pool)
		zswap_pool_put(put_pool);

	return ret;
}

static int zswap_compressor_param_set(const char *val,
				      const struct kernel_param *kp)
{
	return __zswap_param_set(val, kp, zswap_zpool_type, NULL);
}

static int zswap_zpool_param_set(const char *val,
				 const struct kernel_param *kp)
{
	return __zswap_param_set(val, kp, NULL, zswap_compressor);
}

static int zswap_enabled_param_set(const char *val,
				   const struct kernel_param *kp)
{
	if (zswap_init_failed) {
		pr_err("can't enable, initialization failed\n");
		return -ENODEV;
	}
	if (!zswap_has_pool && zswap_init_started) {
		pr_err("can't enable, no pool configured\n");
		return -ENODEV;
	}

	return param_set_bool(val, kp);
}

/*********************************
* writeback code
**********************************/
/* return enum for zswap_get_swap_cache_page */
enum zswap_get_swap_ret {
	ZSWAP_SWAPCACHE_NEW,
	ZSWAP_SWAPCACHE_EXIST,
	ZSWAP_SWAPCACHE_FAIL,
};

/*
 * zswap_get_swap_cache_page
 *
 * This is an adaption of read_swap_cache_async()
 *
 * This function tries to find a page with the given swap entry
 * in the swapper_space address space (the swap cache). If the page
 * is found, it is returned in retpage. Otherwise, a page is allocated,
 * added to the swap cache, and returned in retpage.
 *
 * If success, the swap cache page is returned in retpage
 * Returns ZSWAP_SWAPCACHE_EXIST if page was already in the swap cache
 * Returns ZSWAP_SWAPCACHE_NEW if the new page needs to be populated,
 *     the new page is added to swapcache and locked
 * Returns ZSWAP_SWAPCACHE_FAIL on error
 */
static int zswap_get_swap_cache_page(swp_entry_t entry,
				struct page **retpage)
{
	bool page_was_allocated;

	*retpage = __read_swap_cache_async(entry, GFP_KERNEL,
			NULL, 0, &page_was_allocated);
	if (page_was_allocated)
		return ZSWAP_SWAPCACHE_NEW;
	if (!*retpage)
		return ZSWAP_SWAPCACHE_FAIL;
	return ZSWAP_SWAPCACHE_EXIST;
}

/*
 * Attempts to free an entry by adding a page to the swap cache,
 * decompressing the entry data into the page, and issuing a
 * bio write to write the page back to the swap device.
 *
 * This can be thought of as a "resumed writeback" of the page
 * to the swap device. We are basically resuming the same swap
 * writeback path that was intercepted with the frontswap_store()
 * in the first place. After the page has been decompressed into
 * the swap cache, the compressed version stored by zswap can be
 * freed.
 */
static int zswap_writeback_entry(struct zpool *pool, unsigned long handle)
{
	struct zswap_header *zhdr;
	swp_entry_t swpentry;
	struct zswap_tree *tree;
	pgoff_t offset;
	struct zswap_entry *entry;
	struct page *page;
	struct crypto_comp *tfm;
	u8 *src, *dst;
	unsigned int dlen;
	int ret;
	struct writeback_control wbc = {
		.sync_mode = WB_SYNC_NONE,
	};

	/* extract swpentry from data */
	zhdr = zpool_map_handle(pool, handle, ZPOOL_MM_RO);
	swpentry = zhdr->swpentry; /* here */
	tree = zswap_trees[swp_type(swpentry)];
	offset = swp_offset(swpentry);

	/* find and ref zswap entry */
	spin_lock(&tree->lock);
	entry = zswap_entry_find_get(&tree->rbroot, offset);
	if (!entry) {
		/* entry was invalidated */
		spin_unlock(&tree->lock);
		zpool_unmap_handle(pool, handle);
		return 0;
	}
	spin_unlock(&tree->lock);
	BUG_ON(offset != entry->offset);

	/* try to allocate swap cache page */
	switch (zswap_get_swap_cache_page(swpentry, &page)) {
	case ZSWAP_SWAPCACHE_FAIL: /* no memory or invalidate happened */
		ret = -ENOMEM;
		goto fail;

	case ZSWAP_SWAPCACHE_EXIST:
		/* page is already in the swap cache, ignore for now */
mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.
There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-01 20:29:47 +08:00
		put_page(page);
		ret = -EEXIST;
		goto fail;

	case ZSWAP_SWAPCACHE_NEW: /* page is locked */
		/* decompress */
		dlen = PAGE_SIZE;
		src = (u8 *)zhdr + sizeof(struct zswap_header);
		dst = kmap_atomic(page);
		tfm = *get_cpu_ptr(entry->pool->tfm);
		ret = crypto_comp_decompress(tfm, src, entry->length,
					     dst, &dlen);
		put_cpu_ptr(entry->pool->tfm);
		kunmap_atomic(dst);
		BUG_ON(ret);
		BUG_ON(dlen != PAGE_SIZE);

		/* page is up to date */
		SetPageUptodate(page);
	}

	/* move it to the tail of the inactive list after end_writeback */
	SetPageReclaim(page);

	/* start writeback */
	__swap_writepage(page, &wbc, end_swap_bio_write);
	put_page(page);
	zswap_written_back_pages++;

	spin_lock(&tree->lock);
	/* drop local reference */
	zswap_entry_put(tree, entry);

	/*
	 * There are two possible situations for entry here:
	 * (1) refcount is 1 (normal case), entry is valid and on the tree
	 * (2) refcount is 0, entry is freed and not on the tree
	 *     because invalidate happened during writeback
	 * search the tree and free the entry if it is found
	 */
	if (entry == zswap_rb_search(&tree->rbroot, offset))
		zswap_entry_put(tree, entry);
	spin_unlock(&tree->lock);

	goto end;

	/*
	 * if we get here due to ZSWAP_SWAPCACHE_EXIST
	 * a load may be happening concurrently
	 * it is safe and okay to not free the entry
	 * if we free the entry in the following put
	 * it is also okay to return !0
	 */
fail:
	spin_lock(&tree->lock);
	zswap_entry_put(tree, entry);
	spin_unlock(&tree->lock);

end:
	zpool_unmap_handle(pool, handle);
	return ret;
}

static int zswap_shrink(void)
{
	struct zswap_pool *pool;
	int ret;

	pool = zswap_pool_last_get();
	if (!pool)
		return -ENOENT;

	ret = zpool_shrink(pool->zpool, 1, NULL);

	zswap_pool_put(pool);

	return ret;
}
zswap: same-filled pages handling
Zswap is a cache which compresses the pages that are being swapped out
and stores them into a dynamically allocated RAM-based memory pool.
Experiments have shown that around 10-20% of pages stored in zswap are
same-filled pages (i.e. contents of the page are all same), but these
pages are handled as normal pages by compressing and allocating memory
in the pool.
This patch adds a check in zswap_frontswap_store() to identify
same-filled page before compression of the page. If the page is a
same-filled page, set zswap_entry.length to zero, save the same-filled
value and skip the compression of the page and allocation of memory in
zpool. In zswap_frontswap_load(), check if value of zswap_entry.length
is zero corresponding to the page to be loaded. If zswap_entry.length
is zero, fill the page with same-filled value. This saves the
decompression time during load.
On an ARM Quad Core 32-bit device with 1.5GB RAM by launching and
relaunching different applications, out of ~64000 pages stored in zswap,
~11000 pages were same-value filled pages (including zero-filled pages)
and ~9000 pages were zero-filled pages.
An average of 17% of pages(including zero-filled pages) in zswap are
same-value filled pages and 14% pages are zero-filled pages. An average
of 3% of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
                         Baseline    With patch    % Improvement
-----------------------------------------------------------------
*Zswap Store Time        26.5ms      18ms          32%
 (of same value pages)
*Zswap Load Time         25.5ms      13ms          49%
 (of same value pages)
-----------------------------------------------------------------
On Ubuntu PC with 2GB RAM, while executing kernel build and other test
scripts and running multimedia applications, out of 360000 pages stored
in zswap 78000(~22%) of pages were found to be same-value filled pages
(including zero-filled pages) and 64000(~17%) are zero-filled pages. So
an average of 5% of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
                         Baseline    With patch    % Improvement
-----------------------------------------------------------------
*Zswap Store Time        91ms        74ms          19%
 (of same value pages)
*Zswap Load Time         50ms        7.5ms         85%
 (of same value pages)
-----------------------------------------------------------------
*The execution times may vary with test device used.
Dan said:
: I did test this patch out this week, and I added some instrumentation to
: check the performance impact, and tested with a small program to try to
: check the best and worst cases.
:
: When doing a lot of swap where all (or almost all) pages are same-value, I
: found this patch does save both time and space, significantly. The exact
: improvement in time and space depends on which compressor is being used,
: but roughly agrees with the numbers you listed.
:
: In the worst case situation, where all (or almost all) pages have the
: same-value *except* the final long (meaning, zswap will check each long on
: the entire page but then still have to pass the page to the compressor),
: the same-value check is around 10-15% of the total time spent in
: zswap_frontswap_store(). That's a not-insignificant amount of time, but
: it's not huge. Considering that most systems will probably be swapping
: pages that aren't similar to the worst case (although I don't have any
: data to know that), I'd say the improvement is worth the possible
: worst-case performance impact.
[srividya.dr@samsung.com: add memset_l instead of for loop]
Link: http://lkml.kernel.org/r/20171018104832epcms5p1b2232e2236258de3d03d1344dde9fce0@epcms5p1
Signed-off-by: Srividya Desireddy <srividya.dr@samsung.com>
Acked-by: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Dinakar Reddy Pathireddy <dinakar.p@samsung.com>
Cc: SHARAN ALLUR <sharan.allur@samsung.com>
Cc: RAJIB BASU <rajib.basu@samsung.com>
Cc: JUHUN KIM <juhunkim@samsung.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Timofey Titovets <nefelim4ag@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01 08:15:59 +08:00
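The store and load behaviour described in this commit message can be tried out in userspace. The following is a minimal sketch, assuming a 4 KiB page; demo_entry, demo_store and demo_load are hypothetical names that only mirror the two zswap_entry fields the message mentions (a zero length as the marker, plus the saved value). No compression or zpool allocation is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096	/* assumption: 4 KiB pages */
#define DEMO_WORDS (DEMO_PAGE_SIZE / sizeof(unsigned long))

/* Hypothetical stand-in for the zswap_entry fields named in the message. */
struct demo_entry {
	unsigned int length;	/* 0 marks a same-filled page */
	unsigned long value;	/* the repeated word, valid when length == 0 */
};

/* Return 1 and report the repeated word if every word in the page matches. */
static int demo_is_same_filled(const unsigned long *page, unsigned long *value)
{
	size_t pos;

	for (pos = 1; pos < DEMO_WORDS; pos++) {
		if (page[pos] != page[0])
			return 0;
	}
	*value = page[0];
	return 1;
}

/* Store side: returns 1 when the page was recorded without compression. */
static int demo_store(struct demo_entry *e, const unsigned long *page)
{
	unsigned long value;

	assert(e != NULL && page != NULL);
	if (demo_is_same_filled(page, &value)) {
		e->length = 0;
		e->value = value;
		return 1;
	}
	/* Placeholder: a real store would compress and allocate here. */
	e->length = DEMO_PAGE_SIZE;
	return 0;
}

/* Load side: rebuild the page from the saved word; 1 if it was same-filled. */
static int demo_load(const struct demo_entry *e, unsigned long *page)
{
	size_t pos;

	if (e->length != 0)
		return 0;
	for (pos = 0; pos < DEMO_WORDS; pos++)
		page[pos] = e->value;
	return 1;
}
```

A page whose words all equal 0xab round-trips through demo_store() and demo_load() unchanged, while changing only the final word makes demo_store() fall through to the placeholder path, which is the worst case discussed in the review comments above.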
static int zswap_is_page_same_filled(void *ptr, unsigned long *value)
{
	unsigned int pos;
	unsigned long *page;

	page = (unsigned long *)ptr;
	for (pos = 1; pos < PAGE_SIZE / sizeof(*page); pos++) {
		if (page[pos] != page[0])
			return 0;
	}
	*value = page[0];
	return 1;
}

static void zswap_fill_page(void *ptr, unsigned long value)
{
	unsigned long *page;

	page = (unsigned long *)ptr;
	memset_l(page, value, PAGE_SIZE / sizeof(unsigned long));
}

/*********************************
* frontswap hooks
**********************************/
/* attempts to compress and store a single page */
static int zswap_frontswap_store(unsigned type, pgoff_t offset,
				struct page *page)
{
	struct zswap_tree *tree = zswap_trees[type];
	struct zswap_entry *entry, *dupentry;
	struct crypto_comp *tfm;
	int ret;
	unsigned int hlen, dlen = PAGE_SIZE;
zswap: same-filled pages handling
Zswap is a cache which compresses the pages that are being swapped out
and stores them into a dynamically allocated RAM-based memory pool.
Experiments have shown that around 10-20% of pages stored in zswap are
same-filled pages (i.e. contents of the page are all same), but these
pages are handled as normal pages by compressing and allocating memory
in the pool.
This patch adds a check in zswap_frontswap_store() to identify
same-filled page before compression of the page. If the page is a
same-filled page, set zswap_entry.length to zero, save the same-filled
value and skip the compression of the page and alloction of memory in
zpool. In zswap_frontswap_load(), check if value of zswap_entry.length
is zero corresponding to the page to be loaded. If zswap_entry.length
is zero, fill the page with same-filled value. This saves the
decompression time during load.
On a ARM Quad Core 32-bit device with 1.5GB RAM by launching and
relaunching different applications, out of ~64000 pages stored in zswap,
~11000 pages were same-value filled pages (including zero-filled pages)
and ~9000 pages were zero-filled pages.
An average of 17% of pages(including zero-filled pages) in zswap are
same-value filled pages and 14% pages are zero-filled pages. An average
of 3% of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
Baseline With patch % Improvement
-----------------------------------------------------------------
*Zswap Store Time 26.5ms 18ms 32%
(of same value pages)
*Zswap Load Time
(of same value pages) 25.5ms 13ms 49%
-----------------------------------------------------------------
On Ubuntu PC with 2GB RAM, while executing kernel build and other test
scripts and running multimedia applications, out of 360000 pages stored
in zswap 78000(~22%) of pages were found to be same-value filled pages
(including zero-filled pages) and 64000(~17%) are zero-filled pages. So
an average of %5 of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
Baseline With patch % Improvement
-----------------------------------------------------------------
*Zswap Store Time 91ms 74ms 19%
(of same value pages)
*Zswap Load Time 50ms 7.5ms 85%
(of same value pages)
-----------------------------------------------------------------
*The execution times may vary with test device used.
Dan said:
: I did test this patch out this week, and I added some instrumentation to
: check the performance impact, and tested with a small program to try to
: check the best and worst cases.
:
: When doing a lot of swap where all (or almost all) pages are same-value, I
: found this patch does save both time and space, significantly. The exact
: improvement in time and space depends on which compressor is being used,
: but roughly agrees with the numbers you listed.
:
: In the worst case situation, where all (or almost all) pages have the
: same-value *except* the final long (meaning, zswap will check each long on
: the entire page but then still have to pass the page to the compressor),
: the same-value check is around 10-15% of the total time spent in
: zswap_frontswap_store(). That's a not-insignificant amount of time, but
: it's not huge. Considering that most systems will probably be swapping
: pages that aren't similar to the worst case (although I don't have any
: data to know that), I'd say the improvement is worth the possible
: worst-case performance impact.
[srividya.dr@samsung.com: add memset_l instead of for loop]
Link: http://lkml.kernel.org/r/20171018104832epcms5p1b2232e2236258de3d03d1344dde9fce0@epcms5p1
Signed-off-by: Srividya Desireddy <srividya.dr@samsung.com>
Acked-by: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Dinakar Reddy Pathireddy <dinakar.p@samsung.com>
Cc: SHARAN ALLUR <sharan.allur@samsung.com>
Cc: RAJIB BASU <rajib.basu@samsung.com>
Cc: JUHUN KIM <juhunkim@samsung.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Timofey Titovets <nefelim4ag@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01 08:15:59 +08:00
	unsigned long handle, value;
2013-07-11 07:05:03 +08:00
	char *buf;
	u8 *src, *dst;
2018-02-01 08:19:59 +08:00
	struct zswap_header zhdr = { .swpentry = swp_entry(type, offset) };
zswap: use movable memory if zpool support allocate movable memory
This is the third version that was updated according to the comments from
Sergey Senozhatsky https://lkml.org/lkml/2019/5/29/73 and Shakeel Butt
https://lkml.org/lkml/2019/6/4/973
zswap compresses swap pages into a dynamically allocated RAM-based memory
pool. The memory pool should be zbud, z3fold or zsmalloc. All of them
allocate unmovable pages, which increases the number of unmovable page
blocks and is bad for anti-fragmentation.
zsmalloc supports page migration if a movable page is requested:
handle = zs_malloc(zram->mem_pool, comp_len,
GFP_NOIO | __GFP_HIGHMEM |
__GFP_MOVABLE);
Commit "zpool: Add malloc_support_movable to zpool_driver" adds
zpool_malloc_support_movable(), which checks malloc_support_movable to
determine whether a zpool supports allocating movable memory.
This commit lets zswap allocate blocks with the gfp flags
__GFP_HIGHMEM | __GFP_MOVABLE if the zpool supports allocating movable memory.
The following is a test log from a PC with 8G of memory and 2G of swap.
Without this commit:
~# echo lz4 > /sys/module/zswap/parameters/compressor
~# echo zsmalloc > /sys/module/zswap/parameters/zpool
~# echo 1 > /sys/module/zswap/parameters/enabled
~# swapon /swapfile
~# cd /home/teawater/kernel/vm-scalability/
/home/teawater/kernel/vm-scalability# export unit_size=$((9 * 1024 * 1024 * 1024))
/home/teawater/kernel/vm-scalability# ./case-anon-w-seq
2717908992 bytes / 4826062 usecs = 549973 KB/s
2717908992 bytes / 4864201 usecs = 545661 KB/s
2717908992 bytes / 4867015 usecs = 545346 KB/s
2717908992 bytes / 4915485 usecs = 539968 KB/s
397853 usecs to free memory
357820 usecs to free memory
421333 usecs to free memory
420454 usecs to free memory
/home/teawater/kernel/vm-scalability# cat /proc/pagetypeinfo
Page block order: 9
Pages per block: 512
Free pages count per migrate type at order 0 1 2 3 4 5 6 7 8 9 10
Node 0, zone DMA, type Unmovable 1 1 1 0 2 1 1 0 1 0 0
Node 0, zone DMA, type Movable 0 0 0 0 0 0 0 0 0 1 3
Node 0, zone DMA, type Reclaimable 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA, type CMA 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA, type Isolate 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA32, type Unmovable 6 5 8 6 6 5 4 1 1 1 0
Node 0, zone DMA32, type Movable 25 20 20 19 22 15 14 11 11 5 767
Node 0, zone DMA32, type Reclaimable 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA32, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA32, type CMA 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA32, type Isolate 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone Normal, type Unmovable 4753 5588 5159 4613 3712 2520 1448 594 188 11 0
Node 0, zone Normal, type Movable 16 3 457 2648 2143 1435 860 459 223 224 296
Node 0, zone Normal, type Reclaimable 0 0 44 38 11 2 0 0 0 0 0
Node 0, zone Normal, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone Normal, type CMA 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone Normal, type Isolate 0 0 0 0 0 0 0 0 0 0 0
Number of blocks type Unmovable Movable Reclaimable HighAtomic CMA Isolate
Node 0, zone DMA 1 7 0 0 0 0
Node 0, zone DMA32 4 1652 0 0 0 0
Node 0, zone Normal 931 1485 15 0 0 0
With this commit:
~# echo lz4 > /sys/module/zswap/parameters/compressor
~# echo zsmalloc > /sys/module/zswap/parameters/zpool
~# echo 1 > /sys/module/zswap/parameters/enabled
~# swapon /swapfile
~# cd /home/teawater/kernel/vm-scalability/
/home/teawater/kernel/vm-scalability# export unit_size=$((9 * 1024 * 1024 * 1024))
/home/teawater/kernel/vm-scalability# ./case-anon-w-seq
2717908992 bytes / 4689240 usecs = 566020 KB/s
2717908992 bytes / 4760605 usecs = 557535 KB/s
2717908992 bytes / 4803621 usecs = 552543 KB/s
2717908992 bytes / 5069828 usecs = 523530 KB/s
431546 usecs to free memory
383397 usecs to free memory
456454 usecs to free memory
224487 usecs to free memory
/home/teawater/kernel/vm-scalability# cat /proc/pagetypeinfo
Page block order: 9
Pages per block: 512
Free pages count per migrate type at order 0 1 2 3 4 5 6 7 8 9 10
Node 0, zone DMA, type Unmovable 1 1 1 0 2 1 1 0 1 0 0
Node 0, zone DMA, type Movable 0 0 0 0 0 0 0 0 0 1 3
Node 0, zone DMA, type Reclaimable 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA, type CMA 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA, type Isolate 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA32, type Unmovable 10 8 10 9 10 4 3 2 3 0 0
Node 0, zone DMA32, type Movable 18 12 14 16 16 11 9 5 5 6 775
Node 0, zone DMA32, type Reclaimable 0 0 0 0 0 0 0 0 0 0 1
Node 0, zone DMA32, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA32, type CMA 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA32, type Isolate 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone Normal, type Unmovable 2669 1236 452 118 37 14 4 1 2 3 0
Node 0, zone Normal, type Movable 3850 6086 5274 4327 3510 2494 1520 934 438 220 470
Node 0, zone Normal, type Reclaimable 56 93 155 124 47 31 17 7 3 0 0
Node 0, zone Normal, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone Normal, type CMA 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone Normal, type Isolate 0 0 0 0 0 0 0 0 0 0 0
Number of blocks type Unmovable Movable Reclaimable HighAtomic CMA Isolate
Node 0, zone DMA 1 7 0 0 0 0
Node 0, zone DMA32 4 1650 2 0 0 0
Node 0, zone Normal 79 2326 26 0 0 0
With this commit, the number of unmovable page blocks in zone Normal
drops from 931 to 79.
Link: http://lkml.kernel.org/r/20190605100630.13293-2-teawaterz@linux.alibaba.com
Signed-off-by: Hui Zhu <teawaterz@linux.alibaba.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-24 06:39:40 +08:00
	gfp_t gfp;
2013-07-11 07:05:03 +08:00

mm, swap, frontswap: fix THP swap if frontswap enabled
It was reported by Sergey Senozhatsky that if THP (Transparent Huge
Page) and frontswap (via zswap) are both enabled, when memory goes low
so that swap is triggered, segfaults and memory corruption will occur in
random user space applications as follows:
kernel: urxvt[338]: segfault at 20 ip 00007fc08889ae0d sp 00007ffc73a7fc40 error 6 in libc-2.26.so[7fc08881a000+1ae000]
#0 0x00007fc08889ae0d _int_malloc (libc.so.6)
#1 0x00007fc08889c2f3 malloc (libc.so.6)
#2 0x0000560e6004bff7 _Z14rxvt_wcstoutf8PKwi (urxvt)
#3 0x0000560e6005e75c n/a (urxvt)
#4 0x0000560e6007d9f1 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt)
#5 0x0000560e6003d988 _ZN9rxvt_term9cmd_parseEv (urxvt)
#6 0x0000560e60042804 _ZN9rxvt_term6pty_cbERN2ev2ioEi (urxvt)
#7 0x0000560e6005c10f _Z17ev_invoke_pendingv (urxvt)
#8 0x0000560e6005cb55 ev_run (urxvt)
#9 0x0000560e6003b9b9 main (urxvt)
#10 0x00007fc08883af4a __libc_start_main (libc.so.6)
#11 0x0000560e6003f9da _start (urxvt)
After bisection, it was found the first bad commit is bd4c82c22c36 ("mm,
THP, swap: delay splitting THP after swapped out").
The root cause is as follows:
When the pages are written to the swap device during swap-out in
swap_writepage(), zswap (frontswap) tries to compress the pages to
improve performance. But zswap (frontswap) treats a THP as a normal
page, so only the head page is saved. After swapping in, tail pages
will not be restored to their original contents, causing memory
corruption in the applications.
This is fixed by refusing to save the page in the frontswap store functions
if the page is a THP, so that the THP will be swapped out to the swap
device.
Another choice is to split the THP if frontswap is enabled. But it was found
that frontswap enabling isn't flexible. For example, if
CONFIG_ZSWAP=y (cannot be module), frontswap will be enabled even if
zswap itself isn't enabled.
Frontswap has multiple backends. To make it easy for one backend to
enable THP support, the THP checking is put in the backend frontswap store
functions instead of the general interfaces.
Link: http://lkml.kernel.org/r/20180209084947.22749-1-ying.huang@intel.com
Fixes: bd4c82c22c367e068 ("mm, THP, swap: delay splitting THP after swapped out")
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Suggested-by: Minchan Kim <minchan@kernel.org> [put THP checking in backend]
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Shaohua Li <shli@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: <stable@vger.kernel.org> [4.14]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-22 06:45:39 +08:00
	/* THP isn't supported */
	if (PageTransHuge(page)) {
		ret = -EINVAL;
		goto reject;
	}

2015-06-26 06:00:35 +08:00
	if (!zswap_enabled || !tree) {
2013-07-11 07:05:03 +08:00
		ret = -ENODEV;
		goto reject;
	}

	/* reclaim space if needed */
	if (zswap_is_full()) {
		zswap_pool_limit_hit++;
2015-09-10 06:35:19 +08:00
		if (zswap_shrink()) {
2013-07-11 07:05:03 +08:00
			zswap_reject_reclaim_fail++;
			ret = -ENOMEM;
			goto reject;
		}
2018-07-27 07:37:42 +08:00

		/* A second zswap_is_full() check after
		 * zswap_shrink() to make sure it's now
		 * under the max_pool_percent
		 */
		if (zswap_is_full()) {
			ret = -ENOMEM;
			goto reject;
		}
2013-07-11 07:05:03 +08:00
	}

	/* allocate entry */
	entry = zswap_entry_cache_alloc(GFP_KERNEL);
	if (!entry) {
		zswap_reject_kmemcache_fail++;
		ret = -ENOMEM;
		goto reject;
	}

zswap: same-filled pages handling
Zswap is a cache which compresses the pages that are being swapped out
and stores them into a dynamically allocated RAM-based memory pool.
Experiments have shown that around 10-20% of pages stored in zswap are
same-filled pages (i.e. contents of the page are all same), but these
pages are handled as normal pages by compressing and allocating memory
in the pool.
This patch adds a check in zswap_frontswap_store() to identify a
same-filled page before compression of the page. If the page is a
same-filled page, set zswap_entry.length to zero, save the same-filled
value and skip the compression of the page and allocation of memory in
zpool. In zswap_frontswap_load(), check whether zswap_entry.length
is zero for the page to be loaded. If zswap_entry.length
is zero, fill the page with the same-filled value. This saves the
decompression time during load.
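As a rough illustration of the check and fill described above, here is a
minimal user-space sketch. It is not the kernel code: it assumes a 4 KiB
page, and the names sketch_page_same_filled(), sketch_fill_page() and the
SKETCH_* macros are hypothetical. Per the note in this commit message the
kernel version fills with memset_l; a plain loop is used here so the sketch
stays portable.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 4 KiB pages; the kernel uses PAGE_SIZE for the target arch. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_WORDS (SKETCH_PAGE_SIZE / sizeof(unsigned long))

/*
 * Store side: return true if every word of the page equals the first
 * word, and report that word through *value. *value is only written
 * when the page is same-filled.
 */
bool sketch_page_same_filled(const void *ptr, unsigned long *value)
{
	const unsigned long *page = ptr;
	size_t pos;

	for (pos = 1; pos < SKETCH_WORDS; pos++) {
		if (page[pos] != page[0])
			return false;
	}
	*value = page[0];
	return true;
}

/* Load side: rebuild a same-filled page from the saved word. */
void sketch_fill_page(void *ptr, unsigned long value)
{
	unsigned long *page = ptr;
	size_t pos;

	for (pos = 0; pos < SKETCH_WORDS; pos++)
		page[pos] = value;
}
```

The worst case for the check is a page that differs only in its last word:
the loop scans the whole page and the page still has to go to the
compressor, which is the case Dan measures in his comments on this patch.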
On an ARM quad-core 32-bit device with 1.5GB RAM, launching and
relaunching different applications, out of ~64000 pages stored in zswap,
~11000 pages were same-value filled pages (including zero-filled pages)
and ~9000 pages were zero-filled pages.
An average of 17% of pages (including zero-filled pages) in zswap are
same-value filled pages and 14% of pages are zero-filled pages. An average
of 3% of pages are same-filled non-zero pages.
The table below shows the execution time profiling with the patch.
                          Baseline    With patch    % Improvement
-----------------------------------------------------------------
*Zswap Store Time         26.5ms      18ms          32%
 (of same value pages)
*Zswap Load Time          25.5ms      13ms          49%
 (of same value pages)
-----------------------------------------------------------------
On an Ubuntu PC with 2GB RAM, while executing a kernel build and other test
scripts and running multimedia applications, out of 360000 pages stored
in zswap, 78000 (~22%) pages were found to be same-value filled pages
(including zero-filled pages) and 64000 (~17%) were zero-filled pages. So
an average of 5% of pages are same-filled non-zero pages.
The table below shows the execution time profiling with the patch.
                          Baseline    With patch    % Improvement
-----------------------------------------------------------------
*Zswap Store Time         91ms        74ms          19%
 (of same value pages)
*Zswap Load Time          50ms        7.5ms         85%
 (of same value pages)
-----------------------------------------------------------------
*The execution times may vary with the test device used.
Dan said:
: I did test this patch out this week, and I added some instrumentation to
: check the performance impact, and tested with a small program to try to
: check the best and worst cases.
:
: When doing a lot of swap where all (or almost all) pages are same-value, I
: found this patch does save both time and space, significantly. The exact
: improvement in time and space depends on which compressor is being used,
: but roughly agrees with the numbers you listed.
:
: In the worst case situation, where all (or almost all) pages have the
: same-value *except* the final long (meaning, zswap will check each long on
: the entire page but then still have to pass the page to the compressor),
: the same-value check is around 10-15% of the total time spent in
: zswap_frontswap_store(). That's a not-insignificant amount of time, but
: it's not huge. Considering that most systems will probably be swapping
: pages that aren't similar to the worst case (although I don't have any
: data to know that), I'd say the improvement is worth the possible
: worst-case performance impact.
[srividya.dr@samsung.com: add memset_l instead of for loop]
Link: http://lkml.kernel.org/r/20171018104832epcms5p1b2232e2236258de3d03d1344dde9fce0@epcms5p1
Signed-off-by: Srividya Desireddy <srividya.dr@samsung.com>
Acked-by: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Dinakar Reddy Pathireddy <dinakar.p@samsung.com>
Cc: SHARAN ALLUR <sharan.allur@samsung.com>
Cc: RAJIB BASU <rajib.basu@samsung.com>
Cc: JUHUN KIM <juhunkim@samsung.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Timofey Titovets <nefelim4ag@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01 08:15:59 +08:00
	if (zswap_same_filled_pages_enabled) {
		src = kmap_atomic(page);
		if (zswap_is_page_same_filled(src, &value)) {
			kunmap_atomic(src);
			entry->offset = offset;
			entry->length = 0;
			entry->value = value;
			atomic_inc(&zswap_same_filled_pages);
			goto insert_entry;
		}
		kunmap_atomic(src);
	}

2015-09-10 06:35:19 +08:00
	/* if entry is successfully added, it keeps the reference */
	entry->pool = zswap_pool_current_get();
	if (!entry->pool) {
		ret = -EINVAL;
		goto freepage;
	}

2013-07-11 07:05:03 +08:00
	/* compress */
	dst = get_cpu_var(zswap_dstmem);
2015-09-10 06:35:19 +08:00
	tfm = *get_cpu_ptr(entry->pool->tfm);
2013-07-11 07:05:03 +08:00
	src = kmap_atomic(page);
2015-09-10 06:35:19 +08:00
	ret = crypto_comp_compress(tfm, src, PAGE_SIZE, dst, &dlen);
2013-07-11 07:05:03 +08:00
	kunmap_atomic(src);
2015-09-10 06:35:19 +08:00
	put_cpu_ptr(entry->pool->tfm);
2013-07-11 07:05:03 +08:00
	if (ret) {
		ret = -EINVAL;
2015-09-10 06:35:19 +08:00
		goto put_dstmem;
2013-07-11 07:05:03 +08:00
	}

	/* store */
2018-02-01 08:19:59 +08:00
	hlen = zpool_evictable(entry->pool->zpool) ? sizeof(zhdr) : 0;
2019-09-24 06:39:40 +08:00
	gfp = __GFP_NORETRY | __GFP_NOWARN | __GFP_KSWAPD_RECLAIM;
	if (zpool_malloc_support_movable(entry->pool->zpool))
		gfp |= __GFP_HIGHMEM | __GFP_MOVABLE;
	ret = zpool_malloc(entry->pool->zpool, hlen + dlen, gfp, &handle);
2013-07-11 07:05:03 +08:00
	if (ret == -ENOSPC) {
		zswap_reject_compress_poor++;
2015-09-10 06:35:19 +08:00
		goto put_dstmem;
2013-07-11 07:05:03 +08:00
	}
	if (ret) {
		zswap_reject_alloc_fail++;
2015-09-10 06:35:19 +08:00
		goto put_dstmem;
2013-07-11 07:05:03 +08:00
	}
2018-02-01 08:19:59 +08:00
	buf = zpool_map_handle(entry->pool->zpool, handle, ZPOOL_MM_RW);
	memcpy(buf, &zhdr, hlen);
	memcpy(buf + hlen, dst, dlen);
2015-09-10 06:35:19 +08:00
	zpool_unmap_handle(entry->pool->zpool, handle);
2013-07-11 07:05:03 +08:00
	put_cpu_var(zswap_dstmem);

	/* populate entry */
	entry->offset = offset;
	entry->handle = handle;
	entry->length = dlen;

2018-02-01 08:15:59 +08:00
insert_entry:
2013-07-11 07:05:03 +08:00
	/* map */
	spin_lock(&tree->lock);
	do {
		ret = zswap_rb_insert(&tree->rbroot, entry, &dupentry);
		if (ret == -EEXIST) {
			zswap_duplicate_entry++;
			/* remove from rbtree */
2013-11-13 07:08:27 +08:00
			zswap_rb_erase(&tree->rbroot, dupentry);
			zswap_entry_put(tree, dupentry);
2013-07-11 07:05:03 +08:00
		}
	} while (ret == -EEXIST);
	spin_unlock(&tree->lock);

	/* update stats */
	atomic_inc(&zswap_stored_pages);
2015-09-10 06:35:19 +08:00
	zswap_update_total_size();
2013-07-11 07:05:03 +08:00

	return 0;

2015-09-10 06:35:19 +08:00
put_dstmem:
2013-07-11 07:05:03 +08:00
	put_cpu_var(zswap_dstmem);
2015-09-10 06:35:19 +08:00
	zswap_pool_put(entry->pool);
freepage:
2013-07-11 07:05:03 +08:00
	zswap_entry_cache_free(entry);
reject:
	return ret;
}

/*
 * returns 0 if the page was successfully decompressed
 * return -1 on entry not found or error
 */
static int zswap_frontswap_load(unsigned type, pgoff_t offset,
				struct page *page)
{
	struct zswap_tree *tree = zswap_trees[type];
	struct zswap_entry *entry;
2015-09-10 06:35:19 +08:00
	struct crypto_comp *tfm;
2013-07-11 07:05:03 +08:00
	u8 *src, *dst;
	unsigned int dlen;
2013-11-13 07:08:27 +08:00
	int ret;
2013-07-11 07:05:03 +08:00

	/* find */
	spin_lock(&tree->lock);
2013-11-13 07:08:27 +08:00
	entry = zswap_entry_find_get(&tree->rbroot, offset);
2013-07-11 07:05:03 +08:00
	if (!entry) {
		/* entry was written back */
		spin_unlock(&tree->lock);
		return -1;
	}
	spin_unlock(&tree->lock);

2018-02-01 08:15:59 +08:00
	if (!entry->length) {
		dst = kmap_atomic(page);
		zswap_fill_page(dst, entry->value);
		kunmap_atomic(dst);
		goto freeentry;
	}

2013-07-11 07:05:03 +08:00
	/* decompress */
	dlen = PAGE_SIZE;
2018-02-01 08:19:59 +08:00
	src = zpool_map_handle(entry->pool->zpool, entry->handle, ZPOOL_MM_RO);
	if (zpool_evictable(entry->pool->zpool))
		src += sizeof(struct zswap_header);
2013-07-11 07:05:03 +08:00
	dst = kmap_atomic(page);
2015-09-10 06:35:19 +08:00
	tfm = *get_cpu_ptr(entry->pool->tfm);
	ret = crypto_comp_decompress(tfm, src, entry->length, dst, &dlen);
	put_cpu_ptr(entry->pool->tfm);
2013-07-11 07:05:03 +08:00
	kunmap_atomic(dst);
2015-09-10 06:35:19 +08:00
	zpool_unmap_handle(entry->pool->zpool, entry->handle);
2013-07-11 07:05:03 +08:00
	BUG_ON(ret);

2018-02-01 08:15:59 +08:00
freeentry:
2013-07-11 07:05:03 +08:00
	spin_lock(&tree->lock);
2013-11-13 07:08:27 +08:00
	zswap_entry_put(tree, entry);
2013-07-11 07:05:03 +08:00
	spin_unlock(&tree->lock);

	return 0;
}

/* frees an entry in zswap */
static void zswap_frontswap_invalidate_page(unsigned type, pgoff_t offset)
{
	struct zswap_tree *tree = zswap_trees[type];
	struct zswap_entry *entry;

	/* find */
	spin_lock(&tree->lock);
	entry = zswap_rb_search(&tree->rbroot, offset);
	if (!entry) {
		/* entry was written back */
		spin_unlock(&tree->lock);
		return;
	}

	/* remove from rbtree */
2013-11-13 07:08:27 +08:00
	zswap_rb_erase(&tree->rbroot, entry);
2013-07-11 07:05:03 +08:00

	/* drop the initial reference from entry creation */
2013-11-13 07:08:27 +08:00
	zswap_entry_put(tree, entry);
2013-07-11 07:05:03 +08:00

	spin_unlock(&tree->lock);
}

/* frees all zswap entries for the given swap type */
static void zswap_frontswap_invalidate_area(unsigned type)
{
	struct zswap_tree *tree = zswap_trees[type];
2013-09-12 05:25:33 +08:00
	struct zswap_entry *entry, *n;
2013-07-11 07:05:03 +08:00

	if (!tree)
		return;

	/* walk the tree and free everything */
	spin_lock(&tree->lock);
2013-11-13 07:08:27 +08:00
	rbtree_postorder_for_each_entry_safe(entry, n, &tree->rbroot, rbnode)
2014-04-08 06:38:27 +08:00
		zswap_free_entry(entry);
2013-07-11 07:05:03 +08:00
	tree->rbroot = RB_ROOT;
	spin_unlock(&tree->lock);
2013-10-17 04:46:54 +08:00
	kfree(tree);
	zswap_trees[type] = NULL;
2013-07-11 07:05:03 +08:00
}

static void zswap_frontswap_init(unsigned type)
{
	struct zswap_tree *tree;

2017-07-07 06:40:37 +08:00
	tree = kzalloc(sizeof(*tree), GFP_KERNEL);
2014-04-08 06:38:27 +08:00
	if (!tree) {
		pr_err("alloc failed, zswap disabled for swap type %d\n", type);
		return;
	}

2013-07-11 07:05:03 +08:00
	tree->rbroot = RB_ROOT;
	spin_lock_init(&tree->lock);
	zswap_trees[type] = tree;
}

static struct frontswap_ops zswap_frontswap_ops = {
	.store = zswap_frontswap_store,
	.load = zswap_frontswap_load,
	.invalidate_page = zswap_frontswap_invalidate_page,
	.invalidate_area = zswap_frontswap_invalidate_area,
	.init = zswap_frontswap_init
};

/*********************************
* debugfs functions
**********************************/
#ifdef CONFIG_DEBUG_FS
#include <linux/debugfs.h>

static struct dentry *zswap_debugfs_root;

static int __init zswap_debugfs_init(void)
{
	if (!debugfs_initialized())
		return -ENODEV;

	zswap_debugfs_root = debugfs_create_dir("zswap", NULL);

2018-06-15 06:27:58 +08:00
	debugfs_create_u64("pool_limit_hit", 0444,
			   zswap_debugfs_root, &zswap_pool_limit_hit);
	debugfs_create_u64("reject_reclaim_fail", 0444,
			   zswap_debugfs_root, &zswap_reject_reclaim_fail);
	debugfs_create_u64("reject_alloc_fail", 0444,
			   zswap_debugfs_root, &zswap_reject_alloc_fail);
	debugfs_create_u64("reject_kmemcache_fail", 0444,
			   zswap_debugfs_root, &zswap_reject_kmemcache_fail);
	debugfs_create_u64("reject_compress_poor", 0444,
			   zswap_debugfs_root, &zswap_reject_compress_poor);
	debugfs_create_u64("written_back_pages", 0444,
			   zswap_debugfs_root, &zswap_written_back_pages);
	debugfs_create_u64("duplicate_entry", 0444,
			   zswap_debugfs_root, &zswap_duplicate_entry);
	debugfs_create_u64("pool_total_size", 0444,
			   zswap_debugfs_root, &zswap_pool_total_size);
	debugfs_create_atomic_t("stored_pages", 0444,
				zswap_debugfs_root, &zswap_stored_pages);
2018-02-01 08:15:59 +08:00
	debugfs_create_atomic_t("same_filled_pages", 0444,
2018-06-15 06:27:58 +08:00
				zswap_debugfs_root, &zswap_same_filled_pages);
2013-07-11 07:05:03 +08:00

	return 0;
}

static void __exit zswap_debugfs_exit(void)
{
	debugfs_remove_recursive(zswap_debugfs_root);
}
#else
static int __init zswap_debugfs_init(void)
{
	return 0;
}

static void __exit zswap_debugfs_exit(void) { }
#endif

/*********************************
* module init and exit
**********************************/
static int __init init_zswap(void)
{
2015-09-10 06:35:19 +08:00
	struct zswap_pool *pool;
2016-11-27 07:13:39 +08:00
	int ret;
2014-04-08 06:38:27 +08:00

2015-09-10 06:35:21 +08:00
	zswap_init_started = true;

2013-07-11 07:05:03 +08:00
	if (zswap_entry_cache_create()) {
		pr_err("entry cache creation failed\n");
2015-09-10 06:35:19 +08:00
		goto cache_fail;
2013-07-11 07:05:03 +08:00
	}
2015-09-10 06:35:19 +08:00

2016-11-27 07:13:39 +08:00
	ret = cpuhp_setup_state(CPUHP_MM_ZSWP_MEM_PREPARE, "mm/zswap:prepare",
				zswap_dstmem_prepare, zswap_dstmem_dead);
	if (ret) {
2015-09-10 06:35:19 +08:00
		pr_err("dstmem alloc failed\n");
		goto dstmem_fail;
2013-07-11 07:05:03 +08:00
	}
2015-09-10 06:35:19 +08:00

2016-11-27 07:13:40 +08:00
	ret = cpuhp_setup_state_multi(CPUHP_MM_ZSWP_POOL_PREPARE,
				      "mm/zswap_pool:prepare",
				      zswap_cpu_comp_prepare,
				      zswap_cpu_comp_dead);
	if (ret)
		goto hp_fail;

2015-09-10 06:35:19 +08:00
	pool = __zswap_pool_create_fallback();
2017-02-28 06:26:47 +08:00
	if (pool) {
		pr_info("loaded using pool %s/%s\n", pool->tfm_name,
			zpool_get_type(pool->zpool));
		list_add(&pool->list, &zswap_pools);
		zswap_has_pool = true;
	} else {
2015-09-10 06:35:19 +08:00
		pr_err("pool creation failed\n");
2017-02-28 06:26:47 +08:00
		zswap_enabled = false;
2013-07-11 07:05:03 +08:00
	}
2014-04-08 06:38:27 +08:00

2013-07-11 07:05:03 +08:00
	frontswap_register_ops(&zswap_frontswap_ops);
	if (zswap_debugfs_init())
		pr_warn("debugfs initialization failed\n");
	return 0;
2015-09-10 06:35:19 +08:00

2016-11-27 07:13:40 +08:00
hp_fail:
2016-11-27 07:13:39 +08:00
	cpuhp_remove_state(CPUHP_MM_ZSWP_MEM_PREPARE);
2015-09-10 06:35:19 +08:00
dstmem_fail:
2014-08-09 05:19:35 +08:00
	zswap_entry_cache_destroy();
2015-09-10 06:35:19 +08:00
cache_fail:
zswap: disable changing params if init fails
Add zswap_init_failed bool that prevents changing any of the module
params, if init_zswap() fails, and set zswap_enabled to false. Change
'enabled' param to a callback, and check zswap_init_failed before
allowing any change to 'enabled', 'zpool', or 'compressor' params.
Any driver that is built-in to the kernel will not be unloaded if its
init function returns error, and its module params remain accessible for
users to change via sysfs. Since zswap uses param callbacks, which
assume that zswap has been initialized, changing the zswap params after
a failed initialization will result in WARNING due to the param
callbacks expecting a pool to already exist. This prevents that by
immediately exiting any of the param callbacks if initialization failed.
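The guard described above can be sketched as a userspace toy. All names here are hypothetical stand-ins, and returning -ENODEV from the setter is an assumption, since the message does not say which error the callbacks return.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the module state described in the message. */
static bool sketch_init_failed;	/* set once if initialization fails */
static bool sketch_enabled;

/* Hypothetical param callback: refuses any change after a failed init. */
static int sketch_set_enabled(bool want)
{
	if (sketch_init_failed)
		return -ENODEV;	/* assumed error code */
	sketch_enabled = want;
	return 0;
}

/* Hypothetical init: on failure, latch the flag and force "disabled". */
static int sketch_init(bool resources_ok)
{
	if (!resources_ok) {
		sketch_init_failed = true;
		sketch_enabled = false;
		return -ENOMEM;
	}
	return 0;
}
```

Because the flag is checked before any other state is touched, a later write to the parameter cannot re-enable a driver whose buffers were already freed.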
This was reported here:
https://marc.info/?l=linux-mm&m=147004228125528&w=4
And fixes this WARNING:
[ 429.723476] WARNING: CPU: 0 PID: 5140 at mm/zswap.c:503 __zswap_pool_current+0x56/0x60
The warning is just noise, and not serious. However, when init fails,
zswap frees all its percpu dstmem pages and its kmem cache. The kmem
cache might be serious, if kmem_cache_alloc(NULL, gfp) has problems; but
the percpu dstmem pages are definitely a problem, as they're used as
temporary buffer for compressed pages before copying into place in the
zpool.
If the user does get zswap enabled after an init failure, then zswap
will likely Oops on the first page it tries to compress (or worse, start
corrupting memory).
Fixes: 90b0fc26d5db ("zswap: change zpool/compressor at runtime")
Link: http://lkml.kernel.org/r/20170124200259.16191-2-ddstreet@ieee.org
Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
Reported-by: Marcin Miroslaw <marcin@mejor.pl>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-04 05:13:09 +08:00
	/* if built-in, we aren't unloaded on failure; don't allow use */
	zswap_init_failed = true;
	zswap_enabled = false;
2013-07-11 07:05:03 +08:00
	return -ENOMEM;
}
/* must be late so crypto has time to come up */
late_initcall(init_zswap);

MODULE_LICENSE("GPL");
2014-11-13 11:08:46 +08:00
MODULE_AUTHOR("Seth Jennings <sjennings@variantweb.net>");
2013-07-11 07:05:03 +08:00
MODULE_DESCRIPTION("Compressed cache for swap pages");