Commit Graph

83 Commits

Author SHA1 Message Date
Greg Kroah-Hartman 61d3d5108e mm: remove PageMovable export
The only in-kernel users that need PageMovable() to be exported are z3fold
and zsmalloc and they are only using it for dubious debugging
functionality.  So remove those usages and the export so that no driver
code accidentally thinks that they are allowed to use this symbol.

Link: https://lkml.kernel.org/r/20230106135900.3763622-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-18 17:12:57 -08:00
Johannes Weiner 6a05aa3010 zpool: clean out dead code
There is a lot of provision for flexibility that isn't actually needed or
used.  Zswap (the only zpool user) always passes zpool_ops with an .evict
method set.  The backends who reclaim only do so for zswap, so they can
also directly call zpool_ops without indirection or checks.

Finally, there is no need to check the retries parameters and bail with
-EINVAL in the reclaim function, when that's called just a few lines below
with a hard-coded 8.  There is no need to duplicate the evictable and
sleep_mapped attrs from the driver in zpool_ops.

Link: https://lkml.kernel.org/r/20221128191616.1261026-3-nphamcs@gmail.com
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-11 18:12:10 -08:00
Matthew Wilcox (Oracle) 68f2736a85 mm: Convert all PageMovable users to movable_operations
These drivers are rather uncomfortably hammered into the
address_space_operations hole.  They aren't filesystems and don't behave
like filesystems.  They just need their own movable_operations structure,
which we can point to directly from page->mapping.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-08-02 12:34:03 -04:00
Miaohe Lin 943fb61dd6 mm/z3fold: fix z3fold_page_migrate races with z3fold_map
Think about the below scenario:

CPU1				CPU2
 z3fold_page_migrate		z3fold_map
  z3fold_page_trylock
  ...
  z3fold_page_unlock
  /* slots still points to old zhdr*/
				 get_z3fold_header
				  get slots from handle
				  get old zhdr from slots
				  z3fold_page_trylock
				  return *old* zhdr
  encode_handle(new_zhdr, FIRST|LAST|MIDDLE)
  put_page(page) /* zhdr is freed! */
				 but zhdr is still used by caller!

z3fold_map can map freed z3fold page and lead to use-after-free bug.  To
fix it, we add PAGE_MIGRATED to indicate z3fold page is migrated and soon
to be released.  So get_z3fold_header won't return such page.

Link: https://lkml.kernel.org/r/20220429064051.61552-10-linmiaohe@huawei.com
Fixes: 1f862989b0 ("mm/z3fold.c: support page migration")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27 09:33:44 -07:00
Miaohe Lin 04094226d6 mm/z3fold: fix z3fold_reclaim_page races with z3fold_free
Think about the below scenario:

CPU1				CPU2
z3fold_reclaim_page		z3fold_free
 spin_lock(&pool->lock)		 get_z3fold_header -- hold page_lock
 kref_get_unless_zero
				 kref_put--zhdr->refcount can be 1 now
 !z3fold_page_trylock
  kref_put -- zhdr->refcount is 0 now
   release_z3fold_page
    WARN_ON(!list_empty(&zhdr->buddy)); -- we're on buddy now!
    spin_lock(&pool->lock); -- deadlock here!

z3fold_reclaim_page might race with z3fold_free and will lead to pool lock
deadlock and zhdr buddy non-empty warning.  To fix this, defer getting the
refcount until page_lock is held just like what __z3fold_alloc does.  Note
this has the side effect that we won't break the reclaim if we meet a soon
to be released z3fold page now.

Link: https://lkml.kernel.org/r/20220429064051.61552-9-linmiaohe@huawei.com
Fixes: dcf5aedb24 ("z3fold: stricter locking and more careful reclaim")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27 09:33:44 -07:00
Miaohe Lin 4a1c383910 mm/z3fold: always clear PAGE_CLAIMED under z3fold page lock
Think about the below race window:

CPU1				CPU2
z3fold_reclaim_page		z3fold_free
 test_and_set_bit PAGE_CLAIMED
 failed to reclaim page
 z3fold_page_lock(zhdr);
 add back to the lru list;
 z3fold_page_unlock(zhdr);
				 get_z3fold_header
				 page_claimed=test_and_set_bit PAGE_CLAIMED

 clear_bit(PAGE_CLAIMED, &page->private);

				 if (!page_claimed) /* it's false true */
				  free_handle is not called

free_handle won't be called in this case. So z3fold_buddy_slots will leak.
Fix it by always clear PAGE_CLAIMED under z3fold page lock.

Link: https://lkml.kernel.org/r/20220429064051.61552-8-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27 09:33:44 -07:00
Miaohe Lin 6cf9a34967 mm/z3fold: put z3fold page back into unbuddied list when reclaim or migration fails
When doing z3fold page reclaim or migration, the page is removed from
unbuddied list.  If reclaim or migration succeeds, it's fine as page is
released.  But in case it fails, the page is not put back into unbuddied
list now.  The page will be leaked until next compaction work, reclaim or
migration is done.

Link: https://lkml.kernel.org/r/20220429064051.61552-7-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27 09:33:44 -07:00
Miaohe Lin f4bad643c1 revert "mm/z3fold.c: allow __GFP_HIGHMEM in z3fold_alloc"
Revert commit f1549cb5ab ("mm/z3fold.c: allow __GFP_HIGHMEM in
z3fold_alloc").

z3fold can't support GFP_HIGHMEM page now.  page_address is used directly
at all places.  Moreover, z3fold_header is on per cpu unbuddied list which
could be accessed anytime.  So we should remove the support of GFP_HIGHMEM
allocation for z3fold.

Link: https://lkml.kernel.org/r/20220429064051.61552-6-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27 09:33:43 -07:00
Miaohe Lin 2c0f351434 mm/z3fold: throw warning on failure of trylock_page in z3fold_alloc
If trylock_page fails, the page won't be non-lru movable page.  When this
page is freed via free_z3fold_page, it will trigger bug on PageMovable
check in __ClearPageMovable.  Throw warning on failure of trylock_page to
guard against such rare case just as what zsmalloc does.

Link: https://lkml.kernel.org/r/20220429064051.61552-5-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27 09:33:43 -07:00
Miaohe Lin df6f0f1d0c mm/z3fold: remove buggy use of stale list for allocation
Currently if z3fold couldn't find an unbuddied page it would first try to
pull a page off the stale list.  But this approach is problematic.  If
init z3fold page fails later, the page should be freed via
free_z3fold_page to clean up the relevant resource instead of using
__free_page directly.  And if page is successfully reused, it will BUG_ON
later in __SetPageMovable because it's already non-lru movable page, i.e. 
PAGE_MAPPING_MOVABLE is already set in page->mapping.  In order to fix all
of these issues, we can simply remove the buggy use of stale list for
allocation because can_sleep should always be false and we never really
hit the reusing code path now.

Link: https://lkml.kernel.org/r/20220429064051.61552-4-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27 09:33:43 -07:00
Miaohe Lin 7c61c35bbd mm/z3fold: fix possible null pointer dereferencing
alloc_slots could fail to allocate memory under heavy memory pressure.  So
we should check zhdr->slots against NULL to avoid future null pointer
dereferencing.

Link: https://lkml.kernel.org/r/20220429064051.61552-3-linmiaohe@huawei.com
Fixes: fc5488651c ("z3fold: simplify freeing slots")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27 09:33:43 -07:00
Miaohe Lin 4c6bdb3640 mm/z3fold: fix sheduling while atomic
Patch series "A few fixup patches for z3fold".

This series contains a few fixup patches to fix sheduling while atomic,
fix possible null pointer dereferencing, fix various race conditions and
so on. More details can be found in the respective changelogs.


This patch (of 9):

z3fold's page_lock is always held when calling alloc_slots.  So gfp should
be GFP_ATOMIC to avoid "scheduling while atomic" bug.

Link: https://lkml.kernel.org/r/20220429064051.61552-1-linmiaohe@huawei.com
Link: https://lkml.kernel.org/r/20220429064051.61552-2-linmiaohe@huawei.com
Fixes: fc5488651c ("z3fold: simplify freeing slots")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27 09:33:43 -07:00
Miaohe Lin daf79bd8ee mm/z3fold: remove unneeded PAGE_HEADLESS check in free_handle()
The only caller z3fold_free() never calls free_handle() in PAGE_HEADLESS
case.  Remove this unneeded check.

Link: https://lkml.kernel.org/r/20220308134311.59086-9-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-28 23:16:06 -07:00
Miaohe Lin 52fb90cc19 mm/z3fold: remove redundant list_del_init of zhdr->buddy in z3fold_free
do_compact_page() will do list_del_init(&zhdr->buddy) for us.  Remove this
extra one to save some possible cpu cycles.

Link: https://lkml.kernel.org/r/20220308134311.59086-8-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-28 23:16:06 -07:00
Miaohe Lin 5e36c25b2c mm/z3fold: move decrement of pool->pages_nr into __release_z3fold_page()
The z3fold will always do atomic64_dec(&pool->pages_nr) when the
__release_z3fold_page() is called.  Thus we can move decrement of
pool->pages_nr into __release_z3fold_page() to simplify the code.

Also we can reduce the size of z3fold.o ~1k.

Without this patch:
   text	   data	    bss	    dec	    hex	filename
  15444	   1376	      8	  16828	   41bc	mm/z3fold.o
With this patch:
   text	   data	    bss	    dec	    hex	filename
  15044	   1248	      8	  16300	   3fac	mm/z3fold.o

Link: https://lkml.kernel.org/r/20220308134311.59086-7-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-28 23:16:05 -07:00
Miaohe Lin a3148b5fea mm/z3fold: remove confusing local variable l reassignment
The local variable l holds the address of unbuddied[i] which won't change
after we take the pool lock.  Remove it to avoid confusion.

Link: https://lkml.kernel.org/r/20220308134311.59086-6-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-28 23:16:05 -07:00
Miaohe Lin 8ea2f86cea mm/z3fold: remove unneeded page_mapcount_reset and ClearPagePrivate
Page->page_type and PagePrivate are not used in z3fold.  We should remove
these confusing unneeded operations.  The z3fold do these here is due to
referring to zsmalloc's migration code which does need these operations.

Link: https://lkml.kernel.org/r/20220308134311.59086-5-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-28 23:16:05 -07:00
Miaohe Lin ed0e5dcab3 mm/z3fold: minor clean up for z3fold_free
Use put_z3fold_header() to pair with get_z3fold_header.  Also fix the
wrong comments.  Minor readability improvement.

Link: https://lkml.kernel.org/r/20220308134311.59086-4-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-28 23:16:05 -07:00
Miaohe Lin 78da57d401 mm/z3fold: remove obsolete comment in z3fold_alloc
The highmem pages are supported since commit f1549cb5ab ("mm/z3fold.c:
allow __GFP_HIGHMEM in z3fold_alloc").  Remove the residual comment.

Link: https://lkml.kernel.org/r/20220308134311.59086-3-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-28 23:16:05 -07:00
Miaohe Lin dc3a1f3024 mm/z3fold: declare z3fold_mount with __init
Patch series "A few cleanup patches for z3fold", v2.

This series contains a few patches to simplify the code, remove unneeded
code, fix obsolete comment and so on.  More details can be found in the
respective changelogs.


This patch (of 8):

z3fold_mount is only called during init.  So we should declare it with
__init.

Link: https://lkml.kernel.org/r/20220308134311.59086-1-linmiaohe@huawei.com
Link: https://lkml.kernel.org/r/20220308134311.59086-2-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-28 23:16:05 -07:00
Mel Gorman 30522175d2 mm/z3fold: add kerneldoc fields for z3fold_pool
make W=1 generates the following warning for z3fold_pool

  mm/z3fold.c:171: warning: Function parameter or member 'zpool' not described in 'z3fold_pool'
  mm/z3fold.c:171: warning: Function parameter or member 'zpool_ops' not described in 'z3fold_pool'

Commit 9a001fc19c ("z3fold: the 3-fold allocator for compressed pages")
simply did not document the fields at the time.  Add rudimentary
documentation.

Link: https://lkml.kernel.org/r/20210520084809.8576-11-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-01 11:06:03 -07:00
Miaohe Lin 28473d91ff mm/z3fold: use release_z3fold_page_locked() to release locked z3fold page
We should use release_z3fold_page_locked() to release z3fold page when
it's locked, although it looks harmless to use release_z3fold_page() now.

Link: https://lkml.kernel.org/r/20210619093151.1492174-7-linmiaohe@huawei.com
Fixes: dcf5aedb24 ("z3fold: stricter locking and more careful reclaim")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Hillf Danton <hdanton@sina.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-30 20:47:29 -07:00
Miaohe Lin dac0d1cfda mm/z3fold: fix potential memory leak in z3fold_destroy_pool()
There is a memory leak in z3fold_destroy_pool() as it forgets to
free_percpu pool->unbuddied.  Call free_percpu for pool->unbuddied to fix
this issue.

Link: https://lkml.kernel.org/r/20210619093151.1492174-6-linmiaohe@huawei.com
Fixes: d30561c56f ("z3fold: use per-cpu unbuddied lists")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Hillf Danton <hdanton@sina.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-30 20:47:28 -07:00
Miaohe Lin 767cc6c556 mm/z3fold: remove unused function handle_to_z3fold_header()
handle_to_z3fold_header() is unused now.  So we can remove it.  As a
result, get_z3fold_header() becomes the only caller of
__get_z3fold_header() and the argument lock is always true.  Therefore we
could further fold the __get_z3fold_header() into get_z3fold_header() with
lock = true.

Link: https://lkml.kernel.org/r/20210619093151.1492174-5-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Hillf Danton <hdanton@sina.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-30 20:47:28 -07:00
Miaohe Lin e891f60e28 mm/z3fold: remove magic number in z3fold_create_pool()
It's meaningless to pass a magic number 2 to __alloc_percpu() as there is
a minimum alignment size of PCPU_MIN_ALLOC_SIZE (> 2) in it.  Also there
is no special alignment requirement for unbuddied.  So we could replace
this magic number with nature alignment, i.e.  __alignof__(struct
list_head), to improve readability.

Link: https://lkml.kernel.org/r/20210619093151.1492174-4-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Hillf Danton <hdanton@sina.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-30 20:47:28 -07:00
Miaohe Lin 014284a081 mm/z3fold: avoid possible underflow in z3fold_alloc()
It is not enough to just make sure the z3fold header is not larger than
the page size.  When z3fold header is equal to PAGE_SIZE, we would
underflow when check alloc size against PAGE_SIZE - ZHDR_SIZE_ALIGNED -
CHUNK_SIZE in z3fold_alloc().  Make sure there has remaining spaces for
its buddy to fix this theoretical issue.

Link: https://lkml.kernel.org/r/20210619093151.1492174-3-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Hillf Danton <hdanton@sina.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-30 20:47:28 -07:00
Miaohe Lin e3c0db4fec mm/z3fold: define macro NCHUNKS as TOTAL_CHUNKS - ZHDR_CHUNKS
Patch series "Cleanup and fixup for z3fold".

This series contains cleanups to remove unused function, redefine macro to
improve readability and so on.  Also this fixes several bugs in z3fold,
such as memory leak in z3fold_destroy_pool().  More details can be found
in the respective changelogs.

This patch (of 6):

To improve code readability, we could define macro NCHUNKS as TOTAL_CHUNKS
- ZHDR_CHUNKS.  No functional change intended.

Link: https://lkml.kernel.org/r/20210619093151.1492174-1-linmiaohe@huawei.com
Link: https://lkml.kernel.org/r/20210619093151.1492174-2-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Hillf Danton <hdanton@sina.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-30 20:47:28 -07:00
Shijie Luo cb152a1a95 mm: fix some typos and code style problems
fix some typos and code style problems in mm.

gfp.h: s/MAXNODES/MAX_NUMNODES
mmzone.h: s/then/than
rmap.c: s/__vma_split()/__vma_adjust()
swap.c: s/__mod_zone_page_stat/__mod_zone_page_state, s/is is/is
swap_state.c: s/whoes/whose
z3fold.c: code style problem fix in z3fold_unregister_migration
zsmalloc.c: s/of/or, s/give/given

Link: https://lkml.kernel.org/r/20210419083057.64820-1-luoshijie1@huawei.com
Signed-off-by: Shijie Luo <luoshijie1@huawei.com>
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-07 00:26:33 -07:00
Thomas Hebb 6d679578fe z3fold: prevent reclaim/free race for headless pages
Commit ca0246bb97 ("z3fold: fix possible reclaim races") introduced
the PAGE_CLAIMED flag "to avoid racing on a z3fold 'headless' page
release." By atomically testing and setting the bit in each of
z3fold_free() and z3fold_reclaim_page(), a double-free was avoided.

However, commit dcf5aedb24 ("z3fold: stricter locking and more careful
reclaim") appears to have unintentionally broken this behavior by moving
the PAGE_CLAIMED check in z3fold_reclaim_page() to after the page lock
gets taken, which only happens for non-headless pages.  For headless
pages, the check is now skipped entirely and races can occur again.

I have observed such a race on my system:

    page:00000000ffbd76b7 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x165316
    flags: 0x2ffff0000000000()
    raw: 02ffff0000000000 ffffea0004535f48 ffff8881d553a170 0000000000000000
    raw: 0000000000000000 0000000000000011 00000000ffffffff 0000000000000000
    page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
    ------------[ cut here ]------------
    kernel BUG at include/linux/mm.h:707!
    invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
    CPU: 2 PID: 291928 Comm: kworker/2:0 Tainted: G    B             5.10.7-arch1-1-kasan #1
    Hardware name: Gigabyte Technology Co., Ltd. H97N-WIFI/H97N-WIFI, BIOS F9b 03/03/2016
    Workqueue: zswap-shrink shrink_worker
    RIP: 0010:__free_pages+0x10a/0x130
    Code: c1 e7 06 48 01 ef 45 85 e4 74 d1 44 89 e6 31 d2 41 83 ec 01 e8 e7 b0 ff ff eb da 48 c7 c6 e0 32 91 88 48 89 ef e8 a6 89 f8 ff <0f> 0b 4c 89 e7 e8 fc 79 07 00 e9 33 ff ff ff 48 89 ef e8 ff 79 07
    RSP: 0000:ffff88819a2ffb98 EFLAGS: 00010296
    RAX: 0000000000000000 RBX: ffffea000594c5a8 RCX: 0000000000000000
    RDX: 1ffffd4000b298b7 RSI: 0000000000000000 RDI: ffffea000594c5b8
    RBP: ffffea000594c580 R08: 000000000000003e R09: ffff8881d5520bbb
    R10: ffffed103aaa4177 R11: 0000000000000001 R12: ffffea000594c5b4
    R13: 0000000000000000 R14: ffff888165316000 R15: ffffea000594c588
    FS:  0000000000000000(0000) GS:ffff8881d5500000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f7c8c3654d8 CR3: 0000000103f42004 CR4: 00000000001706e0
    Call Trace:
     z3fold_zpool_shrink+0x9b6/0x1240
     shrink_worker+0x35/0x90
     process_one_work+0x70c/0x1210
     worker_thread+0x539/0x1200
     kthread+0x330/0x400
     ret_from_fork+0x22/0x30
    Modules linked in: rfcomm ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ccm algif_aead des_generic libdes ecb algif_skcipher cmac bnep md4 algif_hash af_alg vfat fat intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel iwlmvm hid_logitech_hidpp kvm at24 mac80211 snd_hda_codec_realtek iTCO_wdt snd_hda_codec_generic intel_pmc_bxt snd_hda_codec_hdmi ledtrig_audio iTCO_vendor_support mei_wdt mei_hdcp snd_hda_intel snd_intel_dspcfg libarc4 soundwire_intel irqbypass iwlwifi soundwire_generic_allocation rapl soundwire_cadence intel_cstate snd_hda_codec intel_uncore btusb joydev mousedev snd_usb_audio pcspkr btrtl uvcvideo nouveau btbcm i2c_i801 btintel snd_hda_core videobuf2_vmalloc i2c_smbus snd_usbmidi_lib videobuf2_memops bluetooth snd_hwdep soundwire_bus snd_soc_rt5640 videobuf2_v4l2 cfg80211 snd_soc_rl6231 videobuf2_common snd_rawmidi lpc_ich alx videodev mdio snd_seq_device snd_soc_core mc ecdh_generic mxm_wmi mei_me
     hid_logitech_dj wmi snd_compress e1000e ac97_bus mei ttm rfkill snd_pcm_dmaengine ecc snd_pcm snd_timer snd soundcore mac_hid acpi_pad pkcs8_key_parser it87 hwmon_vid crypto_user fuse ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 dm_crypt cbc encrypted_keys trusted tpm rng_core usbhid dm_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper xhci_pci xhci_pci_renesas i915 video intel_gtt i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm agpgart
    ---[ end trace 126d646fc3dc0ad8 ]---

To fix the issue, re-add the earlier test and set in the case where we
have a headless page.

Link: https://lkml.kernel.org/r/c8106dbe6d8390b290cd1d7f873a2942e805349e.1615452048.git.tommyhebb@gmail.com
Fixes: dcf5aedb24 ("z3fold: stricter locking and more careful reclaim")
Signed-off-by: Thomas Hebb <tommyhebb@gmail.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Jongseok Kim <ks77sj@gmail.com>
Cc: Snild Dolkow <snild@sony.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-25 09:22:55 -07:00
Tian Tao e818e820c6 mm: set the sleep_mapped to true for zbud and z3fold
zpool driver adds a flag to indicate whether the zpool driver can enter an
atomic context after mapping.  This patch sets it true for z3fold and
zbud.

Link: https://lkml.kernel.org/r/1611035683-12732-3-git-send-email-tiantao6@hisilicon.com
Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reported-by: Mike Galbraith <efault@gmx.de>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26 09:41:01 -08:00
Miaohe Lin c457cd96f1 z3fold: simplify the zhdr initialization code in init_z3fold_page()
We can simplify the zhdr initialization by memset() the zhdr first
instead of set struct member to zero one by one.  This would also make
code more compact and clear.

Link: https://lkml.kernel.org/r/20210120085851.16159-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24 13:38:34 -08:00
Miaohe Lin 70ad3196a6 z3fold: remove unused attribute for release_z3fold_page
Since commit dcf5aedb24 ("z3fold: stricter locking and more careful
reclaim"), release_z3fold_page() is used again.  So we can drop the
unused attribute safely.

Link: https://lkml.kernel.org/r/20210120084008.58432-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24 13:38:34 -08:00
Vitaly Wool 135f97fd0c z3fold: remove preempt disabled sections for RT
Replace get_cpu_ptr() with migrate_disable()+this_cpu_ptr() so RT can take
spinlocks that become sleeping locks.

Signed-off-by Mike Galbraith <efault@gmx.de>

Link: https://lkml.kernel.org/r/20201209145151.18994-3-vitaly.wool@konsulko.com
Signed-off-by: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:45 -08:00
Vitaly Wool dcf5aedb24 z3fold: stricter locking and more careful reclaim
Use temporary slots in reclaim function to avoid possible race when
freeing those.

While at it, make sure we check CLAIMED flag under page lock in the
reclaim function to make sure we are not racing with z3fold_alloc().

Link: https://lkml.kernel.org/r/20201209145151.18994-4-vitaly.wool@konsulko.com
Signed-off-by: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: <stable@vger.kernel.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:45 -08:00
Vitaly Wool fc5488651c z3fold: simplify freeing slots
Patch series "z3fold: stability / rt fixes".

Address z3fold stability issues under stress load, primarily in the
reclaim and free aspects.  Besides, it fixes the locking problems that
were only seen in real-time kernel configuration.

This patch (of 3):

There used to be two places in the code where slots could be freed, namely
when freeing the last allocated handle from the slots and when releasing
the z3fold header these slots aree linked to.  The logic to decide on
whether to free certain slots was complicated and error prone in both
functions and it led to failures in RT case.

To fix that, make free_handle() the single point of freeing slots.

Link: https://lkml.kernel.org/r/20201209145151.18994-1-vitaly.wool@konsulko.com
Link: https://lkml.kernel.org/r/20201209145151.18994-2-vitaly.wool@konsulko.com
Signed-off-by: Vitaly Wool <vitaly.wool@konsulko.com>
Tested-by: Mike Galbraith <efault@gmx.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:45 -08:00
Hui Su f94afee998 mm/z3fold.c: use xx_zalloc instead xx_alloc and memset
alloc_slots() allocates memory for slots using kmem_cache_alloc(), then
memsets it.  We can just use kmem_cache_zalloc().

Signed-off-by: Hui Su <sh_def@163.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/20200926100834.GA184671@rlk
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:34 -07:00
Qian Cai af4798a5bb mm/z3fold: silence kmemleak false positives of slots
Kmemleak reported many leaks while under memory pressue in,

    slots = alloc_slots(pool, gfp);

which is referenced by "zhdr" in init_z3fold_page(),

    zhdr->slots = slots;

However, "zhdr" could be gone without freeing slots as the later will be
freed separately when the last "handle" off of "handles" array is freed.
It will be within "slots" which is always aligned.

  unreferenced object 0xc000000fdadc1040 (size 104):
  comm "oom04", pid 140476, jiffies 4295359280 (age 3454.970s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    z3fold_zpool_malloc+0x7b0/0xe10
    alloc_slots at mm/z3fold.c:214
    (inlined by) init_z3fold_page at mm/z3fold.c:412
    (inlined by) z3fold_alloc at mm/z3fold.c:1161
    (inlined by) z3fold_zpool_malloc at mm/z3fold.c:1735
    zpool_malloc+0x34/0x50
    zswap_frontswap_store+0x60c/0xda0
    zswap_frontswap_store at mm/zswap.c:1093
    __frontswap_store+0x128/0x330
    swap_writepage+0x58/0x110
    pageout+0x16c/0xa40
    shrink_page_list+0x1ac8/0x25c0
    shrink_inactive_list+0x270/0x730
    shrink_lruvec+0x444/0xf30
    shrink_node+0x2a4/0x9c0
    do_try_to_free_pages+0x158/0x640
    try_to_free_pages+0x1bc/0x5f0
    __alloc_pages_slowpath.constprop.60+0x4dc/0x15a0
    __alloc_pages_nodemask+0x520/0x650
    alloc_pages_vma+0xc0/0x420
    handle_mm_fault+0x1174/0x1bf0

Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Vitaly Wool <vitaly.wool@konsulko.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: http://lkml.kernel.org/r/20200522220052.2225-1-cai@lca.pw
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-28 11:35:40 -07:00
Uladzislau Rezki d8f117abb3 z3fold: fix use-after-free when freeing handles
free_handle() for a foreign handle may race with inter-page compaction,
what can lead to memory corruption.

To avoid that, take write lock not read lock in free_handle to be
synchronized with __release_z3fold_page().

For example KASAN can detect it:

  ==================================================================
  BUG: KASAN: use-after-free in LZ4_decompress_safe+0x2c4/0x3b8
  Read of size 1 at addr ffffffc976695ca3 by task GoogleApiHandle/4121

  CPU: 0 PID: 4121 Comm: GoogleApiHandle Tainted: P S         OE     4.19.81-perf+ #162
  Hardware name: Sony Mobile Communications. PDX-203(KONA) (DT)
  Call trace:
     LZ4_decompress_safe+0x2c4/0x3b8
     lz4_decompress_crypto+0x3c/0x70
     crypto_decompress+0x58/0x70
     zcomp_decompress+0xd4/0x120
     ...

Apart from that, initialize zhdr->mapped_count in init_z3fold_page() and
remove "newpage" variable because it is not used anywhere.

Signed-off-by: Uladzislau Rezki <uladzislau.rezki@sony.com>
Signed-off-by: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Raymond Jennings <shentino@gmail.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200520082100.28876-1-vitaly.wool@konsulko.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-23 10:26:32 -07:00
Sebastian Andrzej Siewior a8198fedd9 mm/z3fold.c: do not include rwlock.h directly
rwlock.h should not be included directly. Instead linux/splinlock.h
should be included. One thing it does is to break the RT build.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200224133631.1510569-1-bigeasy@linutronix.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-06 07:06:09 -06:00
Vitaly Wool 4a3ac9311d mm/z3fold.c: add inter-page compaction
For each page scheduled for compaction (e.  g.  by z3fold_free()), try to
apply inter-page compaction before running the traditional/ existing
intra-page compaction.  That means, if the page has only one buddy, we
treat that buddy as a new object that we aim to place into an existing
z3fold page.  If such a page is found, that object is transferred and the
old page is freed completely.  The transferred object is named "foreign"
and treated slightly differently thereafter.

Namely, we increase "foreign handle" counter for the new page.  Pages with
non-zero "foreign handle" count become unmovable.  This patch implements
"foreign handle" detection when a handle is freed to decrement the foreign
handle counter accordingly, so a page may as well become movable again as
the time goes by.

As a result, we almost always have exactly 3 objects per page and
significantly better average compression ratio.

[cai@lca.pw: fix -Wunused-but-set-variable warnings]
  Link: http://lkml.kernel.org/r/1570542062-29144-1-git-send-email-cai@lca.pw
[vitalywool@gmail.com: avoid subtle race when freeing slots]
  Link: http://lkml.kernel.org/r/20191127152118.6314b99074b0626d4c5a8835@gmail.com
[vitalywool@gmail.com: compact objects more accurately]
  Link: http://lkml.kernel.org/r/20191127152216.6ad33745a21ba71c53606acb@gmail.com
[vitalywool@gmail.com: protect handle reads]
  Link: http://lkml.kernel.org/r/20191127152345.8059852f60947686674d726d@gmail.com
Link: http://lkml.kernel.org/r/20191006041457.24113-1-vitalywool@gmail.com
Signed-off-by: Vitaly Wool <vitaly.vul@sony.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Henry Burns <henrywolfeburns@gmail.com>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-01 12:59:07 -08:00
Vitaly Wool 5b6807de11 mm/z3fold.c: claim page in the beginning of free
There's a really hard to reproduce race in z3fold between z3fold_free()
and z3fold_reclaim_page().  z3fold_reclaim_page() can claim the page
after z3fold_free() has checked if the page was claimed and
z3fold_free() will then schedule this page for compaction which may in
turn lead to random page faults (since that page would have been
reclaimed by then).

Fix that by claiming page in the beginning of z3fold_free() and not
forgetting to clear the claim in the end.

[vitalywool@gmail.com: v2]
  Link: http://lkml.kernel.org/r/20190928113456.152742cf@bigdell
Link: http://lkml.kernel.org/r/20190926104844.4f0c6efa1366b8f5741eaba9@gmail.com
Signed-off-by: Vitaly Wool <vitalywool@gmail.com>
Reported-by: Markus Linnala <markus.linnala@gmail.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Henry Burns <henrywolfeburns@gmail.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Markus Linnala <markus.linnala@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-07 15:47:19 -07:00
Vitaly Wool 63398413c0 z3fold: fix memory leak in kmem cache
Currently there is a leak in init_z3fold_page() -- it allocates handles
from kmem cache even for headless pages, but then they are never used and
never freed, so eventually kmem cache may get exhausted.  This patch
provides a fix for that.

Link: http://lkml.kernel.org/r/20190917185352.44cf285d3ebd9e64548de5de@gmail.com
Signed-off-by: Vitaly Wool <vitalywool@gmail.com>
Reported-by: Markus Linnala <markus.linnala@gmail.com>
Tested-by: Markus Linnala <markus.linnala@gmail.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Henry Burns <henrywolfeburns@gmail.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-24 15:54:10 -07:00
Vitaly Wool 3f9d2b5766 z3fold: fix retry mechanism in page reclaim
z3fold_page_reclaim()'s retry mechanism is broken: on a second iteration
it will have zhdr from the first one so that zhdr is no longer in line
with struct page.  That leads to crashes when the system is stressed.

Fix that by moving zhdr assignment up.

While at it, protect against using already freed handles by using own
local slots structure in z3fold_page_reclaim().

Link: http://lkml.kernel.org/r/20190908162919.830388dc7404d1e2c80f4095@gmail.com
Signed-off-by: Vitaly Wool <vitalywool@gmail.com>
Reported-by: Markus Linnala <markus.linnala@gmail.com>
Reported-by: Chris Murphy <bugzilla@colorremedies.com>
Reported-by: Agustin Dall'Alba <agustin@dallalba.com.ar>
Cc: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Henry Burns <henrywolfeburns@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-24 15:54:06 -07:00
Vitaly Wool 6e73fd25e2 Revert "mm/z3fold.c: fix race between migration and destruction"
With the original commit applied, z3fold_zpool_destroy() may get blocked
on wait_event() for indefinite time.  Revert this commit for the time
being to get rid of this problem since the issue the original commit
addresses is less severe.

Link: http://lkml.kernel.org/r/20190910123142.7a9c8d2de4d0acbc0977c602@gmail.com
Fixes: d776aaa989 ("mm/z3fold.c: fix race between migration and destruction")
Reported-by: Agustín Dall'Alba <agustin@dallalba.com.ar>
Signed-off-by: Vitaly Wool <vitalywool@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Adams <jwadams@google.com>
Cc: Henry Burns <henrywolfeburns@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-24 15:54:06 -07:00
Gustavo A. R. Silva 14108b9131 mm/z3fold.c: fix lock/unlock imbalance in z3fold_page_isolate
Fix lock/unlock imbalance by unlocking *zhdr* before return.

Addresses Coverity ID 1452811 ("Missing unlock")

Link: http://lkml.kernel.org/r/20190826030634.GA4379@embeddedor
Fixes: d776aaa989 ("mm/z3fold.c: fix race between migration and destruction")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Henry Burns <henrywolfeburns@gmail.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-30 18:00:50 -07:00
Henry Burns d776aaa989 mm/z3fold.c: fix race between migration and destruction
In z3fold_destroy_pool() we call destroy_workqueue(&pool->compact_wq).
However, we have no guarantee that migration isn't happening in the
background at that time.

Migration directly calls queue_work_on(pool->compact_wq), if destruction
wins that race we are using a destroyed workqueue.

Link: http://lkml.kernel.org/r/20190809213828.202833-1-henryburns@google.com
Signed-off-by: Henry Burns <henryburns@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Adams <jwadams@google.com>
Cc: Henry Burns <henrywolfeburns@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-24 19:48:42 -07:00
Henry Burns b997052bc3 mm/z3fold.c: fix z3fold_destroy_pool() race condition
The constraint from the zpool use of z3fold_destroy_pool() is there are
no outstanding handles to memory (so no active allocations), but it is
possible for there to be outstanding work on either of the two wqs in
the pool.

Calling z3fold_deregister_migration() before the workqueues are drained
means that there can be allocated pages referencing a freed inode,
causing any thread in compaction to be able to trip over the bad pointer
in PageMovable().

Link: http://lkml.kernel.org/r/20190726224810.79660-2-henryburns@google.com
Fixes: 1f862989b0 ("mm/z3fold.c: support page migration")
Signed-off-by: Henry Burns <henryburns@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Jonathan Adams <jwadams@google.com>
Cc: Vitaly Vul <vitaly.vul@sony.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Henry Burns <henrywolfeburns@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-13 16:06:52 -07:00
Henry Burns 6051d3bd3b mm/z3fold.c: fix z3fold_destroy_pool() ordering
The constraint from the zpool use of z3fold_destroy_pool() is there are
no outstanding handles to memory (so no active allocations), but it is
possible for there to be outstanding work on either of the two wqs in
the pool.

If there is work queued on pool->compact_workqueue when it is called,
z3fold_destroy_pool() will do:

   z3fold_destroy_pool()
     destroy_workqueue(pool->release_wq)
     destroy_workqueue(pool->compact_wq)
       drain_workqueue(pool->compact_wq)
         do_compact_page(zhdr)
           kref_put(&zhdr->refcount)
             __release_z3fold_page(zhdr, ...)
               queue_work_on(pool->release_wq, &pool->work) *BOOM*

So compact_wq needs to be destroyed before release_wq.

Link: http://lkml.kernel.org/r/20190726224810.79660-1-henryburns@google.com
Fixes: 5d03a66139 ("mm/z3fold.c: use kref to prevent page free/compact race")
Signed-off-by: Henry Burns <henryburns@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Jonathan Adams <jwadams@google.com>
Cc: Vitaly Vul <vitaly.vul@sony.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@zeniv.linux.org.uk
Cc: Henry Burns <henrywolfeburns@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-13 16:06:52 -07:00
Linus Torvalds 933a90bf4f Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs mount updates from Al Viro:
 "The first part of mount updates.

  Convert filesystems to use the new mount API"

* 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
  mnt_init(): call shmem_init() unconditionally
  constify ksys_mount() string arguments
  don't bother with registering rootfs
  init_rootfs(): don't bother with init_ramfs_fs()
  vfs: Convert smackfs to use the new mount API
  vfs: Convert selinuxfs to use the new mount API
  vfs: Convert securityfs to use the new mount API
  vfs: Convert apparmorfs to use the new mount API
  vfs: Convert openpromfs to use the new mount API
  vfs: Convert xenfs to use the new mount API
  vfs: Convert gadgetfs to use the new mount API
  vfs: Convert oprofilefs to use the new mount API
  vfs: Convert ibmasmfs to use the new mount API
  vfs: Convert qib_fs/ipathfs to use the new mount API
  vfs: Convert efivarfs to use the new mount API
  vfs: Convert configfs to use the new mount API
  vfs: Convert binfmt_misc to use the new mount API
  convenience helper: get_tree_single()
  convenience helper get_tree_nodev()
  vfs: Kill sget_userns()
  ...
2019-07-19 10:42:02 -07:00
Henry Burns c92d2f3856 mm/z3fold.c: reinitialize zhdr structs after migration
z3fold_page_migration() calls memcpy(new_zhdr, zhdr, PAGE_SIZE).
However, zhdr contains fields that can't be directly coppied over (ex:
list_head, a circular linked list).  We only need to initialize the
linked lists in new_zhdr, as z3fold_isolate_page() already ensures that
these lists are empty

Additionally it is possible that zhdr->work has been placed in a
workqueue.  In this case we shouldn't migrate the page, as zhdr->work
references zhdr as opposed to new_zhdr.

Link: http://lkml.kernel.org/r/20190716000520.230595-1-henryburns@google.com
Fixes: 1f862989b0 ("mm/z3fold.c: support page migration")
Signed-off-by: Henry Burns <henryburns@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Vitaly Vul <vitaly.vul@sony.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Jonathan Adams <jwadams@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16 19:23:21 -07:00