2018-09-12 09:16:07 +08:00
|
|
|
// SPDX-License-Identifier: GPL-2.0
|
2012-11-29 12:28:09 +08:00
|
|
|
/*
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
* fs/f2fs/segment.c
|
|
|
|
*
|
|
|
|
* Copyright (c) 2012 Samsung Electronics Co., Ltd.
|
|
|
|
* http://www.samsung.com/
|
|
|
|
*/
|
|
|
|
#include <linux/fs.h>
|
|
|
|
#include <linux/f2fs_fs.h>
|
|
|
|
#include <linux/bio.h>
|
|
|
|
#include <linux/blkdev.h>
|
mm: introduce memalloc_retry_wait()
Various places in the kernel - largely in filesystems - respond to a
memory allocation failure by looping around and re-trying. Some of
these cannot conveniently use __GFP_NOFAIL, for reasons such as:
- a GFP_ATOMIC allocation, which __GFP_NOFAIL doesn't work on
- a need to check for the process being signalled between failures
- the possibility that other recovery actions could be performed
- the allocation is quite deep in support code, and passing down an
extra flag to say if __GFP_NOFAIL is wanted would be clumsy.
Many of these currently use congestion_wait() which (in almost all
cases) simply waits the given timeout - congestion isn't tracked for
most devices.
It isn't clear what the best delay is for loops, but it is clear that
the various filesystems shouldn't be responsible for choosing a timeout.
This patch introduces memalloc_retry_wait() with takes on that
responsibility. Code that wants to retry a memory allocation can call
this function passing the GFP flags that were used. It will wait
however is appropriate.
For now, it only considers __GFP_NORETRY and whatever
gfpflags_allow_blocking() tests. If blocking is allowed without
__GFP_NORETRY, then alloc_page either made some reclaim progress, or
waited for a while, before failing. So there is no need for much
further waiting. memalloc_retry_wait() will wait until the current
jiffie ends. If this condition is not met, then alloc_page() won't have
waited much if at all. In that case memalloc_retry_wait() waits about
200ms. This is the delay that most current loops uses.
linux/sched/mm.h needs to be included in some files now,
but linux/backing-dev.h does not.
Link: https://lkml.kernel.org/r/163754371968.13692.1277530886009912421@noble.neil.brown.name
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 06:07:14 +08:00
|
|
|
#include <linux/sched/mm.h>
|
2012-12-20 05:19:30 +08:00
|
|
|
#include <linux/prefetch.h>
|
2014-04-02 14:34:36 +08:00
|
|
|
#include <linux/kthread.h>
|
2013-11-22 09:09:59 +08:00
|
|
|
#include <linux/swap.h>
|
2015-10-06 05:49:57 +08:00
|
|
|
#include <linux/timer.h>
|
2017-05-18 01:36:58 +08:00
|
|
|
#include <linux/freezer.h>
|
2017-09-10 03:03:23 +08:00
|
|
|
#include <linux/sched/signal.h>
|
2021-09-30 02:12:03 +08:00
|
|
|
#include <linux/random.h>
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
#include "f2fs.h"
|
|
|
|
#include "segment.h"
|
|
|
|
#include "node.h"
|
2017-08-16 12:27:19 +08:00
|
|
|
#include "gc.h"
|
2021-08-20 11:52:28 +08:00
|
|
|
#include "iostat.h"
|
2013-04-23 16:51:43 +08:00
|
|
|
#include <trace/events/f2fs.h>
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
2013-11-15 09:42:51 +08:00
|
|
|
#define __reverse_ffz(x) __reverse_ffs(~(x))
|
|
|
|
|
2013-11-15 12:55:58 +08:00
|
|
|
static struct kmem_cache *discard_entry_slab;
|
2017-01-10 06:13:03 +08:00
|
|
|
static struct kmem_cache *discard_cmd_slab;
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
static struct kmem_cache *sit_entry_set_slab;
|
2022-04-29 02:18:09 +08:00
|
|
|
static struct kmem_cache *revoke_entry_slab;
|
2013-11-15 12:55:58 +08:00
|
|
|
|
2015-10-21 06:17:19 +08:00
|
|
|
static unsigned long __reverse_ulong(unsigned char *str)
|
|
|
|
{
|
|
|
|
unsigned long tmp = 0;
|
|
|
|
int shift = 24, idx = 0;
|
|
|
|
|
|
|
|
#if BITS_PER_LONG == 64
|
|
|
|
shift = 56;
|
|
|
|
#endif
|
|
|
|
while (shift >= 0) {
|
|
|
|
tmp |= (unsigned long)str[idx++] << shift;
|
|
|
|
shift -= BITS_PER_BYTE;
|
|
|
|
}
|
|
|
|
return tmp;
|
|
|
|
}
|
|
|
|
|
2013-11-15 09:42:51 +08:00
|
|
|
/*
|
|
|
|
* __reverse_ffs is copied from include/asm-generic/bitops/__ffs.h since
|
|
|
|
* MSB and LSB are reversed in a byte by f2fs_set_bit.
|
|
|
|
*/
|
|
|
|
static inline unsigned long __reverse_ffs(unsigned long word)
|
|
|
|
{
|
|
|
|
int num = 0;
|
|
|
|
|
|
|
|
#if BITS_PER_LONG == 64
|
2015-10-21 06:17:19 +08:00
|
|
|
if ((word & 0xffffffff00000000UL) == 0)
|
2013-11-15 09:42:51 +08:00
|
|
|
num += 32;
|
2015-10-21 06:17:19 +08:00
|
|
|
else
|
2013-11-15 09:42:51 +08:00
|
|
|
word >>= 32;
|
|
|
|
#endif
|
2015-10-21 06:17:19 +08:00
|
|
|
if ((word & 0xffff0000) == 0)
|
2013-11-15 09:42:51 +08:00
|
|
|
num += 16;
|
2015-10-21 06:17:19 +08:00
|
|
|
else
|
2013-11-15 09:42:51 +08:00
|
|
|
word >>= 16;
|
2015-10-21 06:17:19 +08:00
|
|
|
|
|
|
|
if ((word & 0xff00) == 0)
|
2013-11-15 09:42:51 +08:00
|
|
|
num += 8;
|
2015-10-21 06:17:19 +08:00
|
|
|
else
|
2013-11-15 09:42:51 +08:00
|
|
|
word >>= 8;
|
2015-10-21 06:17:19 +08:00
|
|
|
|
2013-11-15 09:42:51 +08:00
|
|
|
if ((word & 0xf0) == 0)
|
|
|
|
num += 4;
|
|
|
|
else
|
|
|
|
word >>= 4;
|
2015-10-21 06:17:19 +08:00
|
|
|
|
2013-11-15 09:42:51 +08:00
|
|
|
if ((word & 0xc) == 0)
|
|
|
|
num += 2;
|
|
|
|
else
|
|
|
|
word >>= 2;
|
2015-10-21 06:17:19 +08:00
|
|
|
|
2013-11-15 09:42:51 +08:00
|
|
|
if ((word & 0x2) == 0)
|
|
|
|
num += 1;
|
|
|
|
return num;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2014-08-06 22:22:50 +08:00
|
|
|
* __find_rev_next(_zero)_bit is copied from lib/find_next_bit.c because
|
2013-11-15 09:42:51 +08:00
|
|
|
* f2fs_set_bit makes MSB and LSB reversed in a byte.
|
2015-11-12 08:43:04 +08:00
|
|
|
* @size must be integral times of unsigned long.
|
2013-11-15 09:42:51 +08:00
|
|
|
* Example:
|
2015-10-21 06:17:19 +08:00
|
|
|
* MSB <--> LSB
|
|
|
|
* f2fs_set_bit(0, bitmap) => 1000 0000
|
|
|
|
* f2fs_set_bit(7, bitmap) => 0000 0001
|
2013-11-15 09:42:51 +08:00
|
|
|
*/
|
|
|
|
static unsigned long __find_rev_next_bit(const unsigned long *addr,
|
|
|
|
unsigned long size, unsigned long offset)
|
|
|
|
{
|
|
|
|
const unsigned long *p = addr + BIT_WORD(offset);
|
2015-11-12 08:43:04 +08:00
|
|
|
unsigned long result = size;
|
2013-11-15 09:42:51 +08:00
|
|
|
unsigned long tmp;
|
|
|
|
|
|
|
|
if (offset >= size)
|
|
|
|
return size;
|
|
|
|
|
2015-11-12 08:43:04 +08:00
|
|
|
size -= (offset & ~(BITS_PER_LONG - 1));
|
2013-11-15 09:42:51 +08:00
|
|
|
offset %= BITS_PER_LONG;
|
2015-10-21 06:17:19 +08:00
|
|
|
|
2015-11-12 08:43:04 +08:00
|
|
|
while (1) {
|
|
|
|
if (*p == 0)
|
|
|
|
goto pass;
|
2013-11-15 09:42:51 +08:00
|
|
|
|
2015-10-21 06:17:19 +08:00
|
|
|
tmp = __reverse_ulong((unsigned char *)p);
|
2015-11-12 08:43:04 +08:00
|
|
|
|
|
|
|
tmp &= ~0UL >> offset;
|
|
|
|
if (size < BITS_PER_LONG)
|
|
|
|
tmp &= (~0UL << (BITS_PER_LONG - size));
|
2013-11-15 09:42:51 +08:00
|
|
|
if (tmp)
|
2015-11-12 08:43:04 +08:00
|
|
|
goto found;
|
|
|
|
pass:
|
|
|
|
if (size <= BITS_PER_LONG)
|
|
|
|
break;
|
2013-11-15 09:42:51 +08:00
|
|
|
size -= BITS_PER_LONG;
|
2015-11-12 08:43:04 +08:00
|
|
|
offset = 0;
|
2015-10-21 06:17:19 +08:00
|
|
|
p++;
|
2013-11-15 09:42:51 +08:00
|
|
|
}
|
2015-11-12 08:43:04 +08:00
|
|
|
return result;
|
|
|
|
found:
|
|
|
|
return result - size + __reverse_ffs(tmp);
|
2013-11-15 09:42:51 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static unsigned long __find_rev_next_zero_bit(const unsigned long *addr,
|
|
|
|
unsigned long size, unsigned long offset)
|
|
|
|
{
|
|
|
|
const unsigned long *p = addr + BIT_WORD(offset);
|
2015-12-05 08:51:13 +08:00
|
|
|
unsigned long result = size;
|
2013-11-15 09:42:51 +08:00
|
|
|
unsigned long tmp;
|
|
|
|
|
|
|
|
if (offset >= size)
|
|
|
|
return size;
|
|
|
|
|
2015-12-05 08:51:13 +08:00
|
|
|
size -= (offset & ~(BITS_PER_LONG - 1));
|
2013-11-15 09:42:51 +08:00
|
|
|
offset %= BITS_PER_LONG;
|
2015-12-05 08:51:13 +08:00
|
|
|
|
|
|
|
while (1) {
|
|
|
|
if (*p == ~0UL)
|
|
|
|
goto pass;
|
|
|
|
|
2015-10-21 06:17:19 +08:00
|
|
|
tmp = __reverse_ulong((unsigned char *)p);
|
2015-12-05 08:51:13 +08:00
|
|
|
|
|
|
|
if (offset)
|
|
|
|
tmp |= ~0UL << (BITS_PER_LONG - offset);
|
|
|
|
if (size < BITS_PER_LONG)
|
|
|
|
tmp |= ~0UL >> size;
|
2015-10-21 06:17:19 +08:00
|
|
|
if (tmp != ~0UL)
|
2015-12-05 08:51:13 +08:00
|
|
|
goto found;
|
|
|
|
pass:
|
|
|
|
if (size <= BITS_PER_LONG)
|
|
|
|
break;
|
2013-11-15 09:42:51 +08:00
|
|
|
size -= BITS_PER_LONG;
|
2015-12-05 08:51:13 +08:00
|
|
|
offset = 0;
|
2015-10-21 06:17:19 +08:00
|
|
|
p++;
|
2013-11-15 09:42:51 +08:00
|
|
|
}
|
2015-12-05 08:51:13 +08:00
|
|
|
return result;
|
|
|
|
found:
|
|
|
|
return result - size + __reverse_ffz(tmp);
|
2013-11-15 09:42:51 +08:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
bool f2fs_need_SSR(struct f2fs_sb_info *sbi)
|
2017-09-10 02:11:04 +08:00
|
|
|
{
|
|
|
|
int node_secs = get_blocktype_secs(sbi, F2FS_DIRTY_NODES);
|
|
|
|
int dent_secs = get_blocktype_secs(sbi, F2FS_DIRTY_DENTS);
|
|
|
|
int imeta_secs = get_blocktype_secs(sbi, F2FS_DIRTY_IMETA);
|
|
|
|
|
2020-02-14 17:44:12 +08:00
|
|
|
if (f2fs_lfs_mode(sbi))
|
2017-09-10 02:11:04 +08:00
|
|
|
return false;
|
2020-07-02 12:14:14 +08:00
|
|
|
if (sbi->gc_mode == GC_URGENT_HIGH)
|
2017-09-10 02:11:04 +08:00
|
|
|
return true;
|
2018-08-21 10:21:43 +08:00
|
|
|
if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED)))
|
|
|
|
return true;
|
2017-09-10 02:11:04 +08:00
|
|
|
|
|
|
|
return free_sections(sbi) <= (node_secs + 2 * dent_secs + imeta_secs +
|
2017-10-28 16:52:33 +08:00
|
|
|
SM_I(sbi)->min_ssr_sections + reserved_sections(sbi));
|
2017-09-10 02:11:04 +08:00
|
|
|
}
|
|
|
|
|
2022-04-29 02:18:09 +08:00
|
|
|
void f2fs_abort_atomic_write(struct inode *inode, bool clean)
|
2014-10-07 08:39:50 +08:00
|
|
|
{
|
2022-04-29 02:18:09 +08:00
|
|
|
struct f2fs_inode_info *fi = F2FS_I(inode);
|
2015-08-07 18:42:09 +08:00
|
|
|
|
2022-08-04 21:38:21 +08:00
|
|
|
if (!f2fs_is_atomic_file(inode))
|
|
|
|
return;
|
2015-03-18 08:58:08 +08:00
|
|
|
|
2022-08-04 21:38:21 +08:00
|
|
|
release_atomic_write_cnt(inode);
|
2022-11-01 03:24:15 +08:00
|
|
|
clear_inode_flag(inode, FI_ATOMIC_COMMITTED);
|
2022-11-12 01:04:06 +08:00
|
|
|
clear_inode_flag(inode, FI_ATOMIC_REPLACE);
|
2022-08-04 21:38:21 +08:00
|
|
|
clear_inode_flag(inode, FI_ATOMIC_FILE);
|
2022-10-04 09:41:02 +08:00
|
|
|
stat_dec_atomic_inode(inode);
|
2022-11-01 03:24:15 +08:00
|
|
|
|
2023-01-09 11:44:50 +08:00
|
|
|
F2FS_I(inode)->atomic_write_task = NULL;
|
|
|
|
|
2022-11-01 03:24:15 +08:00
|
|
|
if (clean) {
|
|
|
|
truncate_inode_pages_final(inode->i_mapping);
|
|
|
|
f2fs_i_size_write(inode, fi->original_i_size);
|
2023-01-09 11:44:50 +08:00
|
|
|
fi->original_i_size = 0;
|
2022-11-01 03:24:15 +08:00
|
|
|
}
|
2014-10-07 08:39:50 +08:00
|
|
|
}
|
|
|
|
|
2022-04-29 02:18:09 +08:00
|
|
|
static int __replace_atomic_write_block(struct inode *inode, pgoff_t index,
|
|
|
|
block_t new_addr, block_t *old_addr, bool recover)
|
2016-02-06 14:38:29 +08:00
|
|
|
{
|
f2fs: support revoking atomic written pages
f2fs support atomic write with following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file
With this flow we can avoid file becoming corrupted when abnormal power
cut, because we hold data of transaction in referenced pages linked in
inmem_pages list of inode, but without setting them dirty, so these data
won't be persisted unless we commit them in step 4.
But we should still hold journal db file in memory by using volatile
write, because our semantics of 'atomic write support' is incomplete, in
step 4, we could fail to submit all dirty data of transaction, once
partial dirty data was committed in storage, then after a checkpoint &
abnormal power-cut, db file will be corrupted forever.
So this patch tries to improve atomic write flow by adding a revoking flow,
once inner error occurs in committing, this gives another chance to try to
revoke these partial submitted data of current transaction, it makes
committing operation more like aotmical one.
If we're not lucky, once revoking operation was failed, EAGAIN will be
reported to user for suggesting doing the recovery with held journal file,
or retrying current transaction again.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-06 14:40:34 +08:00
|
|
|
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
|
2022-04-29 02:18:09 +08:00
|
|
|
struct dnode_of_data dn;
|
|
|
|
struct node_info ni;
|
|
|
|
int err;
|
f2fs: support revoking atomic written pages
f2fs support atomic write with following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file
With this flow we can avoid file becoming corrupted when abnormal power
cut, because we hold data of transaction in referenced pages linked in
inmem_pages list of inode, but without setting them dirty, so these data
won't be persisted unless we commit them in step 4.
But we should still hold journal db file in memory by using volatile
write, because our semantics of 'atomic write support' is incomplete, in
step 4, we could fail to submit all dirty data of transaction, once
partial dirty data was committed in storage, then after a checkpoint &
abnormal power-cut, db file will be corrupted forever.
So this patch tries to improve atomic write flow by adding a revoking flow,
once inner error occurs in committing, this gives another chance to try to
revoke these partial submitted data of current transaction, it makes
committing operation more like aotmical one.
If we're not lucky, once revoking operation was failed, EAGAIN will be
reported to user for suggesting doing the recovery with held journal file,
or retrying current transaction again.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-06 14:40:34 +08:00
|
|
|
|
2017-08-08 19:09:08 +08:00
|
|
|
retry:
|
2022-04-29 02:18:09 +08:00
|
|
|
set_new_dnode(&dn, inode, NULL, NULL, 0);
|
2023-04-19 01:42:01 +08:00
|
|
|
err = f2fs_get_dnode_of_data(&dn, index, ALLOC_NODE);
|
2022-04-29 02:18:09 +08:00
|
|
|
if (err) {
|
|
|
|
if (err == -ENOMEM) {
|
|
|
|
f2fs_io_schedule_timeout(DEFAULT_IO_TIMEOUT);
|
|
|
|
goto retry;
|
2018-07-27 18:15:16 +08:00
|
|
|
}
|
2022-04-29 02:18:09 +08:00
|
|
|
return err;
|
2016-02-06 14:38:29 +08:00
|
|
|
}
|
|
|
|
|
2022-04-29 02:18:09 +08:00
|
|
|
err = f2fs_get_node_info(sbi, dn.nid, &ni, false);
|
|
|
|
if (err) {
|
|
|
|
f2fs_put_dnode(&dn);
|
|
|
|
return err;
|
2017-10-19 10:05:57 +08:00
|
|
|
}
|
|
|
|
|
2022-04-29 02:18:09 +08:00
|
|
|
if (recover) {
|
|
|
|
/* dn.data_blkaddr is always valid */
|
|
|
|
if (!__is_valid_data_blkaddr(new_addr)) {
|
|
|
|
if (new_addr == NULL_ADDR)
|
|
|
|
dec_valid_block_count(sbi, inode, 1);
|
|
|
|
f2fs_invalidate_blocks(sbi, dn.data_blkaddr);
|
|
|
|
f2fs_update_data_blkaddr(&dn, new_addr);
|
|
|
|
} else {
|
|
|
|
f2fs_replace_block(sbi, &dn, dn.data_blkaddr,
|
|
|
|
new_addr, ni.version, true, true);
|
f2fs: avoid stucking GC due to atomic write
f2fs doesn't allow abuse on atomic write class interface, so except
limiting in-mem pages' total memory usage capacity, we need to limit
atomic-write usage as well when filesystem is seriously fragmented,
otherwise we may run into infinite loop during foreground GC because
target blocks in victim segment are belong to atomic opened file for
long time.
Now, we will detect failure due to atomic write in foreground GC, if
the count exceeds threshold, we will drop all atomic written data in
cache, by this, I expect it can keep our system running safely to
prevent Dos attack.
In addition, his patch adds to show GC skip information in debugfs,
now it just shows count of skipped caused by atomic write.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-07 20:28:54 +08:00
|
|
|
}
|
2022-04-29 02:18:09 +08:00
|
|
|
} else {
|
|
|
|
blkcnt_t count = 1;
|
2016-02-06 14:38:29 +08:00
|
|
|
|
2023-04-05 22:45:36 +08:00
|
|
|
err = inc_valid_block_count(sbi, inode, &count);
|
|
|
|
if (err) {
|
|
|
|
f2fs_put_dnode(&dn);
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2022-04-29 02:18:09 +08:00
|
|
|
*old_addr = dn.data_blkaddr;
|
|
|
|
f2fs_truncate_data_blocks_range(&dn, 1);
|
|
|
|
dec_valid_block_count(sbi, F2FS_I(inode)->cow_inode, count);
|
2023-04-05 22:45:36 +08:00
|
|
|
|
2022-04-29 02:18:09 +08:00
|
|
|
f2fs_replace_block(sbi, &dn, dn.data_blkaddr, new_addr,
|
|
|
|
ni.version, true, false);
|
|
|
|
}
|
f2fs: Fix a hungtask problem in atomic write
In the cache writing process, if it is an atomic file, increase the page
count of F2FS_WB_CP_DATA, otherwise increase the page count of
F2FS_WB_DATA.
When you step into the hook branch due to insufficient memory in
f2fs_write_begin, f2fs_drop_inmem_pages_all will be called to traverse
all atomic inodes and clear the FI_ATOMIC_FILE mark of all atomic files.
In f2fs_drop_inmem_pages,first acquire the inmem_lock , revoke all the
inmem_pages, and then clear the FI_ATOMIC_FILE mark. Before this mark is
cleared, other threads may hold inmem_lock to add inmem_pages to the inode
that has just been emptied inmem_pages, and increase the page count of
F2FS_WB_CP_DATA.
When the IO returns, it is found that the FI_ATOMIC_FILE flag is cleared
by f2fs_drop_inmem_pages_all, and f2fs_is_atomic_file returns false,which
causes the page count of F2FS_WB_DATA to be decremented. The page count of
F2FS_WB_CP_DATA cannot be cleared. Finally, hungtask is triggered in
f2fs_wait_on_all_pages because get_pages will never return zero.
process A: process B:
f2fs_drop_inmem_pages_all
->f2fs_drop_inmem_pages of inode#1
->mutex_lock(&fi->inmem_lock)
->__revoke_inmem_pages of inode#1 f2fs_ioc_commit_atomic_write
->mutex_unlock(&fi->inmem_lock) ->f2fs_commit_inmem_pages of inode#1
->mutex_lock(&fi->inmem_lock)
->__f2fs_commit_inmem_pages
->f2fs_do_write_data_page
->f2fs_outplace_write_data
->do_write_page
->f2fs_submit_page_write
->inc_page_count(sbi, F2FS_WB_CP_DATA )
->mutex_unlock(&fi->inmem_lock)
->spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
->clear_inode_flag(inode, FI_ATOMIC_FILE)
->spin_unlock(&sbi->inode_lock[ATOMIC_FILE])
f2fs_write_end_io
->dec_page_count(sbi, F2FS_WB_DATA );
We can fix the problem by putting the action of clearing the FI_ATOMIC_FILE
mark into the inmem_lock lock. This operation can ensure that no one will
submit the inmem pages before the FI_ATOMIC_FILE mark is cleared, so that
there will be no atomic writes waiting for writeback.
Fixes: 57864ae5ce3a ("f2fs: limit # of inmemory pages")
Signed-off-by: Yi Zhuang <zhuangyi1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-03-31 17:34:14 +08:00
|
|
|
|
2022-04-29 02:18:09 +08:00
|
|
|
f2fs_put_dnode(&dn);
|
2023-01-09 11:44:49 +08:00
|
|
|
|
|
|
|
trace_f2fs_replace_atomic_write_block(inode, F2FS_I(inode)->cow_inode,
|
2023-04-04 00:37:24 +08:00
|
|
|
index, old_addr ? *old_addr : 0, new_addr, recover);
|
2022-04-29 02:18:09 +08:00
|
|
|
return 0;
|
2016-02-06 14:38:29 +08:00
|
|
|
}
|
|
|
|
|
2022-04-29 02:18:09 +08:00
|
|
|
static void __complete_revoke_list(struct inode *inode, struct list_head *head,
|
|
|
|
bool revoke)
|
2017-03-17 09:55:52 +08:00
|
|
|
{
|
2022-04-29 02:18:09 +08:00
|
|
|
struct revoke_entry *cur, *tmp;
|
2023-02-15 07:53:52 +08:00
|
|
|
pgoff_t start_index = 0;
|
2022-11-12 01:04:06 +08:00
|
|
|
bool truncate = is_inode_flag_set(inode, FI_ATOMIC_REPLACE);
|
2017-03-17 09:55:52 +08:00
|
|
|
|
2022-04-29 02:18:09 +08:00
|
|
|
list_for_each_entry_safe(cur, tmp, head, list) {
|
2023-02-15 07:53:52 +08:00
|
|
|
if (revoke) {
|
2022-04-29 02:18:09 +08:00
|
|
|
__replace_atomic_write_block(inode, cur->index,
|
|
|
|
cur->old_addr, NULL, true);
|
2023-02-15 07:53:52 +08:00
|
|
|
} else if (truncate) {
|
|
|
|
f2fs_truncate_hole(inode, start_index, cur->index);
|
|
|
|
start_index = cur->index + 1;
|
|
|
|
}
|
2022-11-12 01:04:06 +08:00
|
|
|
|
2022-04-29 02:18:09 +08:00
|
|
|
list_del(&cur->list);
|
|
|
|
kmem_cache_free(revoke_entry_slab, cur);
|
2017-03-17 09:55:52 +08:00
|
|
|
}
|
2022-11-12 01:04:06 +08:00
|
|
|
|
|
|
|
if (!revoke && truncate)
|
2023-02-15 07:53:52 +08:00
|
|
|
f2fs_do_truncate_blocks(inode, start_index * PAGE_SIZE, false);
|
2017-03-17 09:55:52 +08:00
|
|
|
}
|
|
|
|
|
2022-04-29 02:18:09 +08:00
|
|
|
static int __f2fs_commit_atomic_write(struct inode *inode)
|
2014-10-07 08:39:50 +08:00
|
|
|
{
|
|
|
|
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
|
|
|
|
struct f2fs_inode_info *fi = F2FS_I(inode);
|
2022-04-29 02:18:09 +08:00
|
|
|
struct inode *cow_inode = fi->cow_inode;
|
|
|
|
struct revoke_entry *new;
|
2018-04-23 10:36:14 +08:00
|
|
|
struct list_head revoke_list;
|
2022-04-29 02:18:09 +08:00
|
|
|
block_t blkaddr;
|
|
|
|
struct dnode_of_data dn;
|
|
|
|
pgoff_t len = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
|
|
|
|
pgoff_t off = 0, blen, index;
|
|
|
|
int ret = 0, i;
|
2014-10-07 08:39:50 +08:00
|
|
|
|
2018-04-23 10:36:14 +08:00
|
|
|
INIT_LIST_HEAD(&revoke_list);
|
|
|
|
|
2022-04-29 02:18:09 +08:00
|
|
|
while (len) {
|
|
|
|
blen = min_t(pgoff_t, ADDRS_PER_BLOCK(cow_inode), len);
|
f2fs: support revoking atomic written pages
f2fs support atomic write with following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file
With this flow we can avoid file becoming corrupted when abnormal power
cut, because we hold data of transaction in referenced pages linked in
inmem_pages list of inode, but without setting them dirty, so these data
won't be persisted unless we commit them in step 4.
But we should still hold journal db file in memory by using volatile
write, because our semantics of 'atomic write support' is incomplete, in
step 4, we could fail to submit all dirty data of transaction, once
partial dirty data was committed in storage, then after a checkpoint &
abnormal power-cut, db file will be corrupted forever.
So this patch tries to improve atomic write flow by adding a revoking flow,
once inner error occurs in committing, this gives another chance to try to
revoke these partial submitted data of current transaction, it makes
committing operation more like aotmical one.
If we're not lucky, once revoking operation was failed, EAGAIN will be
reported to user for suggesting doing the recovery with held journal file,
or retrying current transaction again.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-06 14:40:34 +08:00
|
|
|
|
2022-04-29 02:18:09 +08:00
|
|
|
set_new_dnode(&dn, cow_inode, NULL, NULL, 0);
|
|
|
|
ret = f2fs_get_dnode_of_data(&dn, off, LOOKUP_NODE_RA);
|
|
|
|
if (ret && ret != -ENOENT) {
|
|
|
|
goto out;
|
|
|
|
} else if (ret == -ENOENT) {
|
|
|
|
ret = 0;
|
|
|
|
if (dn.max_level == 0)
|
|
|
|
goto out;
|
|
|
|
goto next;
|
|
|
|
}
|
f2fs: support revoking atomic written pages
f2fs support atomic write with following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file
With this flow we can avoid file becoming corrupted when abnormal power
cut, because we hold data of transaction in referenced pages linked in
inmem_pages list of inode, but without setting them dirty, so these data
won't be persisted unless we commit them in step 4.
But we should still hold journal db file in memory by using volatile
write, because our semantics of 'atomic write support' is incomplete, in
step 4, we could fail to submit all dirty data of transaction, once
partial dirty data was committed in storage, then after a checkpoint &
abnormal power-cut, db file will be corrupted forever.
So this patch tries to improve atomic write flow by adding a revoking flow,
once inner error occurs in committing, this gives another chance to try to
revoke these partial submitted data of current transaction, it makes
committing operation more like aotmical one.
If we're not lucky, once revoking operation was failed, EAGAIN will be
reported to user for suggesting doing the recovery with held journal file,
or retrying current transaction again.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-06 14:40:34 +08:00
|
|
|
|
2022-04-29 02:18:09 +08:00
|
|
|
blen = min((pgoff_t)ADDRS_PER_PAGE(dn.node_page, cow_inode),
|
|
|
|
len);
|
|
|
|
index = off;
|
|
|
|
for (i = 0; i < blen; i++, dn.ofs_in_node++, index++) {
|
|
|
|
blkaddr = f2fs_data_blkaddr(&dn);
|
2018-12-12 18:12:30 +08:00
|
|
|
|
2022-04-29 02:18:09 +08:00
|
|
|
if (!__is_valid_data_blkaddr(blkaddr)) {
|
|
|
|
continue;
|
|
|
|
} else if (!f2fs_is_valid_blkaddr(sbi, blkaddr,
|
|
|
|
DATA_GENERIC_ENHANCE)) {
|
|
|
|
f2fs_put_dnode(&dn);
|
|
|
|
ret = -EFSCORRUPTED;
|
2022-09-28 23:38:54 +08:00
|
|
|
f2fs_handle_error(sbi,
|
|
|
|
ERROR_INVALID_BLKADDR);
|
2022-04-29 02:18:09 +08:00
|
|
|
goto out;
|
2014-12-11 05:59:33 +08:00
|
|
|
}
|
2016-02-06 14:38:29 +08:00
|
|
|
|
2022-04-29 02:18:09 +08:00
|
|
|
new = f2fs_kmem_cache_alloc(revoke_entry_slab, GFP_NOFS,
|
|
|
|
true, NULL);
|
f2fs: support revoking atomic written pages
f2fs support atomic write with following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file
With this flow we can avoid file becoming corrupted when abnormal power
cut, because we hold data of transaction in referenced pages linked in
inmem_pages list of inode, but without setting them dirty, so these data
won't be persisted unless we commit them in step 4.
But we should still hold journal db file in memory by using volatile
write, because our semantics of 'atomic write support' is incomplete, in
step 4, we could fail to submit all dirty data of transaction, once
partial dirty data was committed in storage, then after a checkpoint &
abnormal power-cut, db file will be corrupted forever.
So this patch tries to improve atomic write flow by adding a revoking flow,
once inner error occurs in committing, this gives another chance to try to
revoke these partial submitted data of current transaction, it makes
committing operation more like aotmical one.
If we're not lucky, once revoking operation was failed, EAGAIN will be
reported to user for suggesting doing the recovery with held journal file,
or retrying current transaction again.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-06 14:40:34 +08:00
|
|
|
|
2022-04-29 02:18:09 +08:00
|
|
|
ret = __replace_atomic_write_block(inode, index, blkaddr,
|
|
|
|
&new->old_addr, false);
|
|
|
|
if (ret) {
|
|
|
|
f2fs_put_dnode(&dn);
|
|
|
|
kmem_cache_free(revoke_entry_slab, new);
|
|
|
|
goto out;
|
|
|
|
}
|
2018-04-23 10:36:14 +08:00
|
|
|
|
2022-04-29 02:18:09 +08:00
|
|
|
f2fs_update_data_blkaddr(&dn, NULL_ADDR);
|
|
|
|
new->index = index;
|
|
|
|
list_add_tail(&new->list, &revoke_list);
|
|
|
|
}
|
|
|
|
f2fs_put_dnode(&dn);
|
|
|
|
next:
|
|
|
|
off += blen;
|
|
|
|
len -= blen;
|
2018-04-23 10:36:14 +08:00
|
|
|
}
|
f2fs: support revoking atomic written pages
f2fs support atomic write with following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file
With this flow we can avoid file becoming corrupted when abnormal power
cut, because we hold data of transaction in referenced pages linked in
inmem_pages list of inode, but without setting them dirty, so these data
won't be persisted unless we commit them in step 4.
But we should still hold journal db file in memory by using volatile
write, because our semantics of 'atomic write support' is incomplete, in
step 4, we could fail to submit all dirty data of transaction, once
partial dirty data was committed in storage, then after a checkpoint &
abnormal power-cut, db file will be corrupted forever.
So this patch tries to improve atomic write flow by adding a revoking flow,
once inner error occurs in committing, this gives another chance to try to
revoke these partial submitted data of current transaction, it makes
committing operation more like aotmical one.
If we're not lucky, once revoking operation was failed, EAGAIN will be
reported to user for suggesting doing the recovery with held journal file,
or retrying current transaction again.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-06 14:40:34 +08:00
|
|
|
|
2022-04-29 02:18:09 +08:00
|
|
|
out:
|
2022-11-01 03:24:15 +08:00
|
|
|
if (ret) {
|
2022-07-19 07:02:48 +08:00
|
|
|
sbi->revoked_atomic_block += fi->atomic_write_cnt;
|
2022-11-01 03:24:15 +08:00
|
|
|
} else {
|
2022-07-19 07:02:48 +08:00
|
|
|
sbi->committed_atomic_block += fi->atomic_write_cnt;
|
2022-11-01 03:24:15 +08:00
|
|
|
set_inode_flag(inode, FI_ATOMIC_COMMITTED);
|
|
|
|
}
|
2022-07-19 07:02:48 +08:00
|
|
|
|
2022-04-29 02:18:09 +08:00
|
|
|
__complete_revoke_list(inode, &revoke_list, ret ? true : false);
|
|
|
|
|
|
|
|
return ret;
|
2016-02-06 14:38:29 +08:00
|
|
|
}
|
|
|
|
|
2022-04-29 02:18:09 +08:00
|
|
|
int f2fs_commit_atomic_write(struct inode *inode)
|
2016-02-06 14:38:29 +08:00
|
|
|
{
|
|
|
|
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
|
|
|
|
struct f2fs_inode_info *fi = F2FS_I(inode);
|
f2fs: support revoking atomic written pages
f2fs support atomic write with following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file
With this flow we can avoid file becoming corrupted when abnormal power
cut, because we hold data of transaction in referenced pages linked in
inmem_pages list of inode, but without setting them dirty, so these data
won't be persisted unless we commit them in step 4.
But we should still hold journal db file in memory by using volatile
write, because our semantics of 'atomic write support' is incomplete, in
step 4, we could fail to submit all dirty data of transaction, once
partial dirty data was committed in storage, then after a checkpoint &
abnormal power-cut, db file will be corrupted forever.
So this patch tries to improve atomic write flow by adding a revoking flow,
once inner error occurs in committing, this gives another chance to try to
revoke these partial submitted data of current transaction, it makes
committing operation more like aotmical one.
If we're not lucky, once revoking operation was failed, EAGAIN will be
reported to user for suggesting doing the recovery with held journal file,
or retrying current transaction again.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-06 14:40:34 +08:00
|
|
|
int err;
|
2016-02-06 14:38:29 +08:00
|
|
|
|
2022-04-29 02:18:09 +08:00
|
|
|
err = filemap_write_and_wait_range(inode->i_mapping, 0, LLONG_MAX);
|
|
|
|
if (err)
|
|
|
|
return err;
|
2016-02-06 14:38:29 +08:00
|
|
|
|
2022-01-08 04:48:44 +08:00
|
|
|
f2fs_down_write(&fi->i_gc_rwsem[WRITE]);
|
2018-07-25 11:11:56 +08:00
|
|
|
f2fs_lock_op(sbi);
|
2014-10-07 08:39:50 +08:00
|
|
|
|
2022-04-29 02:18:09 +08:00
|
|
|
err = __f2fs_commit_atomic_write(inode);
|
2017-01-07 18:50:26 +08:00
|
|
|
|
2016-02-06 14:38:29 +08:00
|
|
|
f2fs_unlock_op(sbi);
|
2022-01-08 04:48:44 +08:00
|
|
|
f2fs_up_write(&fi->i_gc_rwsem[WRITE]);
|
2018-07-25 11:11:56 +08:00
|
|
|
|
2015-07-25 15:52:52 +08:00
|
|
|
return err;
|
2014-10-07 08:39:50 +08:00
|
|
|
}
|
|
|
|
|
2012-11-29 12:28:09 +08:00
|
|
|
/*
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
* This function balances dirty node and dentry pages.
|
|
|
|
* In addition, it controls garbage collection.
|
|
|
|
*/
|
2016-01-08 06:15:04 +08:00
|
|
|
void f2fs_balance_fs(struct f2fs_sb_info *sbi, bool need)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
2022-12-21 02:39:04 +08:00
|
|
|
if (time_to_inject(sbi, FAULT_CHECKPOINT))
|
2022-09-28 23:38:53 +08:00
|
|
|
f2fs_stop_checkpoint(sbi, false, STOP_CP_REASON_FAULT_INJECT);
|
2016-09-26 19:45:55 +08:00
|
|
|
|
2016-06-03 06:24:24 +08:00
|
|
|
/* balance_fs_bg is able to be pending */
|
2017-04-21 04:51:57 +08:00
|
|
|
if (need && excess_cached_nats(sbi))
|
2020-03-19 19:57:58 +08:00
|
|
|
f2fs_balance_fs_bg(sbi, false);
|
2016-06-03 06:24:24 +08:00
|
|
|
|
2019-08-23 17:58:36 +08:00
|
|
|
if (!f2fs_is_checkpoint_ready(sbi))
|
2018-08-21 10:21:43 +08:00
|
|
|
return;
|
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
/*
|
2012-12-21 16:20:21 +08:00
|
|
|
* We should do GC or end up with checkpoint, if there are so many dirty
|
|
|
|
* dir/node pages without enough free segments.
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
*/
|
2023-04-14 00:59:51 +08:00
|
|
|
if (has_enough_free_secs(sbi, 0, 0))
|
|
|
|
return;
|
|
|
|
|
|
|
|
if (test_opt(sbi, GC_MERGE) && sbi->gc_thread &&
|
|
|
|
sbi->gc_thread->f2fs_gc_task) {
|
|
|
|
DEFINE_WAIT(wait);
|
|
|
|
|
|
|
|
prepare_to_wait(&sbi->gc_thread->fggc_wq, &wait,
|
|
|
|
TASK_UNINTERRUPTIBLE);
|
|
|
|
wake_up(&sbi->gc_thread->gc_wait_queue_head);
|
|
|
|
io_schedule();
|
|
|
|
finish_wait(&sbi->gc_thread->fggc_wq, &wait);
|
|
|
|
} else {
|
|
|
|
struct f2fs_gc_control gc_control = {
|
|
|
|
.victim_segno = NULL_SEGNO,
|
|
|
|
.init_gc_type = BG_GC,
|
|
|
|
.no_bg_gc = true,
|
|
|
|
.should_migrate_blocks = false,
|
|
|
|
.err_gc_skipped = false,
|
|
|
|
.nr_free_secs = 1 };
|
|
|
|
f2fs_down_write(&sbi->gc_lock);
|
|
|
|
f2fs_gc(sbi, &gc_control);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2021-09-16 17:09:03 +08:00
|
|
|
static inline bool excess_dirty_threshold(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
2022-01-08 04:48:44 +08:00
|
|
|
int factor = f2fs_rwsem_is_locked(&sbi->cp_rwsem) ? 3 : 2;
|
2021-09-16 17:09:03 +08:00
|
|
|
unsigned int dents = get_pages(sbi, F2FS_DIRTY_DENTS);
|
|
|
|
unsigned int qdata = get_pages(sbi, F2FS_DIRTY_QDATA);
|
|
|
|
unsigned int nodes = get_pages(sbi, F2FS_DIRTY_NODES);
|
|
|
|
unsigned int meta = get_pages(sbi, F2FS_DIRTY_META);
|
|
|
|
unsigned int imeta = get_pages(sbi, F2FS_DIRTY_IMETA);
|
|
|
|
unsigned int threshold = sbi->blocks_per_seg * factor *
|
|
|
|
DEFAULT_DIRTY_THRESHOLD;
|
|
|
|
unsigned int global_threshold = threshold * 3 / 2;
|
|
|
|
|
|
|
|
if (dents >= threshold || qdata >= threshold ||
|
|
|
|
nodes >= threshold || meta >= threshold ||
|
|
|
|
imeta >= threshold)
|
|
|
|
return true;
|
|
|
|
return dents + qdata + nodes + meta + imeta > global_threshold;
|
|
|
|
}
|
|
|
|
|
2020-03-19 19:57:58 +08:00
|
|
|
void f2fs_balance_fs_bg(struct f2fs_sb_info *sbi, bool from_bg)
|
2013-10-24 13:19:18 +08:00
|
|
|
{
|
2018-05-26 18:03:34 +08:00
|
|
|
if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING)))
|
|
|
|
return;
|
|
|
|
|
f2fs: enable rb-tree extent cache
This patch enables rb-tree based extent cache in f2fs.
When we mount with "-o extent_cache", f2fs will try to add recently accessed
page-block mappings into rb-tree based extent cache as much as possible, instead
of original one extent info cache.
By this way, f2fs can support more effective cache between dnode page cache and
disk. It will supply high hit ratio in the cache with fewer memory when dnode
page cache are reclaimed in environment of low memory.
Storage: Sandisk sd card 64g
1.append write file (offset: 0, size: 128M);
2.override write file (offset: 2M, size: 1M);
3.override write file (offset: 4M, size: 1M);
...
4.override write file (offset: 48M, size: 1M);
...
5.override write file (offset: 112M, size: 1M);
6.sync
7.echo 3 > /proc/sys/vm/drop_caches
8.read file (size:128M, unit: 4k, count: 32768)
(time dd if=/mnt/f2fs/128m bs=4k count=32768)
Extent Hit Ratio:
before patched
Hit Ratio 121 / 1071 1071 / 1071
Performance:
before patched
real 0m37.051s 0m35.556s
user 0m0.040s 0m0.026s
sys 0m2.990s 0m2.251s
Memory Cost:
before patched
Tree Count: 0 1 (size: 24 bytes)
Node Count: 0 45 (size: 1440 bytes)
v3:
o retest and given more details of test result.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-05 17:57:31 +08:00
|
|
|
/* try to shrink extent cache when there is no enough memory */
|
2022-12-01 01:36:43 +08:00
|
|
|
if (!f2fs_available_free_memory(sbi, READ_EXTENT_CACHE))
|
2022-12-01 01:26:29 +08:00
|
|
|
f2fs_shrink_read_extent_tree(sbi,
|
|
|
|
READ_EXTENT_CACHE_SHRINK_NUMBER);
|
f2fs: enable rb-tree extent cache
This patch enables rb-tree based extent cache in f2fs.
When we mount with "-o extent_cache", f2fs will try to add recently accessed
page-block mappings into rb-tree based extent cache as much as possible, instead
of original one extent info cache.
By this way, f2fs can support more effective cache between dnode page cache and
disk. It will supply high hit ratio in the cache with fewer memory when dnode
page cache are reclaimed in environment of low memory.
Storage: Sandisk sd card 64g
1.append write file (offset: 0, size: 128M);
2.override write file (offset: 2M, size: 1M);
3.override write file (offset: 4M, size: 1M);
...
4.override write file (offset: 48M, size: 1M);
...
5.override write file (offset: 112M, size: 1M);
6.sync
7.echo 3 > /proc/sys/vm/drop_caches
8.read file (size:128M, unit: 4k, count: 32768)
(time dd if=/mnt/f2fs/128m bs=4k count=32768)
Extent Hit Ratio:
before patched
Hit Ratio 121 / 1071 1071 / 1071
Performance:
before patched
real 0m37.051s 0m35.556s
user 0m0.040s 0m0.026s
sys 0m2.990s 0m2.251s
Memory Cost:
before patched
Tree Count: 0 1 (size: 24 bytes)
Node Count: 0 45 (size: 1440 bytes)
v3:
o retest and given more details of test result.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-05 17:57:31 +08:00
|
|
|
|
2022-12-02 09:37:15 +08:00
|
|
|
/* try to shrink age extent cache when there is no enough memory */
|
|
|
|
if (!f2fs_available_free_memory(sbi, AGE_EXTENT_CACHE))
|
|
|
|
f2fs_shrink_age_extent_tree(sbi,
|
|
|
|
AGE_EXTENT_CACHE_SHRINK_NUMBER);
|
f2fs: enable rb-tree extent cache
This patch enables rb-tree based extent cache in f2fs.
When we mount with "-o extent_cache", f2fs will try to add recently accessed
page-block mappings into rb-tree based extent cache as much as possible, instead
of original one extent info cache.
By this way, f2fs can support more effective cache between dnode page cache and
disk. It will supply high hit ratio in the cache with fewer memory when dnode
page cache are reclaimed in environment of low memory.
Storage: Sandisk sd card 64g
1.append write file (offset: 0, size: 128M);
2.override write file (offset: 2M, size: 1M);
3.override write file (offset: 4M, size: 1M);
...
4.override write file (offset: 48M, size: 1M);
...
5.override write file (offset: 112M, size: 1M);
6.sync
7.echo 3 > /proc/sys/vm/drop_caches
8.read file (size:128M, unit: 4k, count: 32768)
(time dd if=/mnt/f2fs/128m bs=4k count=32768)
Extent Hit Ratio:
before patched
Hit Ratio 121 / 1071 1071 / 1071
Performance:
before patched
real 0m37.051s 0m35.556s
user 0m0.040s 0m0.026s
sys 0m2.990s 0m2.251s
Memory Cost:
before patched
Tree Count: 0 1 (size: 24 bytes)
Node Count: 0 45 (size: 1440 bytes)
v3:
o retest and given more details of test result.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-05 17:57:31 +08:00
|
|
|
|
2015-06-20 06:36:07 +08:00
|
|
|
/* check the # of cached NAT entries */
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
if (!f2fs_available_free_memory(sbi, NAT_ENTRIES))
|
|
|
|
f2fs_try_to_free_nats(sbi, NAT_ENTRY_PER_BLOCK);
|
2015-06-20 06:36:07 +08:00
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
if (!f2fs_available_free_memory(sbi, FREE_NIDS))
|
|
|
|
f2fs_try_to_free_nids(sbi, MAX_FREE_NIDS);
|
2016-06-17 07:41:49 +08:00
|
|
|
else
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
f2fs_build_free_nids(sbi, false, false);
|
2015-07-28 18:33:46 +08:00
|
|
|
|
2021-09-16 17:09:03 +08:00
|
|
|
if (excess_dirty_nats(sbi) || excess_dirty_threshold(sbi) ||
|
|
|
|
excess_prefree_segs(sbi) || !f2fs_space_for_roll_forward(sbi))
|
2020-11-25 10:57:36 +08:00
|
|
|
goto do_sync;
|
|
|
|
|
|
|
|
/* there is background inflight IO or foreground operation recently */
|
|
|
|
if (is_inflight_io(sbi, REQ_TIME) ||
|
2022-01-08 04:48:44 +08:00
|
|
|
(!f2fs_time_over(sbi, REQ_TIME) && f2fs_rwsem_is_locked(&sbi->cp_rwsem)))
|
2016-12-06 03:37:14 +08:00
|
|
|
return;
|
2015-07-28 18:33:46 +08:00
|
|
|
|
2020-11-25 10:57:36 +08:00
|
|
|
/* exceed periodical checkpoint timeout threshold */
|
|
|
|
if (f2fs_time_over(sbi, CP_TIME))
|
|
|
|
goto do_sync;
|
|
|
|
|
2015-06-20 06:36:07 +08:00
|
|
|
/* checkpoint is the only way to shrink partial cached entries */
|
2021-09-29 03:19:14 +08:00
|
|
|
if (f2fs_available_free_memory(sbi, NAT_ENTRIES) &&
|
2020-11-25 10:57:36 +08:00
|
|
|
f2fs_available_free_memory(sbi, INO_ENTRIES))
|
|
|
|
return;
|
|
|
|
|
|
|
|
do_sync:
|
|
|
|
if (test_opt(sbi, DATA_FLUSH) && from_bg) {
|
|
|
|
struct blk_plug plug;
|
|
|
|
|
|
|
|
mutex_lock(&sbi->flush_lock);
|
|
|
|
|
|
|
|
blk_start_plug(&plug);
|
2022-09-14 21:28:46 +08:00
|
|
|
f2fs_sync_dirty_inodes(sbi, FILE_INODE, false);
|
2020-11-25 10:57:36 +08:00
|
|
|
blk_finish_plug(&plug);
|
|
|
|
|
|
|
|
mutex_unlock(&sbi->flush_lock);
|
f2fs: support data flush in background
Previously, when finishing a checkpoint, we have persisted all fs meta
info including meta inode, node inode, dentry page of directory inode, so,
after a sudden power cut, f2fs can recover from last checkpoint with full
directory structure.
But during checkpoint, we didn't flush dirty pages of regular and symlink
inode, so such dirty datas still in memory will be lost in that moment of
power off.
In order to reduce the chance of lost data, this patch enables
f2fs_balance_fs_bg with the ability of data flushing. It will try to flush
user data before starting a checkpoint. So user's data written after last
checkpoint which may not be fsynced could be saved.
When we mount with data_flush option, after every period of cp_interval
(could be configured in sysfs: /sys/fs/f2fs/device/cp_interval) seconds
user data could be flushed into device once f2fs_balance_fs_bg was called
in kworker thread or gc thread.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-17 17:13:28 +08:00
|
|
|
}
|
2022-08-29 21:31:20 +08:00
|
|
|
f2fs_sync_fs(sbi->sb, 1);
|
2020-11-25 10:57:36 +08:00
|
|
|
stat_inc_bg_cp_count(sbi->stat_info);
|
2013-10-24 13:19:18 +08:00
|
|
|
}
|
|
|
|
|
2017-03-04 22:13:10 +08:00
|
|
|
static int __submit_flush_wait(struct f2fs_sb_info *sbi,
|
|
|
|
struct block_device *bdev)
|
2016-10-07 10:02:05 +08:00
|
|
|
{
|
2021-01-26 22:52:37 +08:00
|
|
|
int ret = blkdev_issue_flush(bdev);
|
2017-03-04 22:13:10 +08:00
|
|
|
|
|
|
|
trace_f2fs_issue_flush(bdev, test_opt(sbi, NOBARRIER),
|
|
|
|
test_opt(sbi, FLUSH_MERGE), ret);
|
2022-12-22 03:20:01 +08:00
|
|
|
if (!ret)
|
|
|
|
f2fs_update_iostat(sbi, NULL, FS_FLUSH_IO, 0);
|
2016-10-07 10:02:05 +08:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2017-09-29 13:59:38 +08:00
|
|
|
static int submit_flush_wait(struct f2fs_sb_info *sbi, nid_t ino)
|
2016-10-07 10:02:05 +08:00
|
|
|
{
|
2017-09-29 13:59:38 +08:00
|
|
|
int ret = 0;
|
2016-10-07 10:02:05 +08:00
|
|
|
int i;
|
|
|
|
|
2019-03-16 08:13:06 +08:00
|
|
|
if (!f2fs_is_multi_device(sbi))
|
2017-09-29 13:59:38 +08:00
|
|
|
return __submit_flush_wait(sbi, sbi->sb->s_bdev);
|
2017-03-04 22:13:10 +08:00
|
|
|
|
2017-09-29 13:59:38 +08:00
|
|
|
for (i = 0; i < sbi->s_ndevs; i++) {
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
if (!f2fs_is_dirty_device(sbi, ino, i, FLUSH_INO))
|
2017-09-29 13:59:38 +08:00
|
|
|
continue;
|
2017-03-04 22:13:10 +08:00
|
|
|
ret = __submit_flush_wait(sbi, FDEV(i).bdev);
|
|
|
|
if (ret)
|
|
|
|
break;
|
2016-10-07 10:02:05 +08:00
|
|
|
}
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2014-04-27 14:21:33 +08:00
|
|
|
static int issue_flush_thread(void *data)
|
2014-04-02 14:34:36 +08:00
|
|
|
{
|
|
|
|
struct f2fs_sb_info *sbi = data;
|
2017-01-10 06:13:03 +08:00
|
|
|
struct flush_cmd_control *fcc = SM_I(sbi)->fcc_info;
|
2014-04-27 14:21:21 +08:00
|
|
|
wait_queue_head_t *q = &fcc->flush_wait_queue;
|
2014-04-02 14:34:36 +08:00
|
|
|
repeat:
|
|
|
|
if (kthread_should_stop())
|
|
|
|
return 0;
|
|
|
|
|
2014-09-05 18:31:00 +08:00
|
|
|
if (!llist_empty(&fcc->issue_list)) {
|
2014-04-02 14:34:36 +08:00
|
|
|
struct flush_cmd *cmd, *next;
|
|
|
|
int ret;
|
|
|
|
|
2014-09-05 18:31:00 +08:00
|
|
|
fcc->dispatch_list = llist_del_all(&fcc->issue_list);
|
|
|
|
fcc->dispatch_list = llist_reverse_order(fcc->dispatch_list);
|
|
|
|
|
2017-09-29 13:59:38 +08:00
|
|
|
cmd = llist_entry(fcc->dispatch_list, struct flush_cmd, llnode);
|
|
|
|
|
|
|
|
ret = submit_flush_wait(sbi, cmd->ino);
|
2017-03-25 17:19:58 +08:00
|
|
|
atomic_inc(&fcc->issued_flush);
|
|
|
|
|
2014-09-05 18:31:00 +08:00
|
|
|
llist_for_each_entry_safe(cmd, next,
|
|
|
|
fcc->dispatch_list, llnode) {
|
2014-04-02 14:34:36 +08:00
|
|
|
cmd->ret = ret;
|
|
|
|
complete(&cmd->wait);
|
|
|
|
}
|
2014-04-27 14:21:21 +08:00
|
|
|
fcc->dispatch_list = NULL;
|
2014-04-02 14:34:36 +08:00
|
|
|
}
|
|
|
|
|
2014-04-27 14:21:21 +08:00
|
|
|
wait_event_interruptible(*q,
|
2014-09-05 18:31:00 +08:00
|
|
|
kthread_should_stop() || !llist_empty(&fcc->issue_list));
|
2014-04-02 14:34:36 +08:00
|
|
|
goto repeat;
|
|
|
|
}
|
|
|
|
|
2017-09-29 13:59:38 +08:00
|
|
|
int f2fs_issue_flush(struct f2fs_sb_info *sbi, nid_t ino)
|
2014-04-02 14:34:36 +08:00
|
|
|
{
|
2017-01-10 06:13:03 +08:00
|
|
|
struct flush_cmd_control *fcc = SM_I(sbi)->fcc_info;
|
2014-05-08 17:00:35 +08:00
|
|
|
struct flush_cmd cmd;
|
2017-03-25 17:19:58 +08:00
|
|
|
int ret;
|
2014-04-02 14:34:36 +08:00
|
|
|
|
2014-07-24 00:57:31 +08:00
|
|
|
if (test_opt(sbi, NOBARRIER))
|
|
|
|
return 0;
|
|
|
|
|
2017-03-25 17:19:58 +08:00
|
|
|
if (!test_opt(sbi, FLUSH_MERGE)) {
|
2018-12-14 08:53:57 +08:00
|
|
|
atomic_inc(&fcc->queued_flush);
|
2017-09-29 13:59:38 +08:00
|
|
|
ret = submit_flush_wait(sbi, ino);
|
2018-12-14 08:53:57 +08:00
|
|
|
atomic_dec(&fcc->queued_flush);
|
2017-03-25 17:19:58 +08:00
|
|
|
atomic_inc(&fcc->issued_flush);
|
|
|
|
return ret;
|
|
|
|
}
|
2015-08-15 02:43:56 +08:00
|
|
|
|
2019-03-16 08:13:06 +08:00
|
|
|
if (atomic_inc_return(&fcc->queued_flush) == 1 ||
|
|
|
|
f2fs_is_multi_device(sbi)) {
|
2017-09-29 13:59:38 +08:00
|
|
|
ret = submit_flush_wait(sbi, ino);
|
2018-12-14 08:53:57 +08:00
|
|
|
atomic_dec(&fcc->queued_flush);
|
2017-03-25 17:19:58 +08:00
|
|
|
|
|
|
|
atomic_inc(&fcc->issued_flush);
|
2015-08-15 02:43:56 +08:00
|
|
|
return ret;
|
|
|
|
}
|
2014-04-02 14:34:36 +08:00
|
|
|
|
2017-09-29 13:59:38 +08:00
|
|
|
cmd.ino = ino;
|
2014-05-08 17:00:35 +08:00
|
|
|
init_completion(&cmd.wait);
|
2014-04-02 14:34:36 +08:00
|
|
|
|
2014-09-05 18:31:00 +08:00
|
|
|
llist_add(&cmd.llnode, &fcc->issue_list);
|
2014-04-02 14:34:36 +08:00
|
|
|
|
2021-02-20 17:38:43 +08:00
|
|
|
/*
|
|
|
|
* update issue_list before we wake up issue_flush thread, this
|
|
|
|
* smp_mb() pairs with another barrier in ___wait_event(), see
|
|
|
|
* more details in comments of waitqueue_active().
|
|
|
|
*/
|
2017-08-21 22:53:45 +08:00
|
|
|
smp_mb();
|
|
|
|
|
|
|
|
if (waitqueue_active(&fcc->flush_wait_queue))
|
2014-04-27 14:21:21 +08:00
|
|
|
wake_up(&fcc->flush_wait_queue);
|
2014-04-02 14:34:36 +08:00
|
|
|
|
2016-12-08 08:23:32 +08:00
|
|
|
if (fcc->f2fs_issue_flush) {
|
|
|
|
wait_for_completion(&cmd.wait);
|
2018-12-14 08:53:57 +08:00
|
|
|
atomic_dec(&fcc->queued_flush);
|
2016-12-08 08:23:32 +08:00
|
|
|
} else {
|
2017-08-31 18:56:06 +08:00
|
|
|
struct llist_node *list;
|
|
|
|
|
|
|
|
list = llist_del_all(&fcc->issue_list);
|
|
|
|
if (!list) {
|
|
|
|
wait_for_completion(&cmd.wait);
|
2018-12-14 08:53:57 +08:00
|
|
|
atomic_dec(&fcc->queued_flush);
|
2017-08-31 18:56:06 +08:00
|
|
|
} else {
|
|
|
|
struct flush_cmd *tmp, *next;
|
|
|
|
|
2017-09-29 13:59:38 +08:00
|
|
|
ret = submit_flush_wait(sbi, ino);
|
2017-08-31 18:56:06 +08:00
|
|
|
|
|
|
|
llist_for_each_entry_safe(tmp, next, list, llnode) {
|
|
|
|
if (tmp == &cmd) {
|
|
|
|
cmd.ret = ret;
|
2018-12-14 08:53:57 +08:00
|
|
|
atomic_dec(&fcc->queued_flush);
|
2017-08-31 18:56:06 +08:00
|
|
|
continue;
|
|
|
|
}
|
|
|
|
tmp->ret = ret;
|
|
|
|
complete(&tmp->wait);
|
|
|
|
}
|
|
|
|
}
|
2016-12-08 08:23:32 +08:00
|
|
|
}
|
2014-05-08 17:00:35 +08:00
|
|
|
|
|
|
|
return cmd.ret;
|
2014-04-02 14:34:36 +08:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
int f2fs_create_flush_cmd_control(struct f2fs_sb_info *sbi)
|
2014-04-27 14:21:33 +08:00
|
|
|
{
|
|
|
|
dev_t dev = sbi->sb->s_bdev->bd_dev;
|
|
|
|
struct flush_cmd_control *fcc;
|
|
|
|
|
2017-01-10 06:13:03 +08:00
|
|
|
if (SM_I(sbi)->fcc_info) {
|
|
|
|
fcc = SM_I(sbi)->fcc_info;
|
2017-06-24 15:57:19 +08:00
|
|
|
if (fcc->f2fs_issue_flush)
|
2022-10-25 16:05:26 +08:00
|
|
|
return 0;
|
2016-12-08 08:23:32 +08:00
|
|
|
goto init_thread;
|
|
|
|
}
|
|
|
|
|
2017-11-30 19:28:17 +08:00
|
|
|
fcc = f2fs_kzalloc(sbi, sizeof(struct flush_cmd_control), GFP_KERNEL);
|
2014-04-27 14:21:33 +08:00
|
|
|
if (!fcc)
|
|
|
|
return -ENOMEM;
|
2017-03-25 17:19:58 +08:00
|
|
|
atomic_set(&fcc->issued_flush, 0);
|
2018-12-14 08:53:57 +08:00
|
|
|
atomic_set(&fcc->queued_flush, 0);
|
2014-04-27 14:21:33 +08:00
|
|
|
init_waitqueue_head(&fcc->flush_wait_queue);
|
2014-09-05 18:31:00 +08:00
|
|
|
init_llist_head(&fcc->issue_list);
|
2017-01-10 06:13:03 +08:00
|
|
|
SM_I(sbi)->fcc_info = fcc;
|
2017-06-01 16:43:51 +08:00
|
|
|
if (!test_opt(sbi, FLUSH_MERGE))
|
2022-10-25 16:05:26 +08:00
|
|
|
return 0;
|
2017-06-01 16:43:51 +08:00
|
|
|
|
2016-12-08 08:23:32 +08:00
|
|
|
init_thread:
|
2014-04-27 14:21:33 +08:00
|
|
|
fcc->f2fs_issue_flush = kthread_run(issue_flush_thread, sbi,
|
|
|
|
"f2fs_flush-%u:%u", MAJOR(dev), MINOR(dev));
|
|
|
|
if (IS_ERR(fcc->f2fs_issue_flush)) {
|
2022-10-27 18:24:46 +08:00
|
|
|
int err = PTR_ERR(fcc->f2fs_issue_flush);
|
|
|
|
|
2022-12-30 23:43:32 +08:00
|
|
|
fcc->f2fs_issue_flush = NULL;
|
2014-04-27 14:21:33 +08:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2022-10-25 16:05:26 +08:00
|
|
|
return 0;
|
2014-04-27 14:21:33 +08:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_destroy_flush_cmd_control(struct f2fs_sb_info *sbi, bool free)
|
2014-04-27 14:21:33 +08:00
|
|
|
{
|
2017-01-10 06:13:03 +08:00
|
|
|
struct flush_cmd_control *fcc = SM_I(sbi)->fcc_info;
|
2014-04-27 14:21:33 +08:00
|
|
|
|
2016-12-08 08:23:32 +08:00
|
|
|
if (fcc && fcc->f2fs_issue_flush) {
|
|
|
|
struct task_struct *flush_thread = fcc->f2fs_issue_flush;
|
|
|
|
|
|
|
|
fcc->f2fs_issue_flush = NULL;
|
|
|
|
kthread_stop(flush_thread);
|
|
|
|
}
|
|
|
|
if (free) {
|
2020-09-14 16:47:00 +08:00
|
|
|
kfree(fcc);
|
2017-01-10 06:13:03 +08:00
|
|
|
SM_I(sbi)->fcc_info = NULL;
|
2016-12-08 08:23:32 +08:00
|
|
|
}
|
2014-04-27 14:21:33 +08:00
|
|
|
}
|
|
|
|
|
2017-09-29 13:59:39 +08:00
|
|
|
int f2fs_flush_device_cache(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
int ret = 0, i;
|
|
|
|
|
2019-03-16 08:13:06 +08:00
|
|
|
if (!f2fs_is_multi_device(sbi))
|
2017-09-29 13:59:39 +08:00
|
|
|
return 0;
|
|
|
|
|
2020-10-12 10:28:14 +08:00
|
|
|
if (test_opt(sbi, NOBARRIER))
|
|
|
|
return 0;
|
|
|
|
|
2017-09-29 13:59:39 +08:00
|
|
|
for (i = 1; i < sbi->s_ndevs; i++) {
|
2021-08-04 08:38:38 +08:00
|
|
|
int count = DEFAULT_RETRY_IO_COUNT;
|
|
|
|
|
2017-09-29 13:59:39 +08:00
|
|
|
if (!f2fs_test_bit(i, (char *)&sbi->dirty_device))
|
|
|
|
continue;
|
2021-08-04 08:38:38 +08:00
|
|
|
|
|
|
|
do {
|
|
|
|
ret = __submit_flush_wait(sbi, FDEV(i).bdev);
|
|
|
|
if (ret)
|
2022-03-23 05:39:13 +08:00
|
|
|
f2fs_io_schedule_timeout(DEFAULT_IO_TIMEOUT);
|
2021-08-04 08:38:38 +08:00
|
|
|
} while (ret && --count);
|
|
|
|
|
|
|
|
if (ret) {
|
2022-09-28 23:38:53 +08:00
|
|
|
f2fs_stop_checkpoint(sbi, false,
|
|
|
|
STOP_CP_REASON_FLUSH_FAIL);
|
2017-09-29 13:59:39 +08:00
|
|
|
break;
|
2021-08-04 08:38:38 +08:00
|
|
|
}
|
2017-09-29 13:59:39 +08:00
|
|
|
|
|
|
|
spin_lock(&sbi->dev_lock);
|
|
|
|
f2fs_clear_bit(i, (char *)&sbi->dirty_device);
|
|
|
|
spin_unlock(&sbi->dev_lock);
|
|
|
|
}
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
static void __locate_dirty_segment(struct f2fs_sb_info *sbi, unsigned int segno,
|
|
|
|
enum dirty_type dirty_type)
|
|
|
|
{
|
|
|
|
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
|
|
|
|
|
|
|
|
/* need not be added */
|
|
|
|
if (IS_CURSEG(sbi, segno))
|
|
|
|
return;
|
|
|
|
|
|
|
|
if (!test_and_set_bit(segno, dirty_i->dirty_segmap[dirty_type]))
|
|
|
|
dirty_i->nr_dirty[dirty_type]++;
|
|
|
|
|
|
|
|
if (dirty_type == DIRTY) {
|
|
|
|
struct seg_entry *sentry = get_seg_entry(sbi, segno);
|
2013-10-25 16:31:57 +08:00
|
|
|
enum dirty_type t = sentry->type;
|
f2fs: fix the bitmap consistency of dirty segments
Like below, there are 8 segment bitmaps for SSR victim candidates.
enum dirty_type {
DIRTY_HOT_DATA, /* dirty segments assigned as hot data logs */
DIRTY_WARM_DATA, /* dirty segments assigned as warm data logs */
DIRTY_COLD_DATA, /* dirty segments assigned as cold data logs */
DIRTY_HOT_NODE, /* dirty segments assigned as hot node logs */
DIRTY_WARM_NODE, /* dirty segments assigned as warm node logs */
DIRTY_COLD_NODE, /* dirty segments assigned as cold node logs */
DIRTY, /* to count # of dirty segments */
PRE, /* to count # of entirely obsolete segments */
NR_DIRTY_TYPE
};
The upper 6 bitmaps indicates segments dirtied by active log areas respectively.
And, the DIRTY bitmap integrates all the 6 bitmaps.
For example,
o DIRTY_HOT_DATA : 1010000
o DIRTY_WARM_DATA: 0100000
o DIRTY_COLD_DATA: 0001000
o DIRTY_HOT_NODE : 0000010
o DIRTY_WARM_NODE: 0000001
o DIRTY_COLD_NODE: 0000000
In this case,
o DIRTY : 1111011,
which means that we should guarantee the consistency between DIRTY and other
bitmaps concreately.
However, the SSR mode selects victims freely from any log types, which can set
multiple bits across the various bitmap types.
So, this patch eliminates this inconsistency.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-04-01 12:52:09 +08:00
|
|
|
|
2014-09-03 07:24:11 +08:00
|
|
|
if (unlikely(t >= DIRTY)) {
|
|
|
|
f2fs_bug_on(sbi, 1);
|
|
|
|
return;
|
|
|
|
}
|
2013-10-25 16:31:57 +08:00
|
|
|
if (!test_and_set_bit(segno, dirty_i->dirty_segmap[t]))
|
|
|
|
dirty_i->nr_dirty[t]++;
|
2020-06-18 12:37:10 +08:00
|
|
|
|
|
|
|
if (__is_large_section(sbi)) {
|
|
|
|
unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
|
2020-08-19 09:34:48 +08:00
|
|
|
block_t valid_blocks =
|
2020-06-18 12:37:10 +08:00
|
|
|
get_valid_blocks(sbi, segno, true);
|
|
|
|
|
|
|
|
f2fs_bug_on(sbi, unlikely(!valid_blocks ||
|
2022-06-29 02:03:57 +08:00
|
|
|
valid_blocks == CAP_BLKS_PER_SEC(sbi)));
|
2020-06-18 12:37:10 +08:00
|
|
|
|
|
|
|
if (!IS_CURSEC(sbi, secno))
|
|
|
|
set_bit(secno, dirty_i->dirty_secmap);
|
|
|
|
}
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
static void __remove_dirty_segment(struct f2fs_sb_info *sbi, unsigned int segno,
|
|
|
|
enum dirty_type dirty_type)
|
|
|
|
{
|
|
|
|
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
|
2020-08-19 09:34:48 +08:00
|
|
|
block_t valid_blocks;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
if (test_and_clear_bit(segno, dirty_i->dirty_segmap[dirty_type]))
|
|
|
|
dirty_i->nr_dirty[dirty_type]--;
|
|
|
|
|
|
|
|
if (dirty_type == DIRTY) {
|
2013-10-25 16:31:57 +08:00
|
|
|
struct seg_entry *sentry = get_seg_entry(sbi, segno);
|
|
|
|
enum dirty_type t = sentry->type;
|
|
|
|
|
|
|
|
if (test_and_clear_bit(segno, dirty_i->dirty_segmap[t]))
|
|
|
|
dirty_i->nr_dirty[t]--;
|
f2fs: fix the bitmap consistency of dirty segments
Like below, there are 8 segment bitmaps for SSR victim candidates.
enum dirty_type {
DIRTY_HOT_DATA, /* dirty segments assigned as hot data logs */
DIRTY_WARM_DATA, /* dirty segments assigned as warm data logs */
DIRTY_COLD_DATA, /* dirty segments assigned as cold data logs */
DIRTY_HOT_NODE, /* dirty segments assigned as hot node logs */
DIRTY_WARM_NODE, /* dirty segments assigned as warm node logs */
DIRTY_COLD_NODE, /* dirty segments assigned as cold node logs */
DIRTY, /* to count # of dirty segments */
PRE, /* to count # of entirely obsolete segments */
NR_DIRTY_TYPE
};
The upper 6 bitmaps indicates segments dirtied by active log areas respectively.
And, the DIRTY bitmap integrates all the 6 bitmaps.
For example,
o DIRTY_HOT_DATA : 1010000
o DIRTY_WARM_DATA: 0100000
o DIRTY_COLD_DATA: 0001000
o DIRTY_HOT_NODE : 0000010
o DIRTY_WARM_NODE: 0000001
o DIRTY_COLD_NODE: 0000000
In this case,
o DIRTY : 1111011,
which means that we should guarantee the consistency between DIRTY and other
bitmaps concreately.
However, the SSR mode selects victims freely from any log types, which can set
multiple bits across the various bitmap types.
So, this patch eliminates this inconsistency.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-04-01 12:52:09 +08:00
|
|
|
|
2020-06-18 12:37:10 +08:00
|
|
|
valid_blocks = get_valid_blocks(sbi, segno, true);
|
|
|
|
if (valid_blocks == 0) {
|
2017-04-08 06:08:17 +08:00
|
|
|
clear_bit(GET_SEC_FROM_SEG(sbi, segno),
|
2013-03-31 12:26:03 +08:00
|
|
|
dirty_i->victim_secmap);
|
2019-08-07 21:40:32 +08:00
|
|
|
#ifdef CONFIG_F2FS_CHECK_FS
|
|
|
|
clear_bit(segno, SIT_I(sbi)->invalid_segmap);
|
|
|
|
#endif
|
|
|
|
}
|
2020-06-18 12:37:10 +08:00
|
|
|
if (__is_large_section(sbi)) {
|
|
|
|
unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
|
|
|
|
|
|
|
|
if (!valid_blocks ||
|
2022-06-29 02:03:57 +08:00
|
|
|
valid_blocks == CAP_BLKS_PER_SEC(sbi)) {
|
2020-06-18 12:37:10 +08:00
|
|
|
clear_bit(secno, dirty_i->dirty_secmap);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!IS_CURSEC(sbi, secno))
|
|
|
|
set_bit(secno, dirty_i->dirty_secmap);
|
|
|
|
}
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2012-11-29 12:28:09 +08:00
|
|
|
/*
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
* Should not occur error such as -ENOMEM.
|
|
|
|
* Adding dirty entry into seglist is not critical operation.
|
|
|
|
* If a given segment is one of current working segments, it won't be added.
|
|
|
|
*/
|
2013-06-13 16:59:28 +08:00
|
|
|
static void locate_dirty_segment(struct f2fs_sb_info *sbi, unsigned int segno)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
|
2018-08-21 10:21:43 +08:00
|
|
|
unsigned short valid_blocks, ckpt_valid_blocks;
|
f2fs: support zone capacity less than zone size
NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
Zone-capacity indicates the maximum number of sectors that are usable in
a zone beginning from the first sector of the zone. This makes the sectors
sectors after the zone-capacity till zone-size to be unusable.
This patch set tracks zone-size and zone-capacity in zoned devices and
calculate the usable blocks per segment and usable segments per section.
If zone-capacity is less than zone-size mark only those segments which
start before zone-capacity as free segments. All segments at and beyond
zone-capacity are treated as permanently used segments. In cases where
zone-capacity does not align with segment size the last segment will start
before zone-capacity and end beyond the zone-capacity of the zone. For
such spanning segments only sectors within the zone-capacity are used.
During writes and GC manage the usable segments in a section and usable
blocks per segment. Segments which are beyond zone-capacity are never
allocated, and do not need to be garbage collected, only the segments
which are before zone-capacity needs to garbage collected.
For spanning segments based on the number of usable blocks in that
segment, write to blocks only up to zone-capacity.
Zone-capacity is device specific and cannot be configured by the user.
Since NVMe ZNS device zones are sequentially write only, a block device
with conventional zones or any normal block device is needed along with
the ZNS device for the metadata operations of F2fs.
A typical nvme-cli output of a zoned device shows zone start and capacity
and write pointer as below:
SLBA: 0x0 WP: 0x0 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
Here zone size is 64MB, capacity is 49MB, WP is at zone start as the zones
are in EMPTY state. For each zone, only zone start + 49MB is usable area,
any lba/sector after 49MB cannot be read or written to, the drive will fail
any attempts to read/write. So, the second zone starts at 64MB and is
usable till 113MB (64 + 49) and the range between 113 and 128MB is
again unusable. The next zone starts at 128MB, and so on.
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-16 20:56:56 +08:00
|
|
|
unsigned int usable_blocks;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
if (segno == NULL_SEGNO || IS_CURSEG(sbi, segno))
|
|
|
|
return;
|
|
|
|
|
f2fs: support zone capacity less than zone size
NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
Zone-capacity indicates the maximum number of sectors that are usable in
a zone beginning from the first sector of the zone. This makes the sectors
sectors after the zone-capacity till zone-size to be unusable.
This patch set tracks zone-size and zone-capacity in zoned devices and
calculate the usable blocks per segment and usable segments per section.
If zone-capacity is less than zone-size mark only those segments which
start before zone-capacity as free segments. All segments at and beyond
zone-capacity are treated as permanently used segments. In cases where
zone-capacity does not align with segment size the last segment will start
before zone-capacity and end beyond the zone-capacity of the zone. For
such spanning segments only sectors within the zone-capacity are used.
During writes and GC manage the usable segments in a section and usable
blocks per segment. Segments which are beyond zone-capacity are never
allocated, and do not need to be garbage collected, only the segments
which are before zone-capacity needs to garbage collected.
For spanning segments based on the number of usable blocks in that
segment, write to blocks only up to zone-capacity.
Zone-capacity is device specific and cannot be configured by the user.
Since NVMe ZNS device zones are sequentially write only, a block device
with conventional zones or any normal block device is needed along with
the ZNS device for the metadata operations of F2fs.
A typical nvme-cli output of a zoned device shows zone start and capacity
and write pointer as below:
SLBA: 0x0 WP: 0x0 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
Here zone size is 64MB, capacity is 49MB, WP is at zone start as the zones
are in EMPTY state. For each zone, only zone start + 49MB is usable area,
any lba/sector after 49MB cannot be read or written to, the drive will fail
any attempts to read/write. So, the second zone starts at 64MB and is
usable till 113MB (64 + 49) and the range between 113 and 128MB is
again unusable. The next zone starts at 128MB, and so on.
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-16 20:56:56 +08:00
|
|
|
usable_blocks = f2fs_usable_blks_in_seg(sbi, segno);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
mutex_lock(&dirty_i->seglist_lock);
|
|
|
|
|
2017-04-08 05:33:22 +08:00
|
|
|
valid_blocks = get_valid_blocks(sbi, segno, false);
|
f2fs: fix to avoid touching checkpointed data in get_victim()
In CP disabling mode, there are two issues when using LFS or SSR | AT_SSR
mode to select victim:
1. LFS is set to find source section during GC, the victim should have
no checkpointed data, since after GC, section could not be set free for
reuse.
Previously, we only check valid chpt blocks in current segment rather
than section, fix it.
2. SSR | AT_SSR are set to find target segment for writes which can be
fully filled by checkpointed and newly written blocks, we should never
select such segment, otherwise it can cause panic or data corruption
during allocation, potential case is described as below:
a) target segment has 'n' (n < 512) ckpt valid blocks
b) GC migrates 'n' valid blocks to other segment (segment is still
in dirty list)
c) GC migrates '512 - n' blocks to target segment (segment has 'n'
cp_vblocks and '512 - n' vblocks)
d) If GC selects target segment via {AT,}SSR allocator, however there
is no free space in targe segment.
Fixes: 4354994f097d ("f2fs: checkpoint disabling")
Fixes: 093749e296e2 ("f2fs: support age threshold based garbage collection")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-03-24 11:18:28 +08:00
|
|
|
ckpt_valid_blocks = get_ckpt_valid_blocks(sbi, segno, false);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
2018-08-21 10:21:43 +08:00
|
|
|
if (valid_blocks == 0 && (!is_sbi_flag_set(sbi, SBI_CP_DISABLED) ||
|
f2fs: support zone capacity less than zone size
NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
Zone-capacity indicates the maximum number of sectors that are usable in
a zone beginning from the first sector of the zone. This makes the sectors
sectors after the zone-capacity till zone-size to be unusable.
This patch set tracks zone-size and zone-capacity in zoned devices and
calculate the usable blocks per segment and usable segments per section.
If zone-capacity is less than zone-size mark only those segments which
start before zone-capacity as free segments. All segments at and beyond
zone-capacity are treated as permanently used segments. In cases where
zone-capacity does not align with segment size the last segment will start
before zone-capacity and end beyond the zone-capacity of the zone. For
such spanning segments only sectors within the zone-capacity are used.
During writes and GC manage the usable segments in a section and usable
blocks per segment. Segments which are beyond zone-capacity are never
allocated, and do not need to be garbage collected, only the segments
which are before zone-capacity needs to garbage collected.
For spanning segments based on the number of usable blocks in that
segment, write to blocks only up to zone-capacity.
Zone-capacity is device specific and cannot be configured by the user.
Since NVMe ZNS device zones are sequentially write only, a block device
with conventional zones or any normal block device is needed along with
the ZNS device for the metadata operations of F2fs.
A typical nvme-cli output of a zoned device shows zone start and capacity
and write pointer as below:
SLBA: 0x0 WP: 0x0 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
Here zone size is 64MB, capacity is 49MB, WP is at zone start as the zones
are in EMPTY state. For each zone, only zone start + 49MB is usable area,
any lba/sector after 49MB cannot be read or written to, the drive will fail
any attempts to read/write. So, the second zone starts at 64MB and is
usable till 113MB (64 + 49) and the range between 113 and 128MB is
again unusable. The next zone starts at 128MB, and so on.
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-16 20:56:56 +08:00
|
|
|
ckpt_valid_blocks == usable_blocks)) {
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
__locate_dirty_segment(sbi, segno, PRE);
|
|
|
|
__remove_dirty_segment(sbi, segno, DIRTY);
|
f2fs: support zone capacity less than zone size
NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
Zone-capacity indicates the maximum number of sectors that are usable in
a zone beginning from the first sector of the zone. This makes the sectors
sectors after the zone-capacity till zone-size to be unusable.
This patch set tracks zone-size and zone-capacity in zoned devices and
calculate the usable blocks per segment and usable segments per section.
If zone-capacity is less than zone-size mark only those segments which
start before zone-capacity as free segments. All segments at and beyond
zone-capacity are treated as permanently used segments. In cases where
zone-capacity does not align with segment size the last segment will start
before zone-capacity and end beyond the zone-capacity of the zone. For
such spanning segments only sectors within the zone-capacity are used.
During writes and GC manage the usable segments in a section and usable
blocks per segment. Segments which are beyond zone-capacity are never
allocated, and do not need to be garbage collected, only the segments
which are before zone-capacity needs to garbage collected.
For spanning segments based on the number of usable blocks in that
segment, write to blocks only up to zone-capacity.
Zone-capacity is device specific and cannot be configured by the user.
Since NVMe ZNS device zones are sequentially write only, a block device
with conventional zones or any normal block device is needed along with
the ZNS device for the metadata operations of F2fs.
A typical nvme-cli output of a zoned device shows zone start and capacity
and write pointer as below:
SLBA: 0x0 WP: 0x0 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
Here zone size is 64MB, capacity is 49MB, WP is at zone start as the zones
are in EMPTY state. For each zone, only zone start + 49MB is usable area,
any lba/sector after 49MB cannot be read or written to, the drive will fail
any attempts to read/write. So, the second zone starts at 64MB and is
usable till 113MB (64 + 49) and the range between 113 and 128MB is
again unusable. The next zone starts at 128MB, and so on.
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-16 20:56:56 +08:00
|
|
|
} else if (valid_blocks < usable_blocks) {
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
__locate_dirty_segment(sbi, segno, DIRTY);
|
|
|
|
} else {
|
|
|
|
/* Recovery routine with SSR needs this */
|
|
|
|
__remove_dirty_segment(sbi, segno, DIRTY);
|
|
|
|
}
|
|
|
|
|
|
|
|
mutex_unlock(&dirty_i->seglist_lock);
|
|
|
|
}
|
|
|
|
|
2018-08-21 10:21:43 +08:00
|
|
|
/* This moves currently empty dirty blocks to prefree. Must hold seglist_lock */
|
|
|
|
void f2fs_dirty_to_prefree(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
|
|
|
|
unsigned int segno;
|
|
|
|
|
|
|
|
mutex_lock(&dirty_i->seglist_lock);
|
|
|
|
for_each_set_bit(segno, dirty_i->dirty_segmap[DIRTY], MAIN_SEGS(sbi)) {
|
|
|
|
if (get_valid_blocks(sbi, segno, false))
|
|
|
|
continue;
|
|
|
|
if (IS_CURSEG(sbi, segno))
|
|
|
|
continue;
|
|
|
|
__locate_dirty_segment(sbi, segno, PRE);
|
|
|
|
__remove_dirty_segment(sbi, segno, DIRTY);
|
|
|
|
}
|
|
|
|
mutex_unlock(&dirty_i->seglist_lock);
|
|
|
|
}
|
|
|
|
|
2019-05-30 08:49:06 +08:00
|
|
|
block_t f2fs_get_unusable_blocks(struct f2fs_sb_info *sbi)
|
2018-08-21 10:21:43 +08:00
|
|
|
{
|
f2fs: Lower threshold for disable_cp_again
The existing threshold for allowable holes at checkpoint=disable time is
too high. The OVP space contains reserved segments, which are always in
the form of free segments. These must be subtracted from the OVP value.
The current threshold is meant to be the maximum value of holes of a
single type we can have and still guarantee that we can fill the disk
without failing to find space for a block of a given type.
If the disk is full, ignoring current reserved, which only helps us,
the amount of unused blocks is equal to the OVP area. Of that, there
are reserved segments, which must be free segments, and the rest of the
ovp area, which can come from either free segments or holes. The maximum
possible amount of holes is OVP-reserved.
Now, consider the disk when mounting with checkpoint=disable.
We must be able to fill all available free space with either data or
node blocks. When we start with checkpoint=disable, holes are locked to
their current type. Say we have H of one type of hole, and H+X of the
other. We can fill H of that space with arbitrary typed blocks via SSR.
For the remaining H+X blocks, we may not have any of a given block type
left at all. For instance, if we were to fill the disk entirely with
blocks of the type with fewer holes, the H+X blocks of the opposite type
would not be used. If H+X > OVP-reserved, there would be more holes than
could possibly exist, and we would have failed to find a suitable block
earlier on, leading to a crash in update_sit_entry.
If H+X <= OVP-reserved, then the holes end up effectively masked by the OVP
region in this case.
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-30 08:49:03 +08:00
|
|
|
int ovp_hole_segs =
|
|
|
|
(overprovision_segments(sbi) - reserved_segments(sbi));
|
|
|
|
block_t ovp_holes = ovp_hole_segs << sbi->log_blocks_per_seg;
|
2019-05-30 08:49:06 +08:00
|
|
|
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
|
2018-08-21 10:21:43 +08:00
|
|
|
block_t holes[2] = {0, 0}; /* DATA and NODE */
|
2019-05-30 08:49:06 +08:00
|
|
|
block_t unusable;
|
2018-08-21 10:21:43 +08:00
|
|
|
struct seg_entry *se;
|
|
|
|
unsigned int segno;
|
|
|
|
|
|
|
|
mutex_lock(&dirty_i->seglist_lock);
|
|
|
|
for_each_set_bit(segno, dirty_i->dirty_segmap[DIRTY], MAIN_SEGS(sbi)) {
|
|
|
|
se = get_seg_entry(sbi, segno);
|
|
|
|
if (IS_NODESEG(se->type))
|
f2fs: support zone capacity less than zone size
NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
Zone-capacity indicates the maximum number of sectors that are usable in
a zone beginning from the first sector of the zone. This makes the sectors
sectors after the zone-capacity till zone-size to be unusable.
This patch set tracks zone-size and zone-capacity in zoned devices and
calculate the usable blocks per segment and usable segments per section.
If zone-capacity is less than zone-size mark only those segments which
start before zone-capacity as free segments. All segments at and beyond
zone-capacity are treated as permanently used segments. In cases where
zone-capacity does not align with segment size the last segment will start
before zone-capacity and end beyond the zone-capacity of the zone. For
such spanning segments only sectors within the zone-capacity are used.
During writes and GC manage the usable segments in a section and usable
blocks per segment. Segments which are beyond zone-capacity are never
allocated, and do not need to be garbage collected, only the segments
which are before zone-capacity needs to garbage collected.
For spanning segments based on the number of usable blocks in that
segment, write to blocks only up to zone-capacity.
Zone-capacity is device specific and cannot be configured by the user.
Since NVMe ZNS device zones are sequentially write only, a block device
with conventional zones or any normal block device is needed along with
the ZNS device for the metadata operations of F2fs.
A typical nvme-cli output of a zoned device shows zone start and capacity
and write pointer as below:
SLBA: 0x0 WP: 0x0 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
Here zone size is 64MB, capacity is 49MB, WP is at zone start as the zones
are in EMPTY state. For each zone, only zone start + 49MB is usable area,
any lba/sector after 49MB cannot be read or written to, the drive will fail
any attempts to read/write. So, the second zone starts at 64MB and is
usable till 113MB (64 + 49) and the range between 113 and 128MB is
again unusable. The next zone starts at 128MB, and so on.
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-16 20:56:56 +08:00
|
|
|
holes[NODE] += f2fs_usable_blks_in_seg(sbi, segno) -
|
|
|
|
se->valid_blocks;
|
2018-08-21 10:21:43 +08:00
|
|
|
else
|
f2fs: support zone capacity less than zone size
NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
Zone-capacity indicates the maximum number of sectors that are usable in
a zone beginning from the first sector of the zone. This makes the sectors
sectors after the zone-capacity till zone-size to be unusable.
This patch set tracks zone-size and zone-capacity in zoned devices and
calculate the usable blocks per segment and usable segments per section.
If zone-capacity is less than zone-size mark only those segments which
start before zone-capacity as free segments. All segments at and beyond
zone-capacity are treated as permanently used segments. In cases where
zone-capacity does not align with segment size the last segment will start
before zone-capacity and end beyond the zone-capacity of the zone. For
such spanning segments only sectors within the zone-capacity are used.
During writes and GC manage the usable segments in a section and usable
blocks per segment. Segments which are beyond zone-capacity are never
allocated, and do not need to be garbage collected, only the segments
which are before zone-capacity needs to garbage collected.
For spanning segments based on the number of usable blocks in that
segment, write to blocks only up to zone-capacity.
Zone-capacity is device specific and cannot be configured by the user.
Since NVMe ZNS device zones are sequentially write only, a block device
with conventional zones or any normal block device is needed along with
the ZNS device for the metadata operations of F2fs.
A typical nvme-cli output of a zoned device shows zone start and capacity
and write pointer as below:
SLBA: 0x0 WP: 0x0 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
Here zone size is 64MB, capacity is 49MB, WP is at zone start as the zones
are in EMPTY state. For each zone, only zone start + 49MB is usable area,
any lba/sector after 49MB cannot be read or written to, the drive will fail
any attempts to read/write. So, the second zone starts at 64MB and is
usable till 113MB (64 + 49) and the range between 113 and 128MB is
again unusable. The next zone starts at 128MB, and so on.
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-16 20:56:56 +08:00
|
|
|
holes[DATA] += f2fs_usable_blks_in_seg(sbi, segno) -
|
|
|
|
se->valid_blocks;
|
2018-08-21 10:21:43 +08:00
|
|
|
}
|
|
|
|
mutex_unlock(&dirty_i->seglist_lock);
|
|
|
|
|
2022-10-29 22:49:30 +08:00
|
|
|
unusable = max(holes[DATA], holes[NODE]);
|
2019-05-30 08:49:06 +08:00
|
|
|
if (unusable > ovp_holes)
|
|
|
|
return unusable - ovp_holes;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
int f2fs_disable_cp_again(struct f2fs_sb_info *sbi, block_t unusable)
|
|
|
|
{
|
|
|
|
int ovp_hole_segs =
|
|
|
|
(overprovision_segments(sbi) - reserved_segments(sbi));
|
|
|
|
if (unusable > F2FS_OPTION(sbi).unusable_cap)
|
2018-08-21 10:21:43 +08:00
|
|
|
return -EAGAIN;
|
2019-01-25 09:48:38 +08:00
|
|
|
if (is_sbi_flag_set(sbi, SBI_CP_DISABLED_QUICK) &&
|
f2fs: Lower threshold for disable_cp_again
The existing threshold for allowable holes at checkpoint=disable time is
too high. The OVP space contains reserved segments, which are always in
the form of free segments. These must be subtracted from the OVP value.
The current threshold is meant to be the maximum value of holes of a
single type we can have and still guarantee that we can fill the disk
without failing to find space for a block of a given type.
If the disk is full, ignoring current reserved, which only helps us,
the amount of unused blocks is equal to the OVP area. Of that, there
are reserved segments, which must be free segments, and the rest of the
ovp area, which can come from either free segments or holes. The maximum
possible amount of holes is OVP-reserved.
Now, consider the disk when mounting with checkpoint=disable.
We must be able to fill all available free space with either data or
node blocks. When we start with checkpoint=disable, holes are locked to
their current type. Say we have H of one type of hole, and H+X of the
other. We can fill H of that space with arbitrary typed blocks via SSR.
For the remaining H+X blocks, we may not have any of a given block type
left at all. For instance, if we were to fill the disk entirely with
blocks of the type with fewer holes, the H+X blocks of the opposite type
would not be used. If H+X > OVP-reserved, there would be more holes than
could possibly exist, and we would have failed to find a suitable block
earlier on, leading to a crash in update_sit_entry.
If H+X <= OVP-reserved, then the holes end up effectively masked by the OVP
region in this case.
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-30 08:49:03 +08:00
|
|
|
dirty_segments(sbi) > ovp_hole_segs)
|
2019-01-25 09:48:38 +08:00
|
|
|
return -EAGAIN;
|
2018-08-21 10:21:43 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* This is only used by SBI_CP_DISABLED */
|
|
|
|
static unsigned int get_free_segment(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
|
|
|
|
unsigned int segno = 0;
|
|
|
|
|
|
|
|
mutex_lock(&dirty_i->seglist_lock);
|
|
|
|
for_each_set_bit(segno, dirty_i->dirty_segmap[DIRTY], MAIN_SEGS(sbi)) {
|
|
|
|
if (get_valid_blocks(sbi, segno, false))
|
|
|
|
continue;
|
f2fs: fix to avoid touching checkpointed data in get_victim()
In CP disabling mode, there are two issues when using LFS or SSR | AT_SSR
mode to select victim:
1. LFS is set to find source section during GC, the victim should have
no checkpointed data, since after GC, section could not be set free for
reuse.
Previously, we only check valid chpt blocks in current segment rather
than section, fix it.
2. SSR | AT_SSR are set to find target segment for writes which can be
fully filled by checkpointed and newly written blocks, we should never
select such segment, otherwise it can cause panic or data corruption
during allocation, potential case is described as below:
a) target segment has 'n' (n < 512) ckpt valid blocks
b) GC migrates 'n' valid blocks to other segment (segment is still
in dirty list)
c) GC migrates '512 - n' blocks to target segment (segment has 'n'
cp_vblocks and '512 - n' vblocks)
d) If GC selects target segment via {AT,}SSR allocator, however there
is no free space in targe segment.
Fixes: 4354994f097d ("f2fs: checkpoint disabling")
Fixes: 093749e296e2 ("f2fs: support age threshold based garbage collection")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-03-24 11:18:28 +08:00
|
|
|
if (get_ckpt_valid_blocks(sbi, segno, false))
|
2018-08-21 10:21:43 +08:00
|
|
|
continue;
|
|
|
|
mutex_unlock(&dirty_i->seglist_lock);
|
|
|
|
return segno;
|
|
|
|
}
|
|
|
|
mutex_unlock(&dirty_i->seglist_lock);
|
|
|
|
return NULL_SEGNO;
|
|
|
|
}
|
|
|
|
|
2017-04-14 23:24:55 +08:00
|
|
|
static struct discard_cmd *__create_discard_cmd(struct f2fs_sb_info *sbi,
|
2017-03-08 10:02:02 +08:00
|
|
|
struct block_device *bdev, block_t lstart,
|
|
|
|
block_t start, block_t len)
|
2016-08-29 23:58:34 +08:00
|
|
|
{
|
2017-01-12 06:40:24 +08:00
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
2017-04-15 14:09:37 +08:00
|
|
|
struct list_head *pend_list;
|
2017-01-10 06:13:03 +08:00
|
|
|
struct discard_cmd *dc;
|
2016-08-29 23:58:34 +08:00
|
|
|
|
2017-04-15 14:09:37 +08:00
|
|
|
f2fs_bug_on(sbi, !len);
|
|
|
|
|
|
|
|
pend_list = &dcc->pend_list[plist_idx(len)];
|
|
|
|
|
2021-08-09 08:24:48 +08:00
|
|
|
dc = f2fs_kmem_cache_alloc(discard_cmd_slab, GFP_NOFS, true, NULL);
|
2017-01-10 06:13:03 +08:00
|
|
|
INIT_LIST_HEAD(&dc->list);
|
2017-03-08 10:02:02 +08:00
|
|
|
dc->bdev = bdev;
|
2023-03-11 03:12:35 +08:00
|
|
|
dc->di.lstart = lstart;
|
|
|
|
dc->di.start = start;
|
|
|
|
dc->di.len = len;
|
2017-04-26 17:39:54 +08:00
|
|
|
dc->ref = 0;
|
2017-01-10 12:32:07 +08:00
|
|
|
dc->state = D_PREP;
|
2018-12-14 08:53:57 +08:00
|
|
|
dc->queued = 0;
|
2017-03-08 10:02:02 +08:00
|
|
|
dc->error = 0;
|
2017-01-10 06:13:03 +08:00
|
|
|
init_completion(&dc->wait);
|
2017-04-05 18:19:48 +08:00
|
|
|
list_add_tail(&dc->list, pend_list);
|
2018-08-06 22:43:50 +08:00
|
|
|
spin_lock_init(&dc->lock);
|
|
|
|
dc->bio_ref = 0;
|
2017-03-25 17:19:59 +08:00
|
|
|
atomic_inc(&dcc->discard_cmd_cnt);
|
2017-04-18 19:27:39 +08:00
|
|
|
dcc->undiscard_blks += len;
|
2017-04-14 23:24:55 +08:00
|
|
|
|
|
|
|
return dc;
|
|
|
|
}
|
|
|
|
|
2023-03-11 03:12:35 +08:00
|
|
|
static bool f2fs_check_discard_tree(struct f2fs_sb_info *sbi)
|
2017-04-14 23:24:55 +08:00
|
|
|
{
|
2023-03-11 03:12:35 +08:00
|
|
|
#ifdef CONFIG_F2FS_CHECK_FS
|
2017-04-14 23:24:55 +08:00
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
2023-03-11 03:12:35 +08:00
|
|
|
struct rb_node *cur = rb_first_cached(&dcc->root), *next;
|
|
|
|
struct discard_cmd *cur_dc, *next_dc;
|
|
|
|
|
|
|
|
while (cur) {
|
|
|
|
next = rb_next(cur);
|
|
|
|
if (!next)
|
|
|
|
return true;
|
|
|
|
|
|
|
|
cur_dc = rb_entry(cur, struct discard_cmd, rb_node);
|
|
|
|
next_dc = rb_entry(next, struct discard_cmd, rb_node);
|
|
|
|
|
|
|
|
if (cur_dc->di.lstart + cur_dc->di.len > next_dc->di.lstart) {
|
|
|
|
f2fs_info(sbi, "broken discard_rbtree, "
|
|
|
|
"cur(%u, %u) next(%u, %u)",
|
|
|
|
cur_dc->di.lstart, cur_dc->di.len,
|
|
|
|
next_dc->di.lstart, next_dc->di.len);
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
cur = next;
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
|
|
|
static struct discard_cmd *__lookup_discard_cmd(struct f2fs_sb_info *sbi,
|
|
|
|
block_t blkaddr)
|
|
|
|
{
|
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
|
|
|
struct rb_node *node = dcc->root.rb_root.rb_node;
|
2017-04-14 23:24:55 +08:00
|
|
|
struct discard_cmd *dc;
|
|
|
|
|
2023-03-11 03:12:35 +08:00
|
|
|
while (node) {
|
|
|
|
dc = rb_entry(node, struct discard_cmd, rb_node);
|
2017-04-14 23:24:55 +08:00
|
|
|
|
2023-03-11 03:12:35 +08:00
|
|
|
if (blkaddr < dc->di.lstart)
|
|
|
|
node = node->rb_left;
|
|
|
|
else if (blkaddr >= dc->di.lstart + dc->di.len)
|
|
|
|
node = node->rb_right;
|
|
|
|
else
|
|
|
|
return dc;
|
|
|
|
}
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
static struct discard_cmd *__lookup_discard_cmd_ret(struct rb_root_cached *root,
|
|
|
|
block_t blkaddr,
|
|
|
|
struct discard_cmd **prev_entry,
|
|
|
|
struct discard_cmd **next_entry,
|
|
|
|
struct rb_node ***insert_p,
|
|
|
|
struct rb_node **insert_parent)
|
|
|
|
{
|
|
|
|
struct rb_node **pnode = &root->rb_root.rb_node;
|
|
|
|
struct rb_node *parent = NULL, *tmp_node;
|
|
|
|
struct discard_cmd *dc;
|
|
|
|
|
|
|
|
*insert_p = NULL;
|
|
|
|
*insert_parent = NULL;
|
|
|
|
*prev_entry = NULL;
|
|
|
|
*next_entry = NULL;
|
|
|
|
|
|
|
|
if (RB_EMPTY_ROOT(&root->rb_root))
|
|
|
|
return NULL;
|
|
|
|
|
|
|
|
while (*pnode) {
|
|
|
|
parent = *pnode;
|
|
|
|
dc = rb_entry(*pnode, struct discard_cmd, rb_node);
|
|
|
|
|
|
|
|
if (blkaddr < dc->di.lstart)
|
|
|
|
pnode = &(*pnode)->rb_left;
|
|
|
|
else if (blkaddr >= dc->di.lstart + dc->di.len)
|
|
|
|
pnode = &(*pnode)->rb_right;
|
|
|
|
else
|
|
|
|
goto lookup_neighbors;
|
|
|
|
}
|
|
|
|
|
|
|
|
*insert_p = pnode;
|
|
|
|
*insert_parent = parent;
|
|
|
|
|
|
|
|
dc = rb_entry(parent, struct discard_cmd, rb_node);
|
|
|
|
tmp_node = parent;
|
|
|
|
if (parent && blkaddr > dc->di.lstart)
|
|
|
|
tmp_node = rb_next(parent);
|
|
|
|
*next_entry = rb_entry_safe(tmp_node, struct discard_cmd, rb_node);
|
2017-04-14 23:24:55 +08:00
|
|
|
|
2023-03-11 03:12:35 +08:00
|
|
|
tmp_node = parent;
|
|
|
|
if (parent && blkaddr < dc->di.lstart)
|
|
|
|
tmp_node = rb_prev(parent);
|
|
|
|
*prev_entry = rb_entry_safe(tmp_node, struct discard_cmd, rb_node);
|
|
|
|
return NULL;
|
|
|
|
|
|
|
|
lookup_neighbors:
|
|
|
|
/* lookup prev node for merging backward later */
|
|
|
|
tmp_node = rb_prev(&dc->rb_node);
|
|
|
|
*prev_entry = rb_entry_safe(tmp_node, struct discard_cmd, rb_node);
|
|
|
|
|
|
|
|
/* lookup next node for merging frontward later */
|
|
|
|
tmp_node = rb_next(&dc->rb_node);
|
|
|
|
*next_entry = rb_entry_safe(tmp_node, struct discard_cmd, rb_node);
|
2017-04-14 23:24:55 +08:00
|
|
|
return dc;
|
2017-01-10 12:32:07 +08:00
|
|
|
}
|
|
|
|
|
2017-04-14 23:24:55 +08:00
|
|
|
static void __detach_discard_cmd(struct discard_cmd_control *dcc,
|
|
|
|
struct discard_cmd *dc)
|
2017-01-10 12:32:07 +08:00
|
|
|
{
|
2017-01-12 02:20:04 +08:00
|
|
|
if (dc->state == D_DONE)
|
2018-12-14 08:53:57 +08:00
|
|
|
atomic_sub(dc->queued, &dcc->queued_discard);
|
2017-04-14 23:24:55 +08:00
|
|
|
|
|
|
|
list_del(&dc->list);
|
2018-10-04 11:18:30 +08:00
|
|
|
rb_erase_cached(&dc->rb_node, &dcc->root);
|
2023-03-11 03:12:35 +08:00
|
|
|
dcc->undiscard_blks -= dc->di.len;
|
2017-04-14 23:24:55 +08:00
|
|
|
|
|
|
|
kmem_cache_free(discard_cmd_slab, dc);
|
|
|
|
|
|
|
|
atomic_dec(&dcc->discard_cmd_cnt);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void __remove_discard_cmd(struct f2fs_sb_info *sbi,
|
|
|
|
struct discard_cmd *dc)
|
|
|
|
{
|
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
2018-08-06 22:43:50 +08:00
|
|
|
unsigned long flags;
|
2017-01-12 02:20:04 +08:00
|
|
|
|
2023-03-11 03:12:35 +08:00
|
|
|
trace_f2fs_remove_discard(dc->bdev, dc->di.start, dc->di.len);
|
2017-10-04 09:08:36 +08:00
|
|
|
|
2018-08-06 22:43:50 +08:00
|
|
|
spin_lock_irqsave(&dc->lock, flags);
|
|
|
|
if (dc->bio_ref) {
|
|
|
|
spin_unlock_irqrestore(&dc->lock, flags);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
spin_unlock_irqrestore(&dc->lock, flags);
|
|
|
|
|
2017-06-05 18:29:07 +08:00
|
|
|
f2fs_bug_on(sbi, dc->ref);
|
|
|
|
|
2017-03-08 10:02:02 +08:00
|
|
|
if (dc->error == -EOPNOTSUPP)
|
|
|
|
dc->error = 0;
|
2017-01-10 12:32:07 +08:00
|
|
|
|
2017-03-08 10:02:02 +08:00
|
|
|
if (dc->error)
|
2018-08-22 17:17:47 +08:00
|
|
|
printk_ratelimited(
|
2019-11-01 17:53:23 +08:00
|
|
|
"%sF2FS-fs (%s): Issue discard(%u, %u, %u) failed, ret: %d",
|
|
|
|
KERN_INFO, sbi->sb->s_id,
|
2023-03-11 03:12:35 +08:00
|
|
|
dc->di.lstart, dc->di.start, dc->di.len, dc->error);
|
2017-04-14 23:24:55 +08:00
|
|
|
__detach_discard_cmd(dcc, dc);
|
2016-08-29 23:58:34 +08:00
|
|
|
}
|
|
|
|
|
2017-03-08 10:02:02 +08:00
|
|
|
static void f2fs_submit_discard_endio(struct bio *bio)
|
|
|
|
{
|
|
|
|
struct discard_cmd *dc = (struct discard_cmd *)bio->bi_private;
|
2018-08-06 22:43:50 +08:00
|
|
|
unsigned long flags;
|
2017-03-08 10:02:02 +08:00
|
|
|
|
2018-08-06 22:43:50 +08:00
|
|
|
spin_lock_irqsave(&dc->lock, flags);
|
2020-04-15 12:05:54 +08:00
|
|
|
if (!dc->error)
|
|
|
|
dc->error = blk_status_to_errno(bio->bi_status);
|
2018-08-06 22:43:50 +08:00
|
|
|
dc->bio_ref--;
|
|
|
|
if (!dc->bio_ref && dc->state == D_SUBMIT) {
|
|
|
|
dc->state = D_DONE;
|
|
|
|
complete_all(&dc->wait);
|
|
|
|
}
|
|
|
|
spin_unlock_irqrestore(&dc->lock, flags);
|
2017-03-08 10:02:02 +08:00
|
|
|
bio_put(bio);
|
|
|
|
}
|
|
|
|
|
2018-01-05 17:41:20 +08:00
|
|
|
static void __check_sit_bitmap(struct f2fs_sb_info *sbi,
|
2017-06-30 17:19:02 +08:00
|
|
|
block_t start, block_t end)
|
|
|
|
{
|
|
|
|
#ifdef CONFIG_F2FS_CHECK_FS
|
|
|
|
struct seg_entry *sentry;
|
|
|
|
unsigned int segno;
|
|
|
|
block_t blk = start;
|
|
|
|
unsigned long offset, size, max_blocks = sbi->blocks_per_seg;
|
|
|
|
unsigned long *map;
|
|
|
|
|
|
|
|
while (blk < end) {
|
|
|
|
segno = GET_SEGNO(sbi, blk);
|
|
|
|
sentry = get_seg_entry(sbi, segno);
|
|
|
|
offset = GET_BLKOFF_FROM_SEG0(sbi, blk);
|
|
|
|
|
2017-08-04 17:07:15 +08:00
|
|
|
if (end < START_BLOCK(sbi, segno + 1))
|
|
|
|
size = GET_BLKOFF_FROM_SEG0(sbi, end);
|
|
|
|
else
|
|
|
|
size = max_blocks;
|
2017-06-30 17:19:02 +08:00
|
|
|
map = (unsigned long *)(sentry->cur_valid_map);
|
|
|
|
offset = __find_rev_next_bit(map, size, offset);
|
|
|
|
f2fs_bug_on(sbi, offset != size);
|
2017-08-04 17:07:15 +08:00
|
|
|
blk = START_BLOCK(sbi, segno + 1);
|
2017-06-30 17:19:02 +08:00
|
|
|
}
|
|
|
|
#endif
|
|
|
|
}
|
|
|
|
|
2018-05-30 00:58:42 +08:00
|
|
|
static void __init_discard_policy(struct f2fs_sb_info *sbi,
|
|
|
|
struct discard_policy *dpolicy,
|
|
|
|
int discard_type, unsigned int granularity)
|
|
|
|
{
|
2021-04-06 17:09:16 +08:00
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
|
|
|
|
2018-05-30 00:58:42 +08:00
|
|
|
/* common policy */
|
|
|
|
dpolicy->type = discard_type;
|
|
|
|
dpolicy->sync = true;
|
2018-07-08 22:11:01 +08:00
|
|
|
dpolicy->ordered = false;
|
2018-05-30 00:58:42 +08:00
|
|
|
dpolicy->granularity = granularity;
|
|
|
|
|
2021-12-14 09:12:03 +08:00
|
|
|
dpolicy->max_requests = dcc->max_discard_request;
|
2023-01-04 19:40:29 +08:00
|
|
|
dpolicy->io_aware_gran = dcc->discard_io_aware_gran;
|
2020-03-26 17:43:56 +08:00
|
|
|
dpolicy->timeout = false;
|
2018-05-30 00:58:42 +08:00
|
|
|
|
|
|
|
if (discard_type == DPOLICY_BG) {
|
2021-12-14 09:12:03 +08:00
|
|
|
dpolicy->min_interval = dcc->min_discard_issue_time;
|
|
|
|
dpolicy->mid_interval = dcc->mid_discard_issue_time;
|
|
|
|
dpolicy->max_interval = dcc->max_discard_issue_time;
|
2018-05-30 00:58:42 +08:00
|
|
|
dpolicy->io_aware = true;
|
2018-04-10 15:43:09 +08:00
|
|
|
dpolicy->sync = false;
|
2018-07-08 22:11:01 +08:00
|
|
|
dpolicy->ordered = true;
|
2022-11-24 00:44:02 +08:00
|
|
|
if (utilization(sbi) > dcc->discard_urgent_util) {
|
2022-11-24 00:44:01 +08:00
|
|
|
dpolicy->granularity = MIN_DISCARD_GRANULARITY;
|
2021-04-06 17:09:16 +08:00
|
|
|
if (atomic_read(&dcc->discard_cmd_cnt))
|
|
|
|
dpolicy->max_interval =
|
2021-12-14 09:12:03 +08:00
|
|
|
dcc->min_discard_issue_time;
|
2018-05-30 00:58:42 +08:00
|
|
|
}
|
|
|
|
} else if (discard_type == DPOLICY_FORCE) {
|
2021-12-14 09:12:03 +08:00
|
|
|
dpolicy->min_interval = dcc->min_discard_issue_time;
|
|
|
|
dpolicy->mid_interval = dcc->mid_discard_issue_time;
|
|
|
|
dpolicy->max_interval = dcc->max_discard_issue_time;
|
2018-05-30 00:58:42 +08:00
|
|
|
dpolicy->io_aware = false;
|
|
|
|
} else if (discard_type == DPOLICY_FSTRIM) {
|
|
|
|
dpolicy->io_aware = false;
|
|
|
|
} else if (discard_type == DPOLICY_UMOUNT) {
|
|
|
|
dpolicy->io_aware = false;
|
2019-01-26 01:12:13 +08:00
|
|
|
/* we need to issue all to keep CP_TRIMMED_FLAG */
|
2022-11-24 00:44:01 +08:00
|
|
|
dpolicy->granularity = MIN_DISCARD_GRANULARITY;
|
2020-03-26 17:43:56 +08:00
|
|
|
dpolicy->timeout = true;
|
2018-05-30 00:58:42 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-08-06 22:43:50 +08:00
|
|
|
static void __update_discard_tree_range(struct f2fs_sb_info *sbi,
|
|
|
|
struct block_device *bdev, block_t lstart,
|
|
|
|
block_t start, block_t len);
|
2023-05-08 16:10:42 +08:00
|
|
|
|
|
|
|
#ifdef CONFIG_BLK_DEV_ZONED
|
|
|
|
static void __submit_zone_reset_cmd(struct f2fs_sb_info *sbi,
|
|
|
|
struct discard_cmd *dc, blk_opf_t flag,
|
|
|
|
struct list_head *wait_list,
|
|
|
|
unsigned int *issued)
|
|
|
|
{
|
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
|
|
|
struct block_device *bdev = dc->bdev;
|
|
|
|
struct bio *bio = bio_alloc(bdev, 0, REQ_OP_ZONE_RESET | flag, GFP_NOFS);
|
|
|
|
unsigned long flags;
|
|
|
|
|
|
|
|
trace_f2fs_issue_reset_zone(bdev, dc->di.start);
|
|
|
|
|
|
|
|
spin_lock_irqsave(&dc->lock, flags);
|
|
|
|
dc->state = D_SUBMIT;
|
|
|
|
dc->bio_ref++;
|
|
|
|
spin_unlock_irqrestore(&dc->lock, flags);
|
|
|
|
|
|
|
|
if (issued)
|
|
|
|
(*issued)++;
|
|
|
|
|
|
|
|
atomic_inc(&dcc->queued_discard);
|
|
|
|
dc->queued++;
|
|
|
|
list_move_tail(&dc->list, wait_list);
|
|
|
|
|
|
|
|
/* sanity check on discard range */
|
|
|
|
__check_sit_bitmap(sbi, dc->di.lstart, dc->di.lstart + dc->di.len);
|
|
|
|
|
|
|
|
bio->bi_iter.bi_sector = SECTOR_FROM_BLOCK(dc->di.start);
|
|
|
|
bio->bi_private = dc;
|
|
|
|
bio->bi_end_io = f2fs_submit_discard_endio;
|
|
|
|
submit_bio(bio);
|
|
|
|
|
|
|
|
atomic_inc(&dcc->issued_discard);
|
|
|
|
f2fs_update_iostat(sbi, NULL, FS_ZONE_RESET_IO, dc->di.len * F2FS_BLKSIZE);
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
2017-03-08 10:02:02 +08:00
|
|
|
/* this function is copied from blkdev_issue_discard from block/blk-lib.c */
|
2018-08-08 10:14:55 +08:00
|
|
|
static int __submit_discard_cmd(struct f2fs_sb_info *sbi,
|
2022-12-13 17:34:19 +08:00
|
|
|
struct discard_policy *dpolicy,
|
|
|
|
struct discard_cmd *dc, int *issued)
|
2017-03-08 10:02:02 +08:00
|
|
|
{
|
2018-08-06 22:43:50 +08:00
|
|
|
struct block_device *bdev = dc->bdev;
|
|
|
|
unsigned int max_discard_blocks =
|
2022-04-15 12:52:54 +08:00
|
|
|
SECTOR_TO_BLOCK(bdev_max_discard_sectors(bdev));
|
2017-03-08 10:02:02 +08:00
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
2017-10-04 09:08:34 +08:00
|
|
|
struct list_head *wait_list = (dpolicy->type == DPOLICY_FSTRIM) ?
|
|
|
|
&(dcc->fstrim_list) : &(dcc->wait_list);
|
2022-07-15 02:07:18 +08:00
|
|
|
blk_opf_t flag = dpolicy->sync ? REQ_SYNC : 0;
|
2018-08-06 22:43:50 +08:00
|
|
|
block_t lstart, start, len, total_len;
|
|
|
|
int err = 0;
|
2017-03-08 10:02:02 +08:00
|
|
|
|
|
|
|
if (dc->state != D_PREP)
|
2018-08-08 10:14:55 +08:00
|
|
|
return 0;
|
2017-03-08 10:02:02 +08:00
|
|
|
|
2018-04-13 11:08:05 +08:00
|
|
|
if (is_sbi_flag_set(sbi, SBI_NEED_FSCK))
|
2018-08-08 10:14:55 +08:00
|
|
|
return 0;
|
2018-04-13 11:08:05 +08:00
|
|
|
|
2023-05-08 16:10:42 +08:00
|
|
|
#ifdef CONFIG_BLK_DEV_ZONED
|
|
|
|
if (f2fs_sb_has_blkzoned(sbi) && bdev_is_zoned(bdev)) {
|
|
|
|
__submit_zone_reset_cmd(sbi, dc, flag, wait_list, issued);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
2023-03-11 03:12:35 +08:00
|
|
|
trace_f2fs_issue_discard(bdev, dc->di.start, dc->di.len);
|
2018-08-06 22:43:50 +08:00
|
|
|
|
2023-03-11 03:12:35 +08:00
|
|
|
lstart = dc->di.lstart;
|
|
|
|
start = dc->di.start;
|
|
|
|
len = dc->di.len;
|
2018-08-06 22:43:50 +08:00
|
|
|
total_len = len;
|
|
|
|
|
2023-03-11 03:12:35 +08:00
|
|
|
dc->di.len = 0;
|
2018-08-06 22:43:50 +08:00
|
|
|
|
|
|
|
while (total_len && *issued < dpolicy->max_requests && !err) {
|
|
|
|
struct bio *bio = NULL;
|
|
|
|
unsigned long flags;
|
|
|
|
bool last = true;
|
|
|
|
|
|
|
|
if (len > max_discard_blocks) {
|
|
|
|
len = max_discard_blocks;
|
|
|
|
last = false;
|
|
|
|
}
|
|
|
|
|
|
|
|
(*issued)++;
|
|
|
|
if (*issued == dpolicy->max_requests)
|
|
|
|
last = true;
|
|
|
|
|
2023-03-11 03:12:35 +08:00
|
|
|
dc->di.len += len;
|
2018-08-06 22:43:50 +08:00
|
|
|
|
2018-08-06 20:30:18 +08:00
|
|
|
if (time_to_inject(sbi, FAULT_DISCARD)) {
|
|
|
|
err = -EIO;
|
2022-11-12 00:13:49 +08:00
|
|
|
} else {
|
|
|
|
err = __blkdev_issue_discard(bdev,
|
2018-08-06 22:43:50 +08:00
|
|
|
SECTOR_FROM_BLOCK(start),
|
|
|
|
SECTOR_FROM_BLOCK(len),
|
2022-04-15 12:52:57 +08:00
|
|
|
GFP_NOFS, &bio);
|
2022-11-12 00:13:49 +08:00
|
|
|
}
|
2018-08-08 10:14:55 +08:00
|
|
|
if (err) {
|
2018-08-06 22:43:50 +08:00
|
|
|
spin_lock_irqsave(&dc->lock, flags);
|
2018-08-08 10:14:55 +08:00
|
|
|
if (dc->state == D_PARTIAL)
|
2018-08-06 22:43:50 +08:00
|
|
|
dc->state = D_SUBMIT;
|
|
|
|
spin_unlock_irqrestore(&dc->lock, flags);
|
|
|
|
|
2018-08-08 10:14:55 +08:00
|
|
|
break;
|
|
|
|
}
|
2018-08-06 22:43:50 +08:00
|
|
|
|
2018-08-08 10:14:55 +08:00
|
|
|
f2fs_bug_on(sbi, !bio);
|
2018-08-06 22:43:50 +08:00
|
|
|
|
2018-08-08 10:14:55 +08:00
|
|
|
/*
|
|
|
|
* should keep before submission to avoid D_DONE
|
|
|
|
* right away
|
|
|
|
*/
|
|
|
|
spin_lock_irqsave(&dc->lock, flags);
|
|
|
|
if (last)
|
|
|
|
dc->state = D_SUBMIT;
|
|
|
|
else
|
|
|
|
dc->state = D_PARTIAL;
|
|
|
|
dc->bio_ref++;
|
|
|
|
spin_unlock_irqrestore(&dc->lock, flags);
|
2018-08-06 22:43:50 +08:00
|
|
|
|
2018-12-14 08:53:57 +08:00
|
|
|
atomic_inc(&dcc->queued_discard);
|
|
|
|
dc->queued++;
|
2018-08-08 10:14:55 +08:00
|
|
|
list_move_tail(&dc->list, wait_list);
|
2017-08-02 23:21:48 +08:00
|
|
|
|
2018-08-08 10:14:55 +08:00
|
|
|
/* sanity check on discard range */
|
2018-12-18 17:32:23 +08:00
|
|
|
__check_sit_bitmap(sbi, lstart, lstart + len);
|
2018-08-06 22:43:50 +08:00
|
|
|
|
2018-08-08 10:14:55 +08:00
|
|
|
bio->bi_private = dc;
|
|
|
|
bio->bi_end_io = f2fs_submit_discard_endio;
|
|
|
|
bio->bi_opf |= flag;
|
|
|
|
submit_bio(bio);
|
|
|
|
|
|
|
|
atomic_inc(&dcc->issued_discard);
|
|
|
|
|
2022-12-22 03:19:32 +08:00
|
|
|
f2fs_update_iostat(sbi, NULL, FS_DISCARD_IO, len * F2FS_BLKSIZE);
|
2018-08-06 22:43:50 +08:00
|
|
|
|
|
|
|
lstart += len;
|
|
|
|
start += len;
|
|
|
|
total_len -= len;
|
|
|
|
len = total_len;
|
2017-03-08 10:02:02 +08:00
|
|
|
}
|
2018-08-06 22:43:50 +08:00
|
|
|
|
2020-04-16 14:17:41 +08:00
|
|
|
if (!err && len) {
|
|
|
|
dcc->undiscard_blks -= len;
|
2018-08-06 22:43:50 +08:00
|
|
|
__update_discard_tree_range(sbi, bdev, lstart, start, len);
|
2020-04-16 14:17:41 +08:00
|
|
|
}
|
2018-08-08 10:14:55 +08:00
|
|
|
return err;
|
2017-03-08 10:02:02 +08:00
|
|
|
}
|
|
|
|
|
2023-03-11 03:12:35 +08:00
|
|
|
static void __insert_discard_cmd(struct f2fs_sb_info *sbi,
|
2017-04-14 23:24:55 +08:00
|
|
|
struct block_device *bdev, block_t lstart,
|
2023-03-11 03:12:35 +08:00
|
|
|
block_t start, block_t len)
|
2017-03-08 10:02:02 +08:00
|
|
|
{
|
2017-04-14 23:24:55 +08:00
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
2023-03-11 03:12:35 +08:00
|
|
|
struct rb_node **p = &dcc->root.rb_root.rb_node;
|
2017-04-14 23:24:55 +08:00
|
|
|
struct rb_node *parent = NULL;
|
2023-03-11 03:12:35 +08:00
|
|
|
struct discard_cmd *dc;
|
2018-10-04 11:18:30 +08:00
|
|
|
bool leftmost = true;
|
2017-04-14 23:24:55 +08:00
|
|
|
|
2023-03-11 03:12:35 +08:00
|
|
|
/* look up rb tree to find parent node */
|
|
|
|
while (*p) {
|
|
|
|
parent = *p;
|
|
|
|
dc = rb_entry(parent, struct discard_cmd, rb_node);
|
|
|
|
|
|
|
|
if (lstart < dc->di.lstart) {
|
|
|
|
p = &(*p)->rb_left;
|
|
|
|
} else if (lstart >= dc->di.lstart + dc->di.len) {
|
|
|
|
p = &(*p)->rb_right;
|
|
|
|
leftmost = false;
|
|
|
|
} else {
|
|
|
|
f2fs_bug_on(sbi, 1);
|
|
|
|
}
|
2017-04-14 23:24:55 +08:00
|
|
|
}
|
2017-03-08 10:02:02 +08:00
|
|
|
|
2023-03-11 03:12:35 +08:00
|
|
|
dc = __create_discard_cmd(sbi, bdev, lstart, start, len);
|
|
|
|
|
|
|
|
rb_link_node(&dc->rb_node, parent, p);
|
|
|
|
rb_insert_color_cached(&dc->rb_node, &dcc->root, leftmost);
|
2017-03-08 10:02:02 +08:00
|
|
|
}
|
|
|
|
|
2017-04-15 14:09:37 +08:00
|
|
|
static void __relocate_discard_cmd(struct discard_cmd_control *dcc,
|
|
|
|
struct discard_cmd *dc)
|
|
|
|
{
|
2023-03-11 03:12:35 +08:00
|
|
|
list_move_tail(&dc->list, &dcc->pend_list[plist_idx(dc->di.len)]);
|
2017-04-15 14:09:37 +08:00
|
|
|
}
|
|
|
|
|
2017-03-02 10:36:20 +08:00
|
|
|
static void __punch_discard_cmd(struct f2fs_sb_info *sbi,
|
|
|
|
struct discard_cmd *dc, block_t blkaddr)
|
|
|
|
{
|
2017-04-15 14:09:37 +08:00
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
2017-04-14 23:24:55 +08:00
|
|
|
struct discard_info di = dc->di;
|
|
|
|
bool modified = false;
|
2017-03-02 10:36:20 +08:00
|
|
|
|
2023-03-11 03:12:35 +08:00
|
|
|
if (dc->state == D_DONE || dc->di.len == 1) {
|
2017-03-02 10:36:20 +08:00
|
|
|
__remove_discard_cmd(sbi, dc);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
2017-04-18 19:27:39 +08:00
|
|
|
dcc->undiscard_blks -= di.len;
|
|
|
|
|
2017-04-14 23:24:55 +08:00
|
|
|
if (blkaddr > di.lstart) {
|
2023-03-11 03:12:35 +08:00
|
|
|
dc->di.len = blkaddr - dc->di.lstart;
|
|
|
|
dcc->undiscard_blks += dc->di.len;
|
2017-04-15 14:09:37 +08:00
|
|
|
__relocate_discard_cmd(dcc, dc);
|
2017-04-14 23:24:55 +08:00
|
|
|
modified = true;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (blkaddr < di.lstart + di.len - 1) {
|
|
|
|
if (modified) {
|
2023-03-11 03:12:35 +08:00
|
|
|
__insert_discard_cmd(sbi, dc->bdev, blkaddr + 1,
|
2017-04-14 23:24:55 +08:00
|
|
|
di.start + blkaddr + 1 - di.lstart,
|
2023-03-11 03:12:35 +08:00
|
|
|
di.lstart + di.len - 1 - blkaddr);
|
2017-04-14 23:24:55 +08:00
|
|
|
} else {
|
2023-03-11 03:12:35 +08:00
|
|
|
dc->di.lstart++;
|
|
|
|
dc->di.len--;
|
|
|
|
dc->di.start++;
|
|
|
|
dcc->undiscard_blks += dc->di.len;
|
2017-04-15 14:09:37 +08:00
|
|
|
__relocate_discard_cmd(dcc, dc);
|
2017-04-14 23:24:55 +08:00
|
|
|
}
|
2017-03-02 10:36:20 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2017-04-14 23:24:55 +08:00
|
|
|
static void __update_discard_tree_range(struct f2fs_sb_info *sbi,
|
|
|
|
struct block_device *bdev, block_t lstart,
|
|
|
|
block_t start, block_t len)
|
2016-08-29 23:58:34 +08:00
|
|
|
{
|
2017-01-12 06:40:24 +08:00
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
2017-04-14 23:24:55 +08:00
|
|
|
struct discard_cmd *prev_dc = NULL, *next_dc = NULL;
|
|
|
|
struct discard_cmd *dc;
|
|
|
|
struct discard_info di = {0};
|
|
|
|
struct rb_node **insert_p = NULL, *insert_parent = NULL;
|
2018-08-06 22:43:50 +08:00
|
|
|
unsigned int max_discard_blocks =
|
2022-04-15 12:52:54 +08:00
|
|
|
SECTOR_TO_BLOCK(bdev_max_discard_sectors(bdev));
|
2017-04-14 23:24:55 +08:00
|
|
|
block_t end = lstart + len;
|
2016-08-29 23:58:34 +08:00
|
|
|
|
2023-03-11 03:12:35 +08:00
|
|
|
dc = __lookup_discard_cmd_ret(&dcc->root, lstart,
|
|
|
|
&prev_dc, &next_dc, &insert_p, &insert_parent);
|
2017-04-14 23:24:55 +08:00
|
|
|
if (dc)
|
|
|
|
prev_dc = dc;
|
|
|
|
|
|
|
|
if (!prev_dc) {
|
|
|
|
di.lstart = lstart;
|
2023-03-11 03:12:35 +08:00
|
|
|
di.len = next_dc ? next_dc->di.lstart - lstart : len;
|
2017-04-14 23:24:55 +08:00
|
|
|
di.len = min(di.len, len);
|
|
|
|
di.start = start;
|
2017-04-05 18:19:48 +08:00
|
|
|
}
|
2017-01-10 12:32:07 +08:00
|
|
|
|
2017-04-14 23:24:55 +08:00
|
|
|
while (1) {
|
|
|
|
struct rb_node *node;
|
|
|
|
bool merged = false;
|
|
|
|
struct discard_cmd *tdc = NULL;
|
|
|
|
|
|
|
|
if (prev_dc) {
|
2023-03-11 03:12:35 +08:00
|
|
|
di.lstart = prev_dc->di.lstart + prev_dc->di.len;
|
2017-04-14 23:24:55 +08:00
|
|
|
if (di.lstart < lstart)
|
|
|
|
di.lstart = lstart;
|
|
|
|
if (di.lstart >= end)
|
|
|
|
break;
|
|
|
|
|
2023-03-11 03:12:35 +08:00
|
|
|
if (!next_dc || next_dc->di.lstart > end)
|
2017-04-14 23:24:55 +08:00
|
|
|
di.len = end - di.lstart;
|
|
|
|
else
|
2023-03-11 03:12:35 +08:00
|
|
|
di.len = next_dc->di.lstart - di.lstart;
|
2017-04-14 23:24:55 +08:00
|
|
|
di.start = start + di.lstart - lstart;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!di.len)
|
|
|
|
goto next;
|
|
|
|
|
|
|
|
if (prev_dc && prev_dc->state == D_PREP &&
|
|
|
|
prev_dc->bdev == bdev &&
|
2018-08-06 22:43:50 +08:00
|
|
|
__is_discard_back_mergeable(&di, &prev_dc->di,
|
|
|
|
max_discard_blocks)) {
|
2017-04-14 23:24:55 +08:00
|
|
|
prev_dc->di.len += di.len;
|
2017-04-18 19:27:39 +08:00
|
|
|
dcc->undiscard_blks += di.len;
|
2017-04-15 14:09:37 +08:00
|
|
|
__relocate_discard_cmd(dcc, prev_dc);
|
2017-04-14 23:24:55 +08:00
|
|
|
di = prev_dc->di;
|
|
|
|
tdc = prev_dc;
|
|
|
|
merged = true;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (next_dc && next_dc->state == D_PREP &&
|
|
|
|
next_dc->bdev == bdev &&
|
2018-08-06 22:43:50 +08:00
|
|
|
__is_discard_front_mergeable(&di, &next_dc->di,
|
|
|
|
max_discard_blocks)) {
|
2017-04-14 23:24:55 +08:00
|
|
|
next_dc->di.lstart = di.lstart;
|
|
|
|
next_dc->di.len += di.len;
|
|
|
|
next_dc->di.start = di.start;
|
2017-04-18 19:27:39 +08:00
|
|
|
dcc->undiscard_blks += di.len;
|
2017-04-15 14:09:37 +08:00
|
|
|
__relocate_discard_cmd(dcc, next_dc);
|
2017-04-14 23:24:55 +08:00
|
|
|
if (tdc)
|
|
|
|
__remove_discard_cmd(sbi, tdc);
|
|
|
|
merged = true;
|
2016-12-30 06:07:53 +08:00
|
|
|
}
|
2017-04-14 23:24:55 +08:00
|
|
|
|
2023-03-11 03:12:35 +08:00
|
|
|
if (!merged)
|
|
|
|
__insert_discard_cmd(sbi, bdev,
|
|
|
|
di.lstart, di.start, di.len);
|
2017-04-14 23:24:55 +08:00
|
|
|
next:
|
|
|
|
prev_dc = next_dc;
|
|
|
|
if (!prev_dc)
|
|
|
|
break;
|
|
|
|
|
|
|
|
node = rb_next(&prev_dc->rb_node);
|
|
|
|
next_dc = rb_entry_safe(node, struct discard_cmd, rb_node);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2023-05-08 16:10:42 +08:00
|
|
|
#ifdef CONFIG_BLK_DEV_ZONED
|
|
|
|
static void __queue_zone_reset_cmd(struct f2fs_sb_info *sbi,
|
|
|
|
struct block_device *bdev, block_t blkstart, block_t lblkstart,
|
|
|
|
block_t blklen)
|
|
|
|
{
|
|
|
|
trace_f2fs_queue_reset_zone(bdev, blkstart);
|
|
|
|
|
|
|
|
mutex_lock(&SM_I(sbi)->dcc_info->cmd_lock);
|
|
|
|
__insert_discard_cmd(sbi, bdev, lblkstart, blkstart, blklen);
|
|
|
|
mutex_unlock(&SM_I(sbi)->dcc_info->cmd_lock);
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
2022-11-17 01:10:45 +08:00
|
|
|
static void __queue_discard_cmd(struct f2fs_sb_info *sbi,
|
2017-04-14 23:24:55 +08:00
|
|
|
struct block_device *bdev, block_t blkstart, block_t blklen)
|
|
|
|
{
|
|
|
|
block_t lblkstart = blkstart;
|
|
|
|
|
2019-03-16 08:13:08 +08:00
|
|
|
if (!f2fs_bdev_support_discard(bdev))
|
2022-11-17 01:10:45 +08:00
|
|
|
return;
|
2019-03-16 08:13:08 +08:00
|
|
|
|
2017-04-15 14:09:38 +08:00
|
|
|
trace_f2fs_queue_discard(bdev, blkstart, blklen);
|
2017-04-14 23:24:55 +08:00
|
|
|
|
2019-03-16 08:13:06 +08:00
|
|
|
if (f2fs_is_multi_device(sbi)) {
|
2017-04-14 23:24:55 +08:00
|
|
|
int devi = f2fs_target_device_index(sbi, blkstart);
|
|
|
|
|
|
|
|
blkstart -= FDEV(devi).start_blk;
|
|
|
|
}
|
2018-08-06 22:43:50 +08:00
|
|
|
mutex_lock(&SM_I(sbi)->dcc_info->cmd_lock);
|
2017-04-14 23:24:55 +08:00
|
|
|
__update_discard_tree_range(sbi, bdev, lblkstart, blkstart, blklen);
|
2018-08-06 22:43:50 +08:00
|
|
|
mutex_unlock(&SM_I(sbi)->dcc_info->cmd_lock);
|
2017-04-14 23:24:55 +08:00
|
|
|
}
|
|
|
|
|
2022-12-13 17:34:19 +08:00
|
|
|
static void __issue_discard_cmd_orderly(struct f2fs_sb_info *sbi,
|
|
|
|
struct discard_policy *dpolicy, int *issued)
|
2018-07-08 22:11:01 +08:00
|
|
|
{
|
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
|
|
|
struct discard_cmd *prev_dc = NULL, *next_dc = NULL;
|
|
|
|
struct rb_node **insert_p = NULL, *insert_parent = NULL;
|
|
|
|
struct discard_cmd *dc;
|
|
|
|
struct blk_plug plug;
|
|
|
|
bool io_interrupted = false;
|
|
|
|
|
|
|
|
mutex_lock(&dcc->cmd_lock);
|
2023-03-11 03:12:35 +08:00
|
|
|
dc = __lookup_discard_cmd_ret(&dcc->root, dcc->next_pos,
|
|
|
|
&prev_dc, &next_dc, &insert_p, &insert_parent);
|
2018-07-08 22:11:01 +08:00
|
|
|
if (!dc)
|
|
|
|
dc = next_dc;
|
|
|
|
|
|
|
|
blk_start_plug(&plug);
|
|
|
|
|
|
|
|
while (dc) {
|
|
|
|
struct rb_node *node;
|
2018-08-08 10:14:55 +08:00
|
|
|
int err = 0;
|
2018-07-08 22:11:01 +08:00
|
|
|
|
|
|
|
if (dc->state != D_PREP)
|
|
|
|
goto next;
|
|
|
|
|
2018-09-19 16:48:47 +08:00
|
|
|
if (dpolicy->io_aware && !is_idle(sbi, DISCARD_TIME)) {
|
2018-07-08 22:11:01 +08:00
|
|
|
io_interrupted = true;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
2023-03-11 03:12:35 +08:00
|
|
|
dcc->next_pos = dc->di.lstart + dc->di.len;
|
2022-12-13 17:34:19 +08:00
|
|
|
err = __submit_discard_cmd(sbi, dpolicy, dc, issued);
|
2018-07-08 22:11:01 +08:00
|
|
|
|
2022-12-13 17:34:19 +08:00
|
|
|
if (*issued >= dpolicy->max_requests)
|
2018-07-08 22:11:01 +08:00
|
|
|
break;
|
|
|
|
next:
|
|
|
|
node = rb_next(&dc->rb_node);
|
2018-08-08 10:14:55 +08:00
|
|
|
if (err)
|
|
|
|
__remove_discard_cmd(sbi, dc);
|
2018-07-08 22:11:01 +08:00
|
|
|
dc = rb_entry_safe(node, struct discard_cmd, rb_node);
|
|
|
|
}
|
|
|
|
|
|
|
|
blk_finish_plug(&plug);
|
|
|
|
|
|
|
|
if (!dc)
|
|
|
|
dcc->next_pos = 0;
|
|
|
|
|
|
|
|
mutex_unlock(&dcc->cmd_lock);
|
|
|
|
|
2022-12-13 17:34:19 +08:00
|
|
|
if (!(*issued) && io_interrupted)
|
|
|
|
*issued = -1;
|
2018-07-08 22:11:01 +08:00
|
|
|
}
|
2020-04-15 17:07:53 +08:00
|
|
|
static unsigned int __wait_all_discard_cmd(struct f2fs_sb_info *sbi,
|
|
|
|
struct discard_policy *dpolicy);
|
2018-07-08 22:11:01 +08:00
|
|
|
|
2017-10-04 09:08:34 +08:00
|
|
|
static int __issue_discard_cmd(struct f2fs_sb_info *sbi,
|
|
|
|
struct discard_policy *dpolicy)
|
2017-04-25 20:21:37 +08:00
|
|
|
{
|
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
|
|
|
struct list_head *pend_list;
|
|
|
|
struct discard_cmd *dc, *tmp;
|
|
|
|
struct blk_plug plug;
|
2020-04-15 17:07:53 +08:00
|
|
|
int i, issued;
|
2017-09-12 21:35:12 +08:00
|
|
|
bool io_interrupted = false;
|
2017-04-25 20:21:37 +08:00
|
|
|
|
2020-03-26 17:43:56 +08:00
|
|
|
if (dpolicy->timeout)
|
|
|
|
f2fs_update_time(sbi, UMOUNT_DISCARD_TIMEOUT);
|
2019-01-15 02:42:11 +08:00
|
|
|
|
2020-04-15 17:07:53 +08:00
|
|
|
retry:
|
|
|
|
issued = 0;
|
2017-10-04 09:08:34 +08:00
|
|
|
for (i = MAX_PLIST_NUM - 1; i >= 0; i--) {
|
2020-03-26 17:43:56 +08:00
|
|
|
if (dpolicy->timeout &&
|
|
|
|
f2fs_time_over(sbi, UMOUNT_DISCARD_TIMEOUT))
|
2019-01-15 02:42:11 +08:00
|
|
|
break;
|
|
|
|
|
2017-10-04 09:08:34 +08:00
|
|
|
if (i + 1 < dpolicy->granularity)
|
|
|
|
break;
|
2018-07-08 22:11:01 +08:00
|
|
|
|
2022-12-13 17:34:19 +08:00
|
|
|
if (i + 1 < dcc->max_ordered_discard && dpolicy->ordered) {
|
|
|
|
__issue_discard_cmd_orderly(sbi, dpolicy, &issued);
|
|
|
|
return issued;
|
|
|
|
}
|
2018-07-08 22:11:01 +08:00
|
|
|
|
2017-04-25 20:21:37 +08:00
|
|
|
pend_list = &dcc->pend_list[i];
|
2017-10-04 09:08:35 +08:00
|
|
|
|
|
|
|
mutex_lock(&dcc->cmd_lock);
|
2018-01-08 18:48:33 +08:00
|
|
|
if (list_empty(pend_list))
|
|
|
|
goto next;
|
2018-06-22 16:06:59 +08:00
|
|
|
if (unlikely(dcc->rbtree_check))
|
2023-03-11 03:12:35 +08:00
|
|
|
f2fs_bug_on(sbi, !f2fs_check_discard_tree(sbi));
|
2017-10-04 09:08:35 +08:00
|
|
|
blk_start_plug(&plug);
|
2017-04-25 20:21:37 +08:00
|
|
|
list_for_each_entry_safe(dc, tmp, pend_list, list) {
|
|
|
|
f2fs_bug_on(sbi, dc->state != D_PREP);
|
|
|
|
|
2020-03-26 17:43:56 +08:00
|
|
|
if (dpolicy->timeout &&
|
|
|
|
f2fs_time_over(sbi, UMOUNT_DISCARD_TIMEOUT))
|
2019-07-03 10:29:57 +08:00
|
|
|
break;
|
|
|
|
|
2017-10-04 09:08:33 +08:00
|
|
|
if (dpolicy->io_aware && i < dpolicy->io_aware_gran &&
|
2018-09-19 16:48:47 +08:00
|
|
|
!is_idle(sbi, DISCARD_TIME)) {
|
2017-09-12 21:35:12 +08:00
|
|
|
io_interrupted = true;
|
2018-07-08 22:08:09 +08:00
|
|
|
break;
|
f2fs: introduce discard_granularity sysfs entry
Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables
f2fs to issue 4K size discard in real-time discard mode. However, issuing
smaller discard may cost more lifetime but releasing less free space in
flash device. Since f2fs has ability of separating hot/cold data and
garbage collection, we can expect that small-sized invalid region would
expand soon with OPU, deletion or garbage collection on valid datas, so
it's better to delay or skip issuing smaller size discards, it could help
to reduce overmuch consumption of IO bandwidth and lifetime of flash
storage.
This patch makes f2fs selectng 64K size as its default minimal
granularity, and issue discard with the size which is not smaller than
minimal granularity. Also it exposes discard granularity as sysfs entry
for configuration in different scenario.
Jaegeuk Kim:
We must issue all the accumulated discard commands when fstrim is called.
So, I've added pend_list_tag[] to indicate whether we should issue the
commands or not. If tag sets P_ACTIVE or P_TRIM, we have to issue them.
P_TRIM is set once at a time, given fstrim trigger.
In addition, issue_discard_thread is calling too much due to the number of
discard commands remaining in the pending list. I added a timer to control
it likewise gc_thread.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-07 23:09:56 +08:00
|
|
|
}
|
2017-09-12 21:35:12 +08:00
|
|
|
|
2018-08-06 22:43:50 +08:00
|
|
|
__submit_discard_cmd(sbi, dpolicy, dc, &issued);
|
2018-07-08 22:08:09 +08:00
|
|
|
|
2018-08-06 22:43:50 +08:00
|
|
|
if (issued >= dpolicy->max_requests)
|
2017-10-04 09:08:35 +08:00
|
|
|
break;
|
2017-04-25 20:21:37 +08:00
|
|
|
}
|
2017-10-04 09:08:35 +08:00
|
|
|
blk_finish_plug(&plug);
|
2018-01-08 18:48:33 +08:00
|
|
|
next:
|
2017-10-04 09:08:35 +08:00
|
|
|
mutex_unlock(&dcc->cmd_lock);
|
|
|
|
|
2018-07-08 22:08:09 +08:00
|
|
|
if (issued >= dpolicy->max_requests || io_interrupted)
|
2017-10-04 09:08:35 +08:00
|
|
|
break;
|
2017-04-25 20:21:37 +08:00
|
|
|
}
|
f2fs: introduce discard_granularity sysfs entry
Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables
f2fs to issue 4K size discard in real-time discard mode. However, issuing
smaller discard may cost more lifetime but releasing less free space in
flash device. Since f2fs has ability of separating hot/cold data and
garbage collection, we can expect that small-sized invalid region would
expand soon with OPU, deletion or garbage collection on valid datas, so
it's better to delay or skip issuing smaller size discards, it could help
to reduce overmuch consumption of IO bandwidth and lifetime of flash
storage.
This patch makes f2fs selectng 64K size as its default minimal
granularity, and issue discard with the size which is not smaller than
minimal granularity. Also it exposes discard granularity as sysfs entry
for configuration in different scenario.
Jaegeuk Kim:
We must issue all the accumulated discard commands when fstrim is called.
So, I've added pend_list_tag[] to indicate whether we should issue the
commands or not. If tag sets P_ACTIVE or P_TRIM, we have to issue them.
P_TRIM is set once at a time, given fstrim trigger.
In addition, issue_discard_thread is calling too much due to the number of
discard commands remaining in the pending list. I added a timer to control
it likewise gc_thread.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-07 23:09:56 +08:00
|
|
|
|
2020-04-15 17:07:53 +08:00
|
|
|
if (dpolicy->type == DPOLICY_UMOUNT && issued) {
|
|
|
|
__wait_all_discard_cmd(sbi, dpolicy);
|
|
|
|
goto retry;
|
|
|
|
}
|
|
|
|
|
2017-09-12 21:35:12 +08:00
|
|
|
if (!issued && io_interrupted)
|
|
|
|
issued = -1;
|
|
|
|
|
f2fs: introduce discard_granularity sysfs entry
Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables
f2fs to issue 4K size discard in real-time discard mode. However, issuing
smaller discard may cost more lifetime but releasing less free space in
flash device. Since f2fs has ability of separating hot/cold data and
garbage collection, we can expect that small-sized invalid region would
expand soon with OPU, deletion or garbage collection on valid datas, so
it's better to delay or skip issuing smaller size discards, it could help
to reduce overmuch consumption of IO bandwidth and lifetime of flash
storage.
This patch makes f2fs selectng 64K size as its default minimal
granularity, and issue discard with the size which is not smaller than
minimal granularity. Also it exposes discard granularity as sysfs entry
for configuration in different scenario.
Jaegeuk Kim:
We must issue all the accumulated discard commands when fstrim is called.
So, I've added pend_list_tag[] to indicate whether we should issue the
commands or not. If tag sets P_ACTIVE or P_TRIM, we have to issue them.
P_TRIM is set once at a time, given fstrim trigger.
In addition, issue_discard_thread is calling too much due to the number of
discard commands remaining in the pending list. I added a timer to control
it likewise gc_thread.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-07 23:09:56 +08:00
|
|
|
return issued;
|
|
|
|
}
|
|
|
|
|
2017-10-04 09:08:37 +08:00
|
|
|
static bool __drop_discard_cmd(struct f2fs_sb_info *sbi)
|
f2fs: introduce discard_granularity sysfs entry
Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables
f2fs to issue 4K size discard in real-time discard mode. However, issuing
smaller discard may cost more lifetime but releasing less free space in
flash device. Since f2fs has ability of separating hot/cold data and
garbage collection, we can expect that small-sized invalid region would
expand soon with OPU, deletion or garbage collection on valid datas, so
it's better to delay or skip issuing smaller size discards, it could help
to reduce overmuch consumption of IO bandwidth and lifetime of flash
storage.
This patch makes f2fs selectng 64K size as its default minimal
granularity, and issue discard with the size which is not smaller than
minimal granularity. Also it exposes discard granularity as sysfs entry
for configuration in different scenario.
Jaegeuk Kim:
We must issue all the accumulated discard commands when fstrim is called.
So, I've added pend_list_tag[] to indicate whether we should issue the
commands or not. If tag sets P_ACTIVE or P_TRIM, we have to issue them.
P_TRIM is set once at a time, given fstrim trigger.
In addition, issue_discard_thread is calling too much due to the number of
discard commands remaining in the pending list. I added a timer to control
it likewise gc_thread.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-07 23:09:56 +08:00
|
|
|
{
|
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
|
|
|
struct list_head *pend_list;
|
|
|
|
struct discard_cmd *dc, *tmp;
|
|
|
|
int i;
|
2017-10-04 09:08:37 +08:00
|
|
|
bool dropped = false;
|
f2fs: introduce discard_granularity sysfs entry
Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables
f2fs to issue 4K size discard in real-time discard mode. However, issuing
smaller discard may cost more lifetime but releasing less free space in
flash device. Since f2fs has ability of separating hot/cold data and
garbage collection, we can expect that small-sized invalid region would
expand soon with OPU, deletion or garbage collection on valid datas, so
it's better to delay or skip issuing smaller size discards, it could help
to reduce overmuch consumption of IO bandwidth and lifetime of flash
storage.
This patch makes f2fs selectng 64K size as its default minimal
granularity, and issue discard with the size which is not smaller than
minimal granularity. Also it exposes discard granularity as sysfs entry
for configuration in different scenario.
Jaegeuk Kim:
We must issue all the accumulated discard commands when fstrim is called.
So, I've added pend_list_tag[] to indicate whether we should issue the
commands or not. If tag sets P_ACTIVE or P_TRIM, we have to issue them.
P_TRIM is set once at a time, given fstrim trigger.
In addition, issue_discard_thread is calling too much due to the number of
discard commands remaining in the pending list. I added a timer to control
it likewise gc_thread.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-07 23:09:56 +08:00
|
|
|
|
|
|
|
mutex_lock(&dcc->cmd_lock);
|
|
|
|
for (i = MAX_PLIST_NUM - 1; i >= 0; i--) {
|
|
|
|
pend_list = &dcc->pend_list[i];
|
|
|
|
list_for_each_entry_safe(dc, tmp, pend_list, list) {
|
|
|
|
f2fs_bug_on(sbi, dc->state != D_PREP);
|
|
|
|
__remove_discard_cmd(sbi, dc);
|
2017-10-04 09:08:37 +08:00
|
|
|
dropped = true;
|
f2fs: introduce discard_granularity sysfs entry
Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables
f2fs to issue 4K size discard in real-time discard mode. However, issuing
smaller discard may cost more lifetime but releasing less free space in
flash device. Since f2fs has ability of separating hot/cold data and
garbage collection, we can expect that small-sized invalid region would
expand soon with OPU, deletion or garbage collection on valid datas, so
it's better to delay or skip issuing smaller size discards, it could help
to reduce overmuch consumption of IO bandwidth and lifetime of flash
storage.
This patch makes f2fs selectng 64K size as its default minimal
granularity, and issue discard with the size which is not smaller than
minimal granularity. Also it exposes discard granularity as sysfs entry
for configuration in different scenario.
Jaegeuk Kim:
We must issue all the accumulated discard commands when fstrim is called.
So, I've added pend_list_tag[] to indicate whether we should issue the
commands or not. If tag sets P_ACTIVE or P_TRIM, we have to issue them.
P_TRIM is set once at a time, given fstrim trigger.
In addition, issue_discard_thread is calling too much due to the number of
discard commands remaining in the pending list. I added a timer to control
it likewise gc_thread.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-07 23:09:56 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
mutex_unlock(&dcc->cmd_lock);
|
2017-10-04 09:08:37 +08:00
|
|
|
|
|
|
|
return dropped;
|
2017-04-25 20:21:37 +08:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_drop_discard_cmd(struct f2fs_sb_info *sbi)
|
2018-01-18 17:23:29 +08:00
|
|
|
{
|
|
|
|
__drop_discard_cmd(sbi);
|
|
|
|
}
|
|
|
|
|
2017-10-28 16:52:32 +08:00
|
|
|
static unsigned int __wait_one_discard_bio(struct f2fs_sb_info *sbi,
|
2017-06-05 18:29:06 +08:00
|
|
|
struct discard_cmd *dc)
|
|
|
|
{
|
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
2017-10-28 16:52:32 +08:00
|
|
|
unsigned int len = 0;
|
2017-06-05 18:29:06 +08:00
|
|
|
|
|
|
|
wait_for_completion_io(&dc->wait);
|
|
|
|
mutex_lock(&dcc->cmd_lock);
|
|
|
|
f2fs_bug_on(sbi, dc->state != D_DONE);
|
|
|
|
dc->ref--;
|
2017-10-28 16:52:32 +08:00
|
|
|
if (!dc->ref) {
|
|
|
|
if (!dc->error)
|
2023-03-11 03:12:35 +08:00
|
|
|
len = dc->di.len;
|
2017-06-05 18:29:06 +08:00
|
|
|
__remove_discard_cmd(sbi, dc);
|
2017-10-28 16:52:32 +08:00
|
|
|
}
|
2017-06-05 18:29:06 +08:00
|
|
|
mutex_unlock(&dcc->cmd_lock);
|
2017-10-28 16:52:32 +08:00
|
|
|
|
|
|
|
return len;
|
2017-06-05 18:29:06 +08:00
|
|
|
}
|
|
|
|
|
2017-10-28 16:52:32 +08:00
|
|
|
static unsigned int __wait_discard_cmd_range(struct f2fs_sb_info *sbi,
|
2017-10-04 09:08:34 +08:00
|
|
|
struct discard_policy *dpolicy,
|
|
|
|
block_t start, block_t end)
|
2017-04-25 20:21:38 +08:00
|
|
|
{
|
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
2017-10-04 09:08:34 +08:00
|
|
|
struct list_head *wait_list = (dpolicy->type == DPOLICY_FSTRIM) ?
|
|
|
|
&(dcc->fstrim_list) : &(dcc->wait_list);
|
2022-04-12 20:20:40 +08:00
|
|
|
struct discard_cmd *dc = NULL, *iter, *tmp;
|
2017-10-28 16:52:32 +08:00
|
|
|
unsigned int trimmed = 0;
|
2017-05-19 23:46:45 +08:00
|
|
|
|
|
|
|
next:
|
2022-04-12 20:20:40 +08:00
|
|
|
dc = NULL;
|
2017-04-25 20:21:38 +08:00
|
|
|
|
|
|
|
mutex_lock(&dcc->cmd_lock);
|
2022-04-12 20:20:40 +08:00
|
|
|
list_for_each_entry_safe(iter, tmp, wait_list, list) {
|
2023-03-11 03:12:35 +08:00
|
|
|
if (iter->di.lstart + iter->di.len <= start ||
|
|
|
|
end <= iter->di.lstart)
|
2017-10-04 09:08:32 +08:00
|
|
|
continue;
|
2023-03-11 03:12:35 +08:00
|
|
|
if (iter->di.len < dpolicy->granularity)
|
2017-10-04 09:08:32 +08:00
|
|
|
continue;
|
2022-04-12 20:20:40 +08:00
|
|
|
if (iter->state == D_DONE && !iter->ref) {
|
|
|
|
wait_for_completion_io(&iter->wait);
|
|
|
|
if (!iter->error)
|
2023-03-11 03:12:35 +08:00
|
|
|
trimmed += iter->di.len;
|
2022-04-12 20:20:40 +08:00
|
|
|
__remove_discard_cmd(sbi, iter);
|
2017-05-19 23:46:45 +08:00
|
|
|
} else {
|
2022-04-12 20:20:40 +08:00
|
|
|
iter->ref++;
|
|
|
|
dc = iter;
|
2017-05-19 23:46:45 +08:00
|
|
|
break;
|
2017-04-25 20:21:38 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
mutex_unlock(&dcc->cmd_lock);
|
2017-05-19 23:46:45 +08:00
|
|
|
|
2022-04-12 20:20:40 +08:00
|
|
|
if (dc) {
|
2017-10-28 16:52:32 +08:00
|
|
|
trimmed += __wait_one_discard_bio(sbi, dc);
|
2017-05-19 23:46:45 +08:00
|
|
|
goto next;
|
|
|
|
}
|
2017-10-28 16:52:32 +08:00
|
|
|
|
|
|
|
return trimmed;
|
2017-04-25 20:21:38 +08:00
|
|
|
}
|
|
|
|
|
2018-06-25 20:33:24 +08:00
|
|
|
static unsigned int __wait_all_discard_cmd(struct f2fs_sb_info *sbi,
|
2017-10-04 09:08:34 +08:00
|
|
|
struct discard_policy *dpolicy)
|
2017-10-04 09:08:32 +08:00
|
|
|
{
|
2018-05-25 04:57:26 +08:00
|
|
|
struct discard_policy dp;
|
2018-06-25 20:33:24 +08:00
|
|
|
unsigned int discard_blks;
|
2018-05-25 04:57:26 +08:00
|
|
|
|
2018-06-25 20:33:24 +08:00
|
|
|
if (dpolicy)
|
|
|
|
return __wait_discard_cmd_range(sbi, dpolicy, 0, UINT_MAX);
|
2018-05-25 04:57:26 +08:00
|
|
|
|
|
|
|
/* wait all */
|
2022-12-17 13:24:48 +08:00
|
|
|
__init_discard_policy(sbi, &dp, DPOLICY_FSTRIM, MIN_DISCARD_GRANULARITY);
|
2018-06-25 20:33:24 +08:00
|
|
|
discard_blks = __wait_discard_cmd_range(sbi, &dp, 0, UINT_MAX);
|
2022-12-17 13:24:48 +08:00
|
|
|
__init_discard_policy(sbi, &dp, DPOLICY_UMOUNT, MIN_DISCARD_GRANULARITY);
|
2018-06-25 20:33:24 +08:00
|
|
|
discard_blks += __wait_discard_cmd_range(sbi, &dp, 0, UINT_MAX);
|
|
|
|
|
|
|
|
return discard_blks;
|
2017-10-04 09:08:32 +08:00
|
|
|
}
|
|
|
|
|
2017-04-14 23:24:55 +08:00
|
|
|
/* This should be covered by global mutex, &sit_i->sentry_lock */
|
2018-01-05 17:41:20 +08:00
|
|
|
static void f2fs_wait_discard_bio(struct f2fs_sb_info *sbi, block_t blkaddr)
|
2017-04-14 23:24:55 +08:00
|
|
|
{
|
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
|
|
|
struct discard_cmd *dc;
|
2017-04-26 17:39:54 +08:00
|
|
|
bool need_wait = false;
|
2017-04-14 23:24:55 +08:00
|
|
|
|
|
|
|
mutex_lock(&dcc->cmd_lock);
|
2023-03-11 03:12:35 +08:00
|
|
|
dc = __lookup_discard_cmd(sbi, blkaddr);
|
2023-05-08 16:10:42 +08:00
|
|
|
#ifdef CONFIG_BLK_DEV_ZONED
|
|
|
|
if (dc && f2fs_sb_has_blkzoned(sbi) && bdev_is_zoned(dc->bdev)) {
|
|
|
|
/* force submit zone reset */
|
|
|
|
if (dc->state == D_PREP)
|
|
|
|
__submit_zone_reset_cmd(sbi, dc, REQ_SYNC,
|
|
|
|
&dcc->wait_list, NULL);
|
|
|
|
dc->ref++;
|
|
|
|
mutex_unlock(&dcc->cmd_lock);
|
|
|
|
/* wait zone reset */
|
|
|
|
__wait_one_discard_bio(sbi, dc);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
#endif
|
2017-04-14 23:24:55 +08:00
|
|
|
if (dc) {
|
2017-04-26 17:39:54 +08:00
|
|
|
if (dc->state == D_PREP) {
|
|
|
|
__punch_discard_cmd(sbi, dc, blkaddr);
|
|
|
|
} else {
|
|
|
|
dc->ref++;
|
|
|
|
need_wait = true;
|
|
|
|
}
|
2016-08-29 23:58:34 +08:00
|
|
|
}
|
2017-04-05 18:19:49 +08:00
|
|
|
mutex_unlock(&dcc->cmd_lock);
|
2017-04-26 17:39:54 +08:00
|
|
|
|
2017-06-05 18:29:06 +08:00
|
|
|
if (need_wait)
|
|
|
|
__wait_one_discard_bio(sbi, dc);
|
2017-04-05 18:19:49 +08:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_stop_discard_thread(struct f2fs_sb_info *sbi)
|
2017-06-29 23:17:45 +08:00
|
|
|
{
|
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
|
|
|
|
|
|
|
if (dcc && dcc->f2fs_issue_discard) {
|
|
|
|
struct task_struct *discard_thread = dcc->f2fs_issue_discard;
|
|
|
|
|
|
|
|
dcc->f2fs_issue_discard = NULL;
|
|
|
|
kthread_stop(discard_thread);
|
2017-04-26 17:39:54 +08:00
|
|
|
}
|
2017-04-05 18:19:49 +08:00
|
|
|
}
|
|
|
|
|
2023-01-13 03:14:04 +08:00
|
|
|
/**
|
|
|
|
* f2fs_issue_discard_timeout() - Issue all discard cmd within UMOUNT_DISCARD_TIMEOUT
|
|
|
|
* @sbi: the f2fs_sb_info data for discard cmd to issue
|
|
|
|
*
|
|
|
|
* When UMOUNT_DISCARD_TIMEOUT is exceeded, all remaining discard commands will be dropped
|
|
|
|
*
|
|
|
|
* Return true if issued all discard cmd or no discard cmd need issue, otherwise return false.
|
|
|
|
*/
|
2019-01-15 02:42:11 +08:00
|
|
|
bool f2fs_issue_discard_timeout(struct f2fs_sb_info *sbi)
|
f2fs: introduce discard_granularity sysfs entry
Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables
f2fs to issue 4K size discard in real-time discard mode. However, issuing
smaller discard may cost more lifetime but releasing less free space in
flash device. Since f2fs has ability of separating hot/cold data and
garbage collection, we can expect that small-sized invalid region would
expand soon with OPU, deletion or garbage collection on valid datas, so
it's better to delay or skip issuing smaller size discards, it could help
to reduce overmuch consumption of IO bandwidth and lifetime of flash
storage.
This patch makes f2fs selectng 64K size as its default minimal
granularity, and issue discard with the size which is not smaller than
minimal granularity. Also it exposes discard granularity as sysfs entry
for configuration in different scenario.
Jaegeuk Kim:
We must issue all the accumulated discard commands when fstrim is called.
So, I've added pend_list_tag[] to indicate whether we should issue the
commands or not. If tag sets P_ACTIVE or P_TRIM, we have to issue them.
P_TRIM is set once at a time, given fstrim trigger.
In addition, issue_discard_thread is calling too much due to the number of
discard commands remaining in the pending list. I added a timer to control
it likewise gc_thread.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-07 23:09:56 +08:00
|
|
|
{
|
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
2017-10-04 09:08:34 +08:00
|
|
|
struct discard_policy dpolicy;
|
2017-10-04 09:08:37 +08:00
|
|
|
bool dropped;
|
f2fs: introduce discard_granularity sysfs entry
Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables
f2fs to issue 4K size discard in real-time discard mode. However, issuing
smaller discard may cost more lifetime but releasing less free space in
flash device. Since f2fs has ability of separating hot/cold data and
garbage collection, we can expect that small-sized invalid region would
expand soon with OPU, deletion or garbage collection on valid datas, so
it's better to delay or skip issuing smaller size discards, it could help
to reduce overmuch consumption of IO bandwidth and lifetime of flash
storage.
This patch makes f2fs selectng 64K size as its default minimal
granularity, and issue discard with the size which is not smaller than
minimal granularity. Also it exposes discard granularity as sysfs entry
for configuration in different scenario.
Jaegeuk Kim:
We must issue all the accumulated discard commands when fstrim is called.
So, I've added pend_list_tag[] to indicate whether we should issue the
commands or not. If tag sets P_ACTIVE or P_TRIM, we have to issue them.
P_TRIM is set once at a time, given fstrim trigger.
In addition, issue_discard_thread is calling too much due to the number of
discard commands remaining in the pending list. I added a timer to control
it likewise gc_thread.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-07 23:09:56 +08:00
|
|
|
|
2022-12-02 12:58:41 +08:00
|
|
|
if (!atomic_read(&dcc->discard_cmd_cnt))
|
2023-01-13 03:14:04 +08:00
|
|
|
return true;
|
2022-12-02 12:58:41 +08:00
|
|
|
|
2018-05-30 00:58:42 +08:00
|
|
|
__init_discard_policy(sbi, &dpolicy, DPOLICY_UMOUNT,
|
|
|
|
dcc->discard_granularity);
|
2017-10-04 09:08:34 +08:00
|
|
|
__issue_discard_cmd(sbi, &dpolicy);
|
2017-10-04 09:08:37 +08:00
|
|
|
dropped = __drop_discard_cmd(sbi);
|
|
|
|
|
2018-05-25 04:57:26 +08:00
|
|
|
/* just to make sure there is no pending discard commands */
|
|
|
|
__wait_all_discard_cmd(sbi, NULL);
|
2018-07-08 22:16:53 +08:00
|
|
|
|
|
|
|
f2fs_bug_on(sbi, atomic_read(&dcc->discard_cmd_cnt));
|
2023-01-13 03:14:04 +08:00
|
|
|
return !dropped;
|
f2fs: introduce discard_granularity sysfs entry
Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables
f2fs to issue 4K size discard in real-time discard mode. However, issuing
smaller discard may cost more lifetime but releasing less free space in
flash device. Since f2fs has ability of separating hot/cold data and
garbage collection, we can expect that small-sized invalid region would
expand soon with OPU, deletion or garbage collection on valid datas, so
it's better to delay or skip issuing smaller size discards, it could help
to reduce overmuch consumption of IO bandwidth and lifetime of flash
storage.
This patch makes f2fs selectng 64K size as its default minimal
granularity, and issue discard with the size which is not smaller than
minimal granularity. Also it exposes discard granularity as sysfs entry
for configuration in different scenario.
Jaegeuk Kim:
We must issue all the accumulated discard commands when fstrim is called.
So, I've added pend_list_tag[] to indicate whether we should issue the
commands or not. If tag sets P_ACTIVE or P_TRIM, we have to issue them.
P_TRIM is set once at a time, given fstrim trigger.
In addition, issue_discard_thread is calling too much due to the number of
discard commands remaining in the pending list. I added a timer to control
it likewise gc_thread.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-07 23:09:56 +08:00
|
|
|
}
|
|
|
|
|
2017-01-10 12:32:07 +08:00
|
|
|
static int issue_discard_thread(void *data)
|
|
|
|
{
|
|
|
|
struct f2fs_sb_info *sbi = data;
|
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
|
|
|
wait_queue_head_t *q = &dcc->discard_wait_queue;
|
2017-10-04 09:08:34 +08:00
|
|
|
struct discard_policy dpolicy;
|
2021-12-14 09:12:03 +08:00
|
|
|
unsigned int wait_ms = dcc->min_discard_issue_time;
|
f2fs: introduce discard_granularity sysfs entry
Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables
f2fs to issue 4K size discard in real-time discard mode. However, issuing
smaller discard may cost more lifetime but releasing less free space in
flash device. Since f2fs has ability of separating hot/cold data and
garbage collection, we can expect that small-sized invalid region would
expand soon with OPU, deletion or garbage collection on valid datas, so
it's better to delay or skip issuing smaller size discards, it could help
to reduce overmuch consumption of IO bandwidth and lifetime of flash
storage.
This patch makes f2fs selectng 64K size as its default minimal
granularity, and issue discard with the size which is not smaller than
minimal granularity. Also it exposes discard granularity as sysfs entry
for configuration in different scenario.
Jaegeuk Kim:
We must issue all the accumulated discard commands when fstrim is called.
So, I've added pend_list_tag[] to indicate whether we should issue the
commands or not. If tag sets P_ACTIVE or P_TRIM, we have to issue them.
P_TRIM is set once at a time, given fstrim trigger.
In addition, issue_discard_thread is calling too much due to the number of
discard commands remaining in the pending list. I added a timer to control
it likewise gc_thread.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-07 23:09:56 +08:00
|
|
|
int issued;
|
2017-01-10 12:32:07 +08:00
|
|
|
|
2017-05-18 01:36:58 +08:00
|
|
|
set_freezable();
|
2017-01-10 12:32:07 +08:00
|
|
|
|
2017-05-18 01:36:58 +08:00
|
|
|
do {
|
2022-11-18 11:46:00 +08:00
|
|
|
wait_event_interruptible_timeout(*q,
|
|
|
|
kthread_should_stop() || freezing(current) ||
|
|
|
|
dcc->discard_wake,
|
|
|
|
msecs_to_jiffies(wait_ms));
|
|
|
|
|
2021-04-06 17:09:16 +08:00
|
|
|
if (sbi->gc_mode == GC_URGENT_HIGH ||
|
|
|
|
!f2fs_available_free_memory(sbi, DISCARD_CACHE))
|
2022-12-17 13:24:48 +08:00
|
|
|
__init_discard_policy(sbi, &dpolicy, DPOLICY_FORCE,
|
|
|
|
MIN_DISCARD_GRANULARITY);
|
2021-04-06 17:09:16 +08:00
|
|
|
else
|
|
|
|
__init_discard_policy(sbi, &dpolicy, DPOLICY_BG,
|
|
|
|
dcc->discard_granularity);
|
|
|
|
|
2018-05-08 17:51:34 +08:00
|
|
|
if (dcc->discard_wake)
|
2022-12-12 21:36:44 +08:00
|
|
|
dcc->discard_wake = false;
|
2018-05-08 17:51:34 +08:00
|
|
|
|
2018-12-14 12:50:51 +08:00
|
|
|
/* clean up pending candidates before going to sleep */
|
|
|
|
if (atomic_read(&dcc->queued_discard))
|
|
|
|
__wait_all_discard_cmd(sbi, NULL);
|
|
|
|
|
2017-05-18 01:36:58 +08:00
|
|
|
if (try_to_freeze())
|
|
|
|
continue;
|
2018-01-25 18:57:27 +08:00
|
|
|
if (f2fs_readonly(sbi->sb))
|
|
|
|
continue;
|
2017-05-18 01:36:58 +08:00
|
|
|
if (kthread_should_stop())
|
|
|
|
return 0;
|
2022-11-18 11:46:00 +08:00
|
|
|
if (is_sbi_flag_set(sbi, SBI_NEED_FSCK) ||
|
|
|
|
!atomic_read(&dcc->discard_cmd_cnt)) {
|
2018-04-13 11:08:05 +08:00
|
|
|
wait_ms = dpolicy.max_interval;
|
|
|
|
continue;
|
|
|
|
}
|
f2fs: introduce discard_granularity sysfs entry
Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables
f2fs to issue 4K size discard in real-time discard mode. However, issuing
smaller discard may cost more lifetime but releasing less free space in
flash device. Since f2fs has ability of separating hot/cold data and
garbage collection, we can expect that small-sized invalid region would
expand soon with OPU, deletion or garbage collection on valid datas, so
it's better to delay or skip issuing smaller size discards, it could help
to reduce overmuch consumption of IO bandwidth and lifetime of flash
storage.
This patch makes f2fs selectng 64K size as its default minimal
granularity, and issue discard with the size which is not smaller than
minimal granularity. Also it exposes discard granularity as sysfs entry
for configuration in different scenario.
Jaegeuk Kim:
We must issue all the accumulated discard commands when fstrim is called.
So, I've added pend_list_tag[] to indicate whether we should issue the
commands or not. If tag sets P_ACTIVE or P_TRIM, we have to issue them.
P_TRIM is set once at a time, given fstrim trigger.
In addition, issue_discard_thread is calling too much due to the number of
discard commands remaining in the pending list. I added a timer to control
it likewise gc_thread.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-07 23:09:56 +08:00
|
|
|
|
f2fs: make background threads of f2fs being aware of freezing
When ->freeze_fs is called from lvm for doing snapshot, it needs to
make sure there will be no more changes in filesystem's data, however,
previously, background threads like GC thread wasn't aware of freezing,
so in environment with active background threads, data of snapshot
becomes unstable.
This patch fixes this issue by adding sb_{start,end}_intwrite in
below background threads:
- GC thread
- flush thread
- discard thread
Note that, don't use sb_start_intwrite() in gc_thread_func() due to:
generic/241 reports below bug:
======================================================
WARNING: possible circular locking dependency detected
4.13.0-rc1+ #32 Tainted: G O
------------------------------------------------------
f2fs_gc-250:0/22186 is trying to acquire lock:
(&sbi->gc_mutex){+.+...}, at: [<f8fa7f0b>] f2fs_sync_fs+0x7b/0x1b0 [f2fs]
but task is already holding lock:
(sb_internal#2){++++.-}, at: [<f8fb5609>] gc_thread_func+0x159/0x4a0 [f2fs]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (sb_internal#2){++++.-}:
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__sb_start_write+0x11d/0x1f0
f2fs_evict_inode+0x2d6/0x4e0 [f2fs]
evict+0xa8/0x170
iput+0x1fb/0x2c0
f2fs_sync_inode_meta+0x3f/0xf0 [f2fs]
write_checkpoint+0x1b1/0x750 [f2fs]
f2fs_sync_fs+0x85/0x1b0 [f2fs]
f2fs_do_sync_file.isra.24+0x137/0xa30 [f2fs]
f2fs_sync_file+0x34/0x40 [f2fs]
vfs_fsync_range+0x4a/0xa0
do_fsync+0x3c/0x60
SyS_fdatasync+0x15/0x20
do_fast_syscall_32+0xa1/0x1b0
entry_SYSENTER_32+0x4c/0x7b
-> #1 (&sbi->cp_mutex){+.+...}:
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__mutex_lock+0x4f/0x830
mutex_lock_nested+0x25/0x30
write_checkpoint+0x2f/0x750 [f2fs]
f2fs_sync_fs+0x85/0x1b0 [f2fs]
sync_filesystem+0x67/0x80
generic_shutdown_super+0x27/0x100
kill_block_super+0x22/0x50
kill_f2fs_super+0x3a/0x40 [f2fs]
deactivate_locked_super+0x3d/0x70
deactivate_super+0x40/0x60
cleanup_mnt+0x39/0x70
__cleanup_mnt+0x10/0x20
task_work_run+0x69/0x80
exit_to_usermode_loop+0x57/0x92
do_fast_syscall_32+0x18c/0x1b0
entry_SYSENTER_32+0x4c/0x7b
-> #0 (&sbi->gc_mutex){+.+...}:
validate_chain.isra.36+0xc50/0xdb0
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__mutex_lock+0x4f/0x830
mutex_lock_nested+0x25/0x30
f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_balance_fs_bg+0xb9/0x200 [f2fs]
gc_thread_func+0x302/0x4a0 [f2fs]
kthread+0xe9/0x120
ret_from_fork+0x19/0x24
other info that might help us debug this:
Chain exists of:
&sbi->gc_mutex --> &sbi->cp_mutex --> sb_internal#2
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(sb_internal#2);
lock(&sbi->cp_mutex);
lock(sb_internal#2);
lock(&sbi->gc_mutex);
*** DEADLOCK ***
1 lock held by f2fs_gc-250:0/22186:
#0: (sb_internal#2){++++.-}, at: [<f8fb5609>] gc_thread_func+0x159/0x4a0 [f2fs]
stack backtrace:
CPU: 2 PID: 22186 Comm: f2fs_gc-250:0 Tainted: G O 4.13.0-rc1+ #32
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Call Trace:
dump_stack+0x5f/0x92
print_circular_bug+0x1b3/0x1bd
validate_chain.isra.36+0xc50/0xdb0
? __this_cpu_preempt_check+0xf/0x20
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
__mutex_lock+0x4f/0x830
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
mutex_lock_nested+0x25/0x30
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_balance_fs_bg+0xb9/0x200 [f2fs]
gc_thread_func+0x302/0x4a0 [f2fs]
? preempt_schedule_common+0x2f/0x4d
? f2fs_gc+0x540/0x540 [f2fs]
kthread+0xe9/0x120
? f2fs_gc+0x540/0x540 [f2fs]
? kthread_create_on_node+0x30/0x30
ret_from_fork+0x19/0x24
The deadlock occurs in below condition:
GC Thread Thread B
- sb_start_intwrite
- f2fs_sync_file
- f2fs_sync_fs
- mutex_lock(&sbi->gc_mutex)
- write_checkpoint
- block_operations
- f2fs_sync_inode_meta
- iput
- sb_start_intwrite
- mutex_lock(&sbi->gc_mutex)
Fix this by altering sb_start_intwrite to sb_start_write_trylock.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-07-22 08:52:23 +08:00
|
|
|
sb_start_intwrite(sbi->sb);
|
|
|
|
|
2017-10-04 09:08:34 +08:00
|
|
|
issued = __issue_discard_cmd(sbi, &dpolicy);
|
2018-04-08 15:11:11 +08:00
|
|
|
if (issued > 0) {
|
2017-10-04 09:08:34 +08:00
|
|
|
__wait_all_discard_cmd(sbi, &dpolicy);
|
|
|
|
wait_ms = dpolicy.min_interval;
|
2021-04-06 09:47:35 +08:00
|
|
|
} else if (issued == -1) {
|
2018-09-19 16:48:47 +08:00
|
|
|
wait_ms = f2fs_time_to_wait(sbi, DISCARD_TIME);
|
|
|
|
if (!wait_ms)
|
2018-08-31 17:39:26 +08:00
|
|
|
wait_ms = dpolicy.mid_interval;
|
f2fs: introduce discard_granularity sysfs entry
Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables
f2fs to issue 4K size discard in real-time discard mode. However, issuing
smaller discard may cost more lifetime but releasing less free space in
flash device. Since f2fs has ability of separating hot/cold data and
garbage collection, we can expect that small-sized invalid region would
expand soon with OPU, deletion or garbage collection on valid datas, so
it's better to delay or skip issuing smaller size discards, it could help
to reduce overmuch consumption of IO bandwidth and lifetime of flash
storage.
This patch makes f2fs selectng 64K size as its default minimal
granularity, and issue discard with the size which is not smaller than
minimal granularity. Also it exposes discard granularity as sysfs entry
for configuration in different scenario.
Jaegeuk Kim:
We must issue all the accumulated discard commands when fstrim is called.
So, I've added pend_list_tag[] to indicate whether we should issue the
commands or not. If tag sets P_ACTIVE or P_TRIM, we have to issue them.
P_TRIM is set once at a time, given fstrim trigger.
In addition, issue_discard_thread is calling too much due to the number of
discard commands remaining in the pending list. I added a timer to control
it likewise gc_thread.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-07 23:09:56 +08:00
|
|
|
} else {
|
2017-10-04 09:08:34 +08:00
|
|
|
wait_ms = dpolicy.max_interval;
|
f2fs: introduce discard_granularity sysfs entry
Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables
f2fs to issue 4K size discard in real-time discard mode. However, issuing
smaller discard may cost more lifetime but releasing less free space in
flash device. Since f2fs has ability of separating hot/cold data and
garbage collection, we can expect that small-sized invalid region would
expand soon with OPU, deletion or garbage collection on valid datas, so
it's better to delay or skip issuing smaller size discards, it could help
to reduce overmuch consumption of IO bandwidth and lifetime of flash
storage.
This patch makes f2fs selectng 64K size as its default minimal
granularity, and issue discard with the size which is not smaller than
minimal granularity. Also it exposes discard granularity as sysfs entry
for configuration in different scenario.
Jaegeuk Kim:
We must issue all the accumulated discard commands when fstrim is called.
So, I've added pend_list_tag[] to indicate whether we should issue the
commands or not. If tag sets P_ACTIVE or P_TRIM, we have to issue them.
P_TRIM is set once at a time, given fstrim trigger.
In addition, issue_discard_thread is calling too much due to the number of
discard commands remaining in the pending list. I added a timer to control
it likewise gc_thread.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-07 23:09:56 +08:00
|
|
|
}
|
2022-11-18 11:46:00 +08:00
|
|
|
if (!atomic_read(&dcc->discard_cmd_cnt))
|
|
|
|
wait_ms = dpolicy.max_interval;
|
2017-05-18 01:36:58 +08:00
|
|
|
|
f2fs: make background threads of f2fs being aware of freezing
When ->freeze_fs is called from lvm for doing snapshot, it needs to
make sure there will be no more changes in filesystem's data, however,
previously, background threads like GC thread wasn't aware of freezing,
so in environment with active background threads, data of snapshot
becomes unstable.
This patch fixes this issue by adding sb_{start,end}_intwrite in
below background threads:
- GC thread
- flush thread
- discard thread
Note that, don't use sb_start_intwrite() in gc_thread_func() due to:
generic/241 reports below bug:
======================================================
WARNING: possible circular locking dependency detected
4.13.0-rc1+ #32 Tainted: G O
------------------------------------------------------
f2fs_gc-250:0/22186 is trying to acquire lock:
(&sbi->gc_mutex){+.+...}, at: [<f8fa7f0b>] f2fs_sync_fs+0x7b/0x1b0 [f2fs]
but task is already holding lock:
(sb_internal#2){++++.-}, at: [<f8fb5609>] gc_thread_func+0x159/0x4a0 [f2fs]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (sb_internal#2){++++.-}:
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__sb_start_write+0x11d/0x1f0
f2fs_evict_inode+0x2d6/0x4e0 [f2fs]
evict+0xa8/0x170
iput+0x1fb/0x2c0
f2fs_sync_inode_meta+0x3f/0xf0 [f2fs]
write_checkpoint+0x1b1/0x750 [f2fs]
f2fs_sync_fs+0x85/0x1b0 [f2fs]
f2fs_do_sync_file.isra.24+0x137/0xa30 [f2fs]
f2fs_sync_file+0x34/0x40 [f2fs]
vfs_fsync_range+0x4a/0xa0
do_fsync+0x3c/0x60
SyS_fdatasync+0x15/0x20
do_fast_syscall_32+0xa1/0x1b0
entry_SYSENTER_32+0x4c/0x7b
-> #1 (&sbi->cp_mutex){+.+...}:
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__mutex_lock+0x4f/0x830
mutex_lock_nested+0x25/0x30
write_checkpoint+0x2f/0x750 [f2fs]
f2fs_sync_fs+0x85/0x1b0 [f2fs]
sync_filesystem+0x67/0x80
generic_shutdown_super+0x27/0x100
kill_block_super+0x22/0x50
kill_f2fs_super+0x3a/0x40 [f2fs]
deactivate_locked_super+0x3d/0x70
deactivate_super+0x40/0x60
cleanup_mnt+0x39/0x70
__cleanup_mnt+0x10/0x20
task_work_run+0x69/0x80
exit_to_usermode_loop+0x57/0x92
do_fast_syscall_32+0x18c/0x1b0
entry_SYSENTER_32+0x4c/0x7b
-> #0 (&sbi->gc_mutex){+.+...}:
validate_chain.isra.36+0xc50/0xdb0
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__mutex_lock+0x4f/0x830
mutex_lock_nested+0x25/0x30
f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_balance_fs_bg+0xb9/0x200 [f2fs]
gc_thread_func+0x302/0x4a0 [f2fs]
kthread+0xe9/0x120
ret_from_fork+0x19/0x24
other info that might help us debug this:
Chain exists of:
&sbi->gc_mutex --> &sbi->cp_mutex --> sb_internal#2
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(sb_internal#2);
lock(&sbi->cp_mutex);
lock(sb_internal#2);
lock(&sbi->gc_mutex);
*** DEADLOCK ***
1 lock held by f2fs_gc-250:0/22186:
#0: (sb_internal#2){++++.-}, at: [<f8fb5609>] gc_thread_func+0x159/0x4a0 [f2fs]
stack backtrace:
CPU: 2 PID: 22186 Comm: f2fs_gc-250:0 Tainted: G O 4.13.0-rc1+ #32
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Call Trace:
dump_stack+0x5f/0x92
print_circular_bug+0x1b3/0x1bd
validate_chain.isra.36+0xc50/0xdb0
? __this_cpu_preempt_check+0xf/0x20
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
__mutex_lock+0x4f/0x830
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
mutex_lock_nested+0x25/0x30
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_balance_fs_bg+0xb9/0x200 [f2fs]
gc_thread_func+0x302/0x4a0 [f2fs]
? preempt_schedule_common+0x2f/0x4d
? f2fs_gc+0x540/0x540 [f2fs]
kthread+0xe9/0x120
? f2fs_gc+0x540/0x540 [f2fs]
? kthread_create_on_node+0x30/0x30
ret_from_fork+0x19/0x24
The deadlock occurs in below condition:
GC Thread Thread B
- sb_start_intwrite
- f2fs_sync_file
- f2fs_sync_fs
- mutex_lock(&sbi->gc_mutex)
- write_checkpoint
- block_operations
- f2fs_sync_inode_meta
- iput
- sb_start_intwrite
- mutex_lock(&sbi->gc_mutex)
Fix this by altering sb_start_intwrite to sb_start_write_trylock.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-07-22 08:52:23 +08:00
|
|
|
sb_end_intwrite(sbi->sb);
|
2017-05-18 01:36:58 +08:00
|
|
|
|
|
|
|
} while (!kthread_should_stop());
|
|
|
|
return 0;
|
2017-01-10 12:32:07 +08:00
|
|
|
}
|
|
|
|
|
2016-10-28 16:45:06 +08:00
|
|
|
#ifdef CONFIG_BLK_DEV_ZONED
|
2016-10-07 10:02:05 +08:00
|
|
|
static int __f2fs_issue_discard_zone(struct f2fs_sb_info *sbi,
|
|
|
|
struct block_device *bdev, block_t blkstart, block_t blklen)
|
2016-10-28 16:45:06 +08:00
|
|
|
{
|
2017-02-23 12:18:35 +08:00
|
|
|
sector_t sector, nr_sects;
|
2017-03-08 09:49:53 +08:00
|
|
|
block_t lblkstart = blkstart;
|
2016-10-07 10:02:05 +08:00
|
|
|
int devi = 0;
|
2023-04-02 11:12:59 +08:00
|
|
|
u64 remainder = 0;
|
2016-10-07 10:02:05 +08:00
|
|
|
|
2019-03-16 08:13:06 +08:00
|
|
|
if (f2fs_is_multi_device(sbi)) {
|
2016-10-07 10:02:05 +08:00
|
|
|
devi = f2fs_target_device_index(sbi, blkstart);
|
2019-03-16 08:13:07 +08:00
|
|
|
if (blkstart < FDEV(devi).start_blk ||
|
|
|
|
blkstart > FDEV(devi).end_blk) {
|
2019-06-18 17:48:42 +08:00
|
|
|
f2fs_err(sbi, "Invalid block %x", blkstart);
|
2019-03-16 08:13:07 +08:00
|
|
|
return -EIO;
|
|
|
|
}
|
2016-10-07 10:02:05 +08:00
|
|
|
blkstart -= FDEV(devi).start_blk;
|
|
|
|
}
|
2016-10-28 16:45:06 +08:00
|
|
|
|
2019-03-16 08:13:07 +08:00
|
|
|
/* For sequential zones, reset the zone write pointer */
|
|
|
|
if (f2fs_blkz_is_seq(sbi, devi, blkstart)) {
|
2017-02-23 12:18:35 +08:00
|
|
|
sector = SECTOR_FROM_BLOCK(blkstart);
|
|
|
|
nr_sects = SECTOR_FROM_BLOCK(blklen);
|
2023-04-02 11:12:59 +08:00
|
|
|
div64_u64_rem(sector, bdev_zone_sectors(bdev), &remainder);
|
2017-02-23 12:18:35 +08:00
|
|
|
|
2023-04-02 11:12:59 +08:00
|
|
|
if (remainder || nr_sects != bdev_zone_sectors(bdev)) {
|
2019-06-18 17:48:42 +08:00
|
|
|
f2fs_err(sbi, "(%d) %s: Unaligned zone reset attempted (block %x + %x)",
|
|
|
|
devi, sbi->s_ndevs ? FDEV(devi).path : "",
|
|
|
|
blkstart, blklen);
|
2017-02-23 12:18:35 +08:00
|
|
|
return -EIO;
|
|
|
|
}
|
2023-05-08 16:10:42 +08:00
|
|
|
|
|
|
|
if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING))) {
|
|
|
|
trace_f2fs_issue_reset_zone(bdev, blkstart);
|
|
|
|
return blkdev_zone_mgmt(bdev, REQ_OP_ZONE_RESET,
|
|
|
|
sector, nr_sects, GFP_NOFS);
|
|
|
|
}
|
|
|
|
|
|
|
|
__queue_zone_reset_cmd(sbi, bdev, blkstart, lblkstart, blklen);
|
|
|
|
return 0;
|
2016-10-28 16:45:06 +08:00
|
|
|
}
|
2019-03-16 08:13:07 +08:00
|
|
|
|
|
|
|
/* For conventional zones, use regular discard if supported */
|
2022-11-17 01:10:45 +08:00
|
|
|
__queue_discard_cmd(sbi, bdev, lblkstart, blklen);
|
|
|
|
return 0;
|
2016-10-28 16:45:06 +08:00
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
2016-10-07 10:02:05 +08:00
|
|
|
static int __issue_discard_async(struct f2fs_sb_info *sbi,
|
|
|
|
struct block_device *bdev, block_t blkstart, block_t blklen)
|
|
|
|
{
|
|
|
|
#ifdef CONFIG_BLK_DEV_ZONED
|
2019-03-16 08:13:08 +08:00
|
|
|
if (f2fs_sb_has_blkzoned(sbi) && bdev_is_zoned(bdev))
|
2016-10-07 10:02:05 +08:00
|
|
|
return __f2fs_issue_discard_zone(sbi, bdev, blkstart, blklen);
|
|
|
|
#endif
|
2022-11-17 01:10:45 +08:00
|
|
|
__queue_discard_cmd(sbi, bdev, blkstart, blklen);
|
|
|
|
return 0;
|
2016-10-07 10:02:05 +08:00
|
|
|
}
|
|
|
|
|
2014-04-15 12:57:55 +08:00
|
|
|
static int f2fs_issue_discard(struct f2fs_sb_info *sbi,
|
2013-11-12 15:55:17 +08:00
|
|
|
block_t blkstart, block_t blklen)
|
|
|
|
{
|
2016-10-07 10:02:05 +08:00
|
|
|
sector_t start = blkstart, len = 0;
|
|
|
|
struct block_device *bdev;
|
2015-05-01 13:37:50 +08:00
|
|
|
struct seg_entry *se;
|
|
|
|
unsigned int offset;
|
|
|
|
block_t i;
|
2016-10-07 10:02:05 +08:00
|
|
|
int err = 0;
|
|
|
|
|
|
|
|
bdev = f2fs_target_device(sbi, blkstart, NULL);
|
|
|
|
|
|
|
|
for (i = blkstart; i < blkstart + blklen; i++, len++) {
|
|
|
|
if (i != start) {
|
|
|
|
struct block_device *bdev2 =
|
|
|
|
f2fs_target_device(sbi, i, NULL);
|
|
|
|
|
|
|
|
if (bdev2 != bdev) {
|
|
|
|
err = __issue_discard_async(sbi, bdev,
|
|
|
|
start, len);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
bdev = bdev2;
|
|
|
|
start = i;
|
|
|
|
len = 0;
|
|
|
|
}
|
|
|
|
}
|
2015-05-01 13:37:50 +08:00
|
|
|
|
|
|
|
se = get_seg_entry(sbi, GET_SEGNO(sbi, i));
|
|
|
|
offset = GET_BLKOFF_FROM_SEG0(sbi, i);
|
|
|
|
|
f2fs: introduce discard_unit mount option
As James Z reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=213877
[1.] One-line summary of the problem:
Mount multiple SMR block devices exceed certain number cause system non-response
[2.] Full description of the problem/report:
Created some F2FS on SMR devices (mkfs.f2fs -m), then mounted in sequence. Each device is the same Model: HGST HSH721414AL (Size 14TB).
Empirically, found that when the amount of SMR device * 1.5Gb > System RAM, the system ran out of memory and hung. No dmesg output. For example, 24 SMR Disk need 24*1.5GB = 36GB. A system with 32G RAM can only mount 21 devices, the 22nd device will be a reproducible cause of system hang.
The number of SMR devices with other FS mounted on this system does not interfere with the result above.
[3.] Keywords (i.e., modules, networking, kernel):
F2FS, SMR, Memory
[4.] Kernel information
[4.1.] Kernel version (uname -a):
Linux 5.13.4-200.fc34.x86_64 #1 SMP Tue Jul 20 20:27:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
[4.2.] Kernel .config file:
Default Fedora 34 with f2fs-tools-1.14.0-2.fc34.x86_64
[5.] Most recent kernel version which did not have the bug:
None
[6.] Output of Oops.. message (if applicable) with symbolic information
resolved (see Documentation/admin-guide/oops-tracing.rst)
None
[7.] A small shell script or example program which triggers the
problem (if possible)
mount /dev/sdX /mnt/0X
[8.] Memory consumption
With 24 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 46 36 0 0 10 10
Swap: 0 0 0
With 3 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 7 5 0 0 1 1
Swap: 7 0 7
The root cause is, there are three bitmaps:
- cur_valid_map
- ckpt_valid_map
- discard_map
and each of them will cost ~500MB memory, {cur, ckpt}_valid_map are
necessary, but discard_map is optional, since this bitmap will only be
useful in mountpoint that small discard is enabled.
For a blkzoned device such as SMR or ZNS devices, f2fs will only issue
discard for a section(zone) when all blocks of that section are invalid,
so, for such device, we don't need small discard functionality at all.
This patch introduces a new mountoption "discard_unit=block|segment|
section" to support issuing discard with different basic unit which is
aligned to block, segment or section, so that user can specify
"discard_unit=segment" or "discard_unit=section" to disable small
discard functionality.
Note that this mount option can not be changed by remount() due to
related metadata need to be initialized during mount().
In order to save memory, let's use "discard_unit=section" for blkzoned
device by default.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-08-03 08:15:43 +08:00
|
|
|
if (f2fs_block_unit_discard(sbi) &&
|
|
|
|
!f2fs_test_and_set_bit(offset, se->discard_map))
|
2015-05-01 13:37:50 +08:00
|
|
|
sbi->discard_blks--;
|
|
|
|
}
|
2016-10-28 16:45:06 +08:00
|
|
|
|
2016-10-07 10:02:05 +08:00
|
|
|
if (len)
|
|
|
|
err = __issue_discard_async(sbi, bdev, start, len);
|
|
|
|
return err;
|
2014-04-15 12:57:55 +08:00
|
|
|
}
|
|
|
|
|
2016-12-30 14:06:15 +08:00
|
|
|
static bool add_discard_addrs(struct f2fs_sb_info *sbi, struct cp_control *cpc,
|
|
|
|
bool check_only)
|
2014-10-29 13:27:59 +08:00
|
|
|
{
|
2013-11-12 13:49:56 +08:00
|
|
|
int entries = SIT_VBLOCK_MAP_SIZE / sizeof(unsigned long);
|
|
|
|
int max_blocks = sbi->blocks_per_seg;
|
2014-09-21 13:06:39 +08:00
|
|
|
struct seg_entry *se = get_seg_entry(sbi, cpc->trim_start);
|
2013-11-12 13:49:56 +08:00
|
|
|
unsigned long *cur_map = (unsigned long *)se->cur_valid_map;
|
|
|
|
unsigned long *ckpt_map = (unsigned long *)se->ckpt_valid_map;
|
2015-05-01 13:37:50 +08:00
|
|
|
unsigned long *discard_map = (unsigned long *)se->discard_map;
|
2015-02-11 08:44:29 +08:00
|
|
|
unsigned long *dmap = SIT_I(sbi)->tmp_map;
|
2013-11-12 13:49:56 +08:00
|
|
|
unsigned int start = 0, end = -1;
|
2017-04-27 20:40:39 +08:00
|
|
|
bool force = (cpc->reason & CP_DISCARD);
|
2017-03-28 18:18:50 +08:00
|
|
|
struct discard_entry *de = NULL;
|
2017-04-15 14:09:36 +08:00
|
|
|
struct list_head *head = &SM_I(sbi)->dcc_info->entry_list;
|
2013-11-12 13:49:56 +08:00
|
|
|
int i;
|
|
|
|
|
f2fs: introduce discard_unit mount option
As James Z reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=213877
[1.] One-line summary of the problem:
Mount multiple SMR block devices exceed certain number cause system non-response
[2.] Full description of the problem/report:
Created some F2FS on SMR devices (mkfs.f2fs -m), then mounted in sequence. Each device is the same Model: HGST HSH721414AL (Size 14TB).
Empirically, found that when the amount of SMR device * 1.5Gb > System RAM, the system ran out of memory and hung. No dmesg output. For example, 24 SMR Disk need 24*1.5GB = 36GB. A system with 32G RAM can only mount 21 devices, the 22nd device will be a reproducible cause of system hang.
The number of SMR devices with other FS mounted on this system does not interfere with the result above.
[3.] Keywords (i.e., modules, networking, kernel):
F2FS, SMR, Memory
[4.] Kernel information
[4.1.] Kernel version (uname -a):
Linux 5.13.4-200.fc34.x86_64 #1 SMP Tue Jul 20 20:27:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
[4.2.] Kernel .config file:
Default Fedora 34 with f2fs-tools-1.14.0-2.fc34.x86_64
[5.] Most recent kernel version which did not have the bug:
None
[6.] Output of Oops.. message (if applicable) with symbolic information
resolved (see Documentation/admin-guide/oops-tracing.rst)
None
[7.] A small shell script or example program which triggers the
problem (if possible)
mount /dev/sdX /mnt/0X
[8.] Memory consumption
With 24 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 46 36 0 0 10 10
Swap: 0 0 0
With 3 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 7 5 0 0 1 1
Swap: 7 0 7
The root cause is, there are three bitmaps:
- cur_valid_map
- ckpt_valid_map
- discard_map
and each of them will cost ~500MB memory, {cur, ckpt}_valid_map are
necessary, but discard_map is optional, since this bitmap will only be
useful in mountpoint that small discard is enabled.
For a blkzoned device such as SMR or ZNS devices, f2fs will only issue
discard for a section(zone) when all blocks of that section are invalid,
so, for such device, we don't need small discard functionality at all.
This patch introduces a new mountoption "discard_unit=block|segment|
section" to support issuing discard with different basic unit which is
aligned to block, segment or section, so that user can specify
"discard_unit=segment" or "discard_unit=section" to disable small
discard functionality.
Note that this mount option can not be changed by remount() due to
related metadata need to be initialized during mount().
In order to save memory, let's use "discard_unit=section" for blkzoned
device by default.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-08-03 08:15:43 +08:00
|
|
|
if (se->valid_blocks == max_blocks || !f2fs_hw_support_discard(sbi) ||
|
|
|
|
!f2fs_block_unit_discard(sbi))
|
2016-12-30 14:06:15 +08:00
|
|
|
return false;
|
2013-11-12 13:49:56 +08:00
|
|
|
|
2015-05-01 13:37:50 +08:00
|
|
|
if (!force) {
|
f2fs: fix to avoid NULL pointer dereference on se->discard_map
https://bugzilla.kernel.org/show_bug.cgi?id=200951
These is a NULL pointer dereference issue reported in bugzilla:
Hi,
in the setup there is a SATA SSD connected to a SATA-to-USB bridge.
The disc is "Samsung SSD 850 PRO 256G" which supports TRIM.
There are four partitions:
sda1: FAT /boot
sda2: F2FS /
sda3: F2FS /home
sda4: F2FS
The bridge is ASMT1153e which uses the "uas" driver.
There is no TRIM pass-through, so, when mounting it reports:
mounting with "discard" option, but the device does not support discard
The USB host is USB3.0 and UASP capable. It is the one on RK3399.
Given this everything works fine, except there is no TRIM support.
In order to enable TRIM a new UDEV rule is added [1]:
/etc/udev/rules.d/10-sata-bridge-trim.rules:
ACTION=="add|change", ATTRS{idVendor}=="174c", ATTRS{idProduct}=="55aa", SUBSYSTEM=="scsi_disk", ATTR{provisioning_mode}="unmap"
After reboot any F2FS write hangs forever and dmesg reports:
Unable to handle kernel NULL pointer dereference
Also tested on a x86_64 system: works fine even with TRIM enabled.
same disc
same bridge
different usb host controller
different cpu architecture
not root filesystem
Regards,
Vicenç.
[1] Post #5 in https://bbs.archlinux.org/viewtopic.php?id=236280
Unable to handle kernel NULL pointer dereference at virtual address 000000000000003e
Mem abort info:
ESR = 0x96000004
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000626e3122
[000000000000003e] pgd=0000000000000000
Internal error: Oops: 96000004 [#1] SMP
Modules linked in: overlay snd_soc_hdmi_codec rc_cec dw_hdmi_i2s_audio dw_hdmi_cec snd_soc_simple_card snd_soc_simple_card_utils snd_soc_rockchip_i2s rockchip_rga snd_soc_rockchip_pcm rockchipdrm videobuf2_dma_sg v4l2_mem2mem rtc_rk808 videobuf2_memops analogix_dp videobuf2_v4l2 videobuf2_common dw_hdmi dw_wdt cec rc_core videodev drm_kms_helper media drm rockchip_thermal rockchip_saradc realtek drm_panel_orientation_quirks syscopyarea sysfillrect sysimgblt fb_sys_fops dwmac_rk stmmac_platform stmmac pwm_bl squashfs loop crypto_user gpio_keys hid_kensington
CPU: 5 PID: 957 Comm: nvim Not tainted 4.19.0-rc1-1-ARCH #1
Hardware name: Sapphire-RK3399 Board (DT)
pstate: 00000005 (nzcv daif -PAN -UAO)
pc : update_sit_entry+0x304/0x4b0
lr : update_sit_entry+0x108/0x4b0
sp : ffff00000ca13bd0
x29: ffff00000ca13bd0 x28: 000000000000003e
x27: 0000000000000020 x26: 0000000000080000
x25: 0000000000000048 x24: ffff8000ebb85cf8
x23: 0000000000000253 x22: 00000000ffffffff
x21: 00000000000535f2 x20: 00000000ffffffdf
x19: ffff8000eb9e6800 x18: ffff8000eb9e6be8
x17: 0000000007ce6926 x16: 000000001c83ffa8
x15: 0000000000000000 x14: ffff8000f602df90
x13: 0000000000000006 x12: 0000000000000040
x11: 0000000000000228 x10: 0000000000000000
x9 : 0000000000000000 x8 : 0000000000000000
x7 : 00000000000535f2 x6 : ffff8000ebff3440
x5 : ffff8000ebff3440 x4 : ffff8000ebe3a6c8
x3 : 00000000ffffffff x2 : 0000000000000020
x1 : 0000000000000000 x0 : ffff8000eb9e5800
Process nvim (pid: 957, stack limit = 0x0000000063a78320)
Call trace:
update_sit_entry+0x304/0x4b0
f2fs_invalidate_blocks+0x98/0x140
truncate_node+0x90/0x400
f2fs_remove_inode_page+0xe8/0x340
f2fs_evict_inode+0x2b0/0x408
evict+0xe0/0x1e0
iput+0x160/0x260
do_unlinkat+0x214/0x298
__arm64_sys_unlinkat+0x3c/0x68
el0_svc_handler+0x94/0x118
el0_svc+0x8/0xc
Code: f9400800 b9488400 36080140 f9400f01 (387c4820)
---[ end trace a0f21a307118c477 ]---
The reason is it is possible to enable discard flag on block queue via
UDEV, but during mount, f2fs will initialize se->discard_map only if
this flag is set, once the flag is set after mount, f2fs may dereference
NULL pointer on se->discard_map.
So this patch does below changes to fix this issue:
- initialize and update se->discard_map all the time.
- don't clear DISCARD option if device has no QUEUE_FLAG_DISCARD flag
during mount.
- don't issue small discard on zoned block device.
- introduce some functions to enhance the readability.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Tested-by: Vicente Bergas <vicencb@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-04 03:52:17 +08:00
|
|
|
if (!f2fs_realtime_discard_enable(sbi) || !se->valid_blocks ||
|
2017-01-12 06:40:24 +08:00
|
|
|
SM_I(sbi)->dcc_info->nr_discards >=
|
|
|
|
SM_I(sbi)->dcc_info->max_discards)
|
2016-12-30 14:06:15 +08:00
|
|
|
return false;
|
2014-09-21 13:06:39 +08:00
|
|
|
}
|
|
|
|
|
2013-11-12 13:49:56 +08:00
|
|
|
/* SIT_VBLOCK_MAP_SIZE should be multiple of sizeof(unsigned long) */
|
|
|
|
for (i = 0; i < entries; i++)
|
2015-05-01 13:37:50 +08:00
|
|
|
dmap[i] = force ? ~ckpt_map[i] & ~discard_map[i] :
|
2014-12-13 05:53:41 +08:00
|
|
|
(cur_map[i] ^ ckpt_map[i]) & ckpt_map[i];
|
2013-11-12 13:49:56 +08:00
|
|
|
|
2017-01-12 06:40:24 +08:00
|
|
|
while (force || SM_I(sbi)->dcc_info->nr_discards <=
|
|
|
|
SM_I(sbi)->dcc_info->max_discards) {
|
2013-11-12 13:49:56 +08:00
|
|
|
start = __find_rev_next_bit(dmap, max_blocks, end + 1);
|
|
|
|
if (start >= max_blocks)
|
|
|
|
break;
|
|
|
|
|
|
|
|
end = __find_rev_next_zero_bit(dmap, max_blocks, start + 1);
|
2016-07-07 12:13:33 +08:00
|
|
|
if (force && start && end != max_blocks
|
|
|
|
&& (end - start) < cpc->trim_minlen)
|
|
|
|
continue;
|
|
|
|
|
2016-12-30 14:06:15 +08:00
|
|
|
if (check_only)
|
|
|
|
return true;
|
|
|
|
|
2017-03-28 18:18:50 +08:00
|
|
|
if (!de) {
|
|
|
|
de = f2fs_kmem_cache_alloc(discard_entry_slab,
|
2021-08-09 08:24:48 +08:00
|
|
|
GFP_F2FS_ZERO, true, NULL);
|
2017-03-28 18:18:50 +08:00
|
|
|
de->start_blkaddr = START_BLOCK(sbi, cpc->trim_start);
|
|
|
|
list_add_tail(&de->list, head);
|
|
|
|
}
|
|
|
|
|
|
|
|
for (i = start; i < end; i++)
|
|
|
|
__set_bit_le(i, (void *)de->discard_map);
|
|
|
|
|
|
|
|
SM_I(sbi)->dcc_info->nr_discards += end - start;
|
2013-11-12 13:49:56 +08:00
|
|
|
}
|
2016-12-30 14:06:15 +08:00
|
|
|
return false;
|
2013-11-12 13:49:56 +08:00
|
|
|
}
|
|
|
|
|
2018-04-25 17:38:29 +08:00
|
|
|
static void release_discard_addr(struct discard_entry *entry)
|
|
|
|
{
|
|
|
|
list_del(&entry->list);
|
|
|
|
kmem_cache_free(discard_entry_slab, entry);
|
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_release_discard_addrs(struct f2fs_sb_info *sbi)
|
2014-09-21 13:06:39 +08:00
|
|
|
{
|
2017-04-15 14:09:36 +08:00
|
|
|
struct list_head *head = &(SM_I(sbi)->dcc_info->entry_list);
|
2014-09-21 13:06:39 +08:00
|
|
|
struct discard_entry *entry, *this;
|
|
|
|
|
|
|
|
/* drop caches */
|
2018-04-25 17:38:29 +08:00
|
|
|
list_for_each_entry_safe(entry, this, head, list)
|
|
|
|
release_discard_addr(entry);
|
2014-09-21 13:06:39 +08:00
|
|
|
}
|
|
|
|
|
2012-11-29 12:28:09 +08:00
|
|
|
/*
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
* Should call f2fs_clear_prefree_segments after checkpoint is done.
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
*/
|
|
|
|
static void set_prefree_as_free_segments(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
|
2014-08-04 10:10:07 +08:00
|
|
|
unsigned int segno;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
mutex_lock(&dirty_i->seglist_lock);
|
2014-09-24 02:23:01 +08:00
|
|
|
for_each_set_bit(segno, dirty_i->dirty_segmap[PRE], MAIN_SEGS(sbi))
|
f2fs: introduce inmem curseg
Previous implementation of aligned pinfile allocation will:
- allocate new segment on cold data log no matter whether last used
segment is partially used or not, it makes IOs more random;
- force concurrent cold data/GCed IO going into warm data area, it
can make a bad effect on hot/cold data separation;
In this patch, we introduce a new type of log named 'inmem curseg',
the differents from normal curseg is:
- it reuses existed segment type (CURSEG_XXX_NODE/DATA);
- it only exists in memory, its segno, blkofs, summary will not b
persisted into checkpoint area;
With this new feature, we can enhance scalability of log, special
allocators can be created for purposes:
- pure lfs allocator for aligned pinfile allocation or file
defragmentation
- pure ssr allocator for later feature
So that, let's update aligned pinfile allocation to use this new
inmem curseg fwk.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:45 +08:00
|
|
|
__set_test_and_free(sbi, segno, false);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
mutex_unlock(&dirty_i->seglist_lock);
|
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
|
|
|
|
struct cp_control *cpc)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
f2fs: introduce discard_granularity sysfs entry
Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables
f2fs to issue 4K size discard in real-time discard mode. However, issuing
smaller discard may cost more lifetime but releasing less free space in
flash device. Since f2fs has ability of separating hot/cold data and
garbage collection, we can expect that small-sized invalid region would
expand soon with OPU, deletion or garbage collection on valid datas, so
it's better to delay or skip issuing smaller size discards, it could help
to reduce overmuch consumption of IO bandwidth and lifetime of flash
storage.
This patch makes f2fs selectng 64K size as its default minimal
granularity, and issue discard with the size which is not smaller than
minimal granularity. Also it exposes discard granularity as sysfs entry
for configuration in different scenario.
Jaegeuk Kim:
We must issue all the accumulated discard commands when fstrim is called.
So, I've added pend_list_tag[] to indicate whether we should issue the
commands or not. If tag sets P_ACTIVE or P_TRIM, we have to issue them.
P_TRIM is set once at a time, given fstrim trigger.
In addition, issue_discard_thread is calling too much due to the number of
discard commands remaining in the pending list. I added a timer to control
it likewise gc_thread.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-07 23:09:56 +08:00
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
|
|
|
struct list_head *head = &dcc->entry_list;
|
2014-03-29 11:33:17 +08:00
|
|
|
struct discard_entry *entry, *this;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
|
2013-11-11 08:24:37 +08:00
|
|
|
unsigned long *prefree_map = dirty_i->dirty_segmap[PRE];
|
|
|
|
unsigned int start = 0, end = -1;
|
2016-06-04 10:29:38 +08:00
|
|
|
unsigned int secno, start_segno;
|
2017-04-27 20:40:39 +08:00
|
|
|
bool force = (cpc->reason & CP_DISCARD);
|
f2fs: introduce discard_unit mount option
As James Z reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=213877
[1.] One-line summary of the problem:
Mount multiple SMR block devices exceed certain number cause system non-response
[2.] Full description of the problem/report:
Created some F2FS on SMR devices (mkfs.f2fs -m), then mounted in sequence. Each device is the same Model: HGST HSH721414AL (Size 14TB).
Empirically, found that when the amount of SMR device * 1.5Gb > System RAM, the system ran out of memory and hung. No dmesg output. For example, 24 SMR Disk need 24*1.5GB = 36GB. A system with 32G RAM can only mount 21 devices, the 22nd device will be a reproducible cause of system hang.
The number of SMR devices with other FS mounted on this system does not interfere with the result above.
[3.] Keywords (i.e., modules, networking, kernel):
F2FS, SMR, Memory
[4.] Kernel information
[4.1.] Kernel version (uname -a):
Linux 5.13.4-200.fc34.x86_64 #1 SMP Tue Jul 20 20:27:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
[4.2.] Kernel .config file:
Default Fedora 34 with f2fs-tools-1.14.0-2.fc34.x86_64
[5.] Most recent kernel version which did not have the bug:
None
[6.] Output of Oops.. message (if applicable) with symbolic information
resolved (see Documentation/admin-guide/oops-tracing.rst)
None
[7.] A small shell script or example program which triggers the
problem (if possible)
mount /dev/sdX /mnt/0X
[8.] Memory consumption
With 24 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 46 36 0 0 10 10
Swap: 0 0 0
With 3 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 7 5 0 0 1 1
Swap: 7 0 7
The root cause is, there are three bitmaps:
- cur_valid_map
- ckpt_valid_map
- discard_map
and each of them will cost ~500MB memory, {cur, ckpt}_valid_map are
necessary, but discard_map is optional, since this bitmap will only be
useful in mountpoint that small discard is enabled.
For a blkzoned device such as SMR or ZNS devices, f2fs will only issue
discard for a section(zone) when all blocks of that section are invalid,
so, for such device, we don't need small discard functionality at all.
This patch introduces a new mountoption "discard_unit=block|segment|
section" to support issuing discard with different basic unit which is
aligned to block, segment or section, so that user can specify
"discard_unit=segment" or "discard_unit=section" to disable small
discard functionality.
Note that this mount option can not be changed by remount() due to
related metadata need to be initialized during mount().
In order to save memory, let's use "discard_unit=section" for blkzoned
device by default.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-08-03 08:15:43 +08:00
|
|
|
bool section_alignment = F2FS_OPTION(sbi).discard_unit ==
|
|
|
|
DISCARD_UNIT_SECTION;
|
|
|
|
|
|
|
|
if (f2fs_lfs_mode(sbi) && __is_large_section(sbi))
|
|
|
|
section_alignment = true;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
mutex_lock(&dirty_i->seglist_lock);
|
2013-11-11 08:24:37 +08:00
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
while (1) {
|
2013-11-11 08:24:37 +08:00
|
|
|
int i;
|
f2fs: issue discard align to section in LFS mode
For the case when sbi->segs_per_sec > 1 with lfs mode, take
section:segment = 5 for example, if the section prefree_map is
...previous section | current section (1 1 0 1 1) | next section...,
then the start = x, end = x + 1, after start = start_segno +
sbi->segs_per_sec, start = x + 5, then it will skip x + 3 and x + 4, but
their bitmap is still set, which will cause duplicated
f2fs_issue_discard of this same section in the next write_checkpoint:
round 1: section bitmap : 1 1 1 1 1, all valid, prefree_map: 0 0 0 0 0
then rm data block NO.2, block NO.2 becomes invalid, prefree_map: 0 0 1 0 0
write_checkpoint: section bitmap: 1 1 0 1 1, prefree_map: 0 0 0 0 0,
prefree of NO.2 is cleared, and no discard issued
round 2: rm data block NO.0, NO.1, NO.3, NO.4
all invalid, but prefree bit of NO.2 is set and cleared in round 1, then
prefree_map: 1 1 0 1 1
write_checkpoint: section bitmap: 0 0 0 0 0, prefree_map: 0 0 0 1 1, no
valid blocks of this section, so discard issued, but this time prefree
bit of NO.3 and NO.4 is skipped due to start = start_segno + sbi->segs_per_sec;
round 3:
write_checkpoint: section bitmap: 0 0 0 0 0, prefree_map: 0 0 0 1 1 ->
0 0 0 0 0, no valid blocks of this section, so discard issued,
this time prefree bit of NO.3 and NO.4 is cleared, but the discard of
this section is sent again...
To fix this problem, we can align the start and end value to section
boundary for fstrim and real-time discard operation, and decide to issue
discard only when the whole section is invalid, which can issue discard
aligned to section size as much as possible and avoid redundant discard.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-07-19 20:58:15 +08:00
|
|
|
|
f2fs: introduce discard_unit mount option
As James Z reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=213877
[1.] One-line summary of the problem:
Mount multiple SMR block devices exceed certain number cause system non-response
[2.] Full description of the problem/report:
Created some F2FS on SMR devices (mkfs.f2fs -m), then mounted in sequence. Each device is the same Model: HGST HSH721414AL (Size 14TB).
Empirically, found that when the amount of SMR device * 1.5Gb > System RAM, the system ran out of memory and hung. No dmesg output. For example, 24 SMR Disk need 24*1.5GB = 36GB. A system with 32G RAM can only mount 21 devices, the 22nd device will be a reproducible cause of system hang.
The number of SMR devices with other FS mounted on this system does not interfere with the result above.
[3.] Keywords (i.e., modules, networking, kernel):
F2FS, SMR, Memory
[4.] Kernel information
[4.1.] Kernel version (uname -a):
Linux 5.13.4-200.fc34.x86_64 #1 SMP Tue Jul 20 20:27:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
[4.2.] Kernel .config file:
Default Fedora 34 with f2fs-tools-1.14.0-2.fc34.x86_64
[5.] Most recent kernel version which did not have the bug:
None
[6.] Output of Oops.. message (if applicable) with symbolic information
resolved (see Documentation/admin-guide/oops-tracing.rst)
None
[7.] A small shell script or example program which triggers the
problem (if possible)
mount /dev/sdX /mnt/0X
[8.] Memory consumption
With 24 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 46 36 0 0 10 10
Swap: 0 0 0
With 3 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 7 5 0 0 1 1
Swap: 7 0 7
The root cause is, there are three bitmaps:
- cur_valid_map
- ckpt_valid_map
- discard_map
and each of them will cost ~500MB memory, {cur, ckpt}_valid_map are
necessary, but discard_map is optional, since this bitmap will only be
useful in mountpoint that small discard is enabled.
For a blkzoned device such as SMR or ZNS devices, f2fs will only issue
discard for a section(zone) when all blocks of that section are invalid,
so, for such device, we don't need small discard functionality at all.
This patch introduces a new mountoption "discard_unit=block|segment|
section" to support issuing discard with different basic unit which is
aligned to block, segment or section, so that user can specify
"discard_unit=segment" or "discard_unit=section" to disable small
discard functionality.
Note that this mount option can not be changed by remount() due to
related metadata need to be initialized during mount().
In order to save memory, let's use "discard_unit=section" for blkzoned
device by default.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-08-03 08:15:43 +08:00
|
|
|
if (section_alignment && end != -1)
|
f2fs: issue discard align to section in LFS mode
For the case when sbi->segs_per_sec > 1 with lfs mode, take
section:segment = 5 for example, if the section prefree_map is
...previous section | current section (1 1 0 1 1) | next section...,
then the start = x, end = x + 1, after start = start_segno +
sbi->segs_per_sec, start = x + 5, then it will skip x + 3 and x + 4, but
their bitmap is still set, which will cause duplicated
f2fs_issue_discard of this same section in the next write_checkpoint:
round 1: section bitmap : 1 1 1 1 1, all valid, prefree_map: 0 0 0 0 0
then rm data block NO.2, block NO.2 becomes invalid, prefree_map: 0 0 1 0 0
write_checkpoint: section bitmap: 1 1 0 1 1, prefree_map: 0 0 0 0 0,
prefree of NO.2 is cleared, and no discard issued
round 2: rm data block NO.0, NO.1, NO.3, NO.4
all invalid, but prefree bit of NO.2 is set and cleared in round 1, then
prefree_map: 1 1 0 1 1
write_checkpoint: section bitmap: 0 0 0 0 0, prefree_map: 0 0 0 1 1, no
valid blocks of this section, so discard issued, but this time prefree
bit of NO.3 and NO.4 is skipped due to start = start_segno + sbi->segs_per_sec;
round 3:
write_checkpoint: section bitmap: 0 0 0 0 0, prefree_map: 0 0 0 1 1 ->
0 0 0 0 0, no valid blocks of this section, so discard issued,
this time prefree bit of NO.3 and NO.4 is cleared, but the discard of
this section is sent again...
To fix this problem, we can align the start and end value to section
boundary for fstrim and real-time discard operation, and decide to issue
discard only when the whole section is invalid, which can issue discard
aligned to section size as much as possible and avoid redundant discard.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-07-19 20:58:15 +08:00
|
|
|
end--;
|
2014-09-24 02:23:01 +08:00
|
|
|
start = find_next_bit(prefree_map, MAIN_SEGS(sbi), end + 1);
|
|
|
|
if (start >= MAIN_SEGS(sbi))
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
break;
|
2014-09-24 02:23:01 +08:00
|
|
|
end = find_next_zero_bit(prefree_map, MAIN_SEGS(sbi),
|
|
|
|
start + 1);
|
2013-11-11 08:24:37 +08:00
|
|
|
|
f2fs: introduce discard_unit mount option
As James Z reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=213877
[1.] One-line summary of the problem:
Mount multiple SMR block devices exceed certain number cause system non-response
[2.] Full description of the problem/report:
Created some F2FS on SMR devices (mkfs.f2fs -m), then mounted in sequence. Each device is the same Model: HGST HSH721414AL (Size 14TB).
Empirically, found that when the amount of SMR device * 1.5Gb > System RAM, the system ran out of memory and hung. No dmesg output. For example, 24 SMR Disk need 24*1.5GB = 36GB. A system with 32G RAM can only mount 21 devices, the 22nd device will be a reproducible cause of system hang.
The number of SMR devices with other FS mounted on this system does not interfere with the result above.
[3.] Keywords (i.e., modules, networking, kernel):
F2FS, SMR, Memory
[4.] Kernel information
[4.1.] Kernel version (uname -a):
Linux 5.13.4-200.fc34.x86_64 #1 SMP Tue Jul 20 20:27:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
[4.2.] Kernel .config file:
Default Fedora 34 with f2fs-tools-1.14.0-2.fc34.x86_64
[5.] Most recent kernel version which did not have the bug:
None
[6.] Output of Oops.. message (if applicable) with symbolic information
resolved (see Documentation/admin-guide/oops-tracing.rst)
None
[7.] A small shell script or example program which triggers the
problem (if possible)
mount /dev/sdX /mnt/0X
[8.] Memory consumption
With 24 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 46 36 0 0 10 10
Swap: 0 0 0
With 3 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 7 5 0 0 1 1
Swap: 7 0 7
The root cause is, there are three bitmaps:
- cur_valid_map
- ckpt_valid_map
- discard_map
and each of them will cost ~500MB memory, {cur, ckpt}_valid_map are
necessary, but discard_map is optional, since this bitmap will only be
useful in mountpoint that small discard is enabled.
For a blkzoned device such as SMR or ZNS devices, f2fs will only issue
discard for a section(zone) when all blocks of that section are invalid,
so, for such device, we don't need small discard functionality at all.
This patch introduces a new mountoption "discard_unit=block|segment|
section" to support issuing discard with different basic unit which is
aligned to block, segment or section, so that user can specify
"discard_unit=segment" or "discard_unit=section" to disable small
discard functionality.
Note that this mount option can not be changed by remount() due to
related metadata need to be initialized during mount().
In order to save memory, let's use "discard_unit=section" for blkzoned
device by default.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-08-03 08:15:43 +08:00
|
|
|
if (section_alignment) {
|
f2fs: issue discard align to section in LFS mode
For the case when sbi->segs_per_sec > 1 with lfs mode, take
section:segment = 5 for example, if the section prefree_map is
...previous section | current section (1 1 0 1 1) | next section...,
then the start = x, end = x + 1, after start = start_segno +
sbi->segs_per_sec, start = x + 5, then it will skip x + 3 and x + 4, but
their bitmap is still set, which will cause duplicated
f2fs_issue_discard of this same section in the next write_checkpoint:
round 1: section bitmap : 1 1 1 1 1, all valid, prefree_map: 0 0 0 0 0
then rm data block NO.2, block NO.2 becomes invalid, prefree_map: 0 0 1 0 0
write_checkpoint: section bitmap: 1 1 0 1 1, prefree_map: 0 0 0 0 0,
prefree of NO.2 is cleared, and no discard issued
round 2: rm data block NO.0, NO.1, NO.3, NO.4
all invalid, but prefree bit of NO.2 is set and cleared in round 1, then
prefree_map: 1 1 0 1 1
write_checkpoint: section bitmap: 0 0 0 0 0, prefree_map: 0 0 0 1 1, no
valid blocks of this section, so discard issued, but this time prefree
bit of NO.3 and NO.4 is skipped due to start = start_segno + sbi->segs_per_sec;
round 3:
write_checkpoint: section bitmap: 0 0 0 0 0, prefree_map: 0 0 0 1 1 ->
0 0 0 0 0, no valid blocks of this section, so discard issued,
this time prefree bit of NO.3 and NO.4 is cleared, but the discard of
this section is sent again...
To fix this problem, we can align the start and end value to section
boundary for fstrim and real-time discard operation, and decide to issue
discard only when the whole section is invalid, which can issue discard
aligned to section size as much as possible and avoid redundant discard.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-07-19 20:58:15 +08:00
|
|
|
start = rounddown(start, sbi->segs_per_sec);
|
|
|
|
end = roundup(end, sbi->segs_per_sec);
|
|
|
|
}
|
2013-11-11 08:24:37 +08:00
|
|
|
|
f2fs: issue discard align to section in LFS mode
For the case when sbi->segs_per_sec > 1 with lfs mode, take
section:segment = 5 for example, if the section prefree_map is
...previous section | current section (1 1 0 1 1) | next section...,
then the start = x, end = x + 1, after start = start_segno +
sbi->segs_per_sec, start = x + 5, then it will skip x + 3 and x + 4, but
their bitmap is still set, which will cause duplicated
f2fs_issue_discard of this same section in the next write_checkpoint:
round 1: section bitmap : 1 1 1 1 1, all valid, prefree_map: 0 0 0 0 0
then rm data block NO.2, block NO.2 becomes invalid, prefree_map: 0 0 1 0 0
write_checkpoint: section bitmap: 1 1 0 1 1, prefree_map: 0 0 0 0 0,
prefree of NO.2 is cleared, and no discard issued
round 2: rm data block NO.0, NO.1, NO.3, NO.4
all invalid, but prefree bit of NO.2 is set and cleared in round 1, then
prefree_map: 1 1 0 1 1
write_checkpoint: section bitmap: 0 0 0 0 0, prefree_map: 0 0 0 1 1, no
valid blocks of this section, so discard issued, but this time prefree
bit of NO.3 and NO.4 is skipped due to start = start_segno + sbi->segs_per_sec;
round 3:
write_checkpoint: section bitmap: 0 0 0 0 0, prefree_map: 0 0 0 1 1 ->
0 0 0 0 0, no valid blocks of this section, so discard issued,
this time prefree bit of NO.3 and NO.4 is cleared, but the discard of
this section is sent again...
To fix this problem, we can align the start and end value to section
boundary for fstrim and real-time discard operation, and decide to issue
discard only when the whole section is invalid, which can issue discard
aligned to section size as much as possible and avoid redundant discard.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-07-19 20:58:15 +08:00
|
|
|
for (i = start; i < end; i++) {
|
|
|
|
if (test_and_clear_bit(i, prefree_map))
|
|
|
|
dirty_i->nr_dirty[PRE]--;
|
|
|
|
}
|
2013-11-11 08:24:37 +08:00
|
|
|
|
f2fs: fix to avoid NULL pointer dereference on se->discard_map
https://bugzilla.kernel.org/show_bug.cgi?id=200951
These is a NULL pointer dereference issue reported in bugzilla:
Hi,
in the setup there is a SATA SSD connected to a SATA-to-USB bridge.
The disc is "Samsung SSD 850 PRO 256G" which supports TRIM.
There are four partitions:
sda1: FAT /boot
sda2: F2FS /
sda3: F2FS /home
sda4: F2FS
The bridge is ASMT1153e which uses the "uas" driver.
There is no TRIM pass-through, so, when mounting it reports:
mounting with "discard" option, but the device does not support discard
The USB host is USB3.0 and UASP capable. It is the one on RK3399.
Given this everything works fine, except there is no TRIM support.
In order to enable TRIM a new UDEV rule is added [1]:
/etc/udev/rules.d/10-sata-bridge-trim.rules:
ACTION=="add|change", ATTRS{idVendor}=="174c", ATTRS{idProduct}=="55aa", SUBSYSTEM=="scsi_disk", ATTR{provisioning_mode}="unmap"
After reboot any F2FS write hangs forever and dmesg reports:
Unable to handle kernel NULL pointer dereference
Also tested on a x86_64 system: works fine even with TRIM enabled.
same disc
same bridge
different usb host controller
different cpu architecture
not root filesystem
Regards,
Vicenç.
[1] Post #5 in https://bbs.archlinux.org/viewtopic.php?id=236280
Unable to handle kernel NULL pointer dereference at virtual address 000000000000003e
Mem abort info:
ESR = 0x96000004
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000626e3122
[000000000000003e] pgd=0000000000000000
Internal error: Oops: 96000004 [#1] SMP
Modules linked in: overlay snd_soc_hdmi_codec rc_cec dw_hdmi_i2s_audio dw_hdmi_cec snd_soc_simple_card snd_soc_simple_card_utils snd_soc_rockchip_i2s rockchip_rga snd_soc_rockchip_pcm rockchipdrm videobuf2_dma_sg v4l2_mem2mem rtc_rk808 videobuf2_memops analogix_dp videobuf2_v4l2 videobuf2_common dw_hdmi dw_wdt cec rc_core videodev drm_kms_helper media drm rockchip_thermal rockchip_saradc realtek drm_panel_orientation_quirks syscopyarea sysfillrect sysimgblt fb_sys_fops dwmac_rk stmmac_platform stmmac pwm_bl squashfs loop crypto_user gpio_keys hid_kensington
CPU: 5 PID: 957 Comm: nvim Not tainted 4.19.0-rc1-1-ARCH #1
Hardware name: Sapphire-RK3399 Board (DT)
pstate: 00000005 (nzcv daif -PAN -UAO)
pc : update_sit_entry+0x304/0x4b0
lr : update_sit_entry+0x108/0x4b0
sp : ffff00000ca13bd0
x29: ffff00000ca13bd0 x28: 000000000000003e
x27: 0000000000000020 x26: 0000000000080000
x25: 0000000000000048 x24: ffff8000ebb85cf8
x23: 0000000000000253 x22: 00000000ffffffff
x21: 00000000000535f2 x20: 00000000ffffffdf
x19: ffff8000eb9e6800 x18: ffff8000eb9e6be8
x17: 0000000007ce6926 x16: 000000001c83ffa8
x15: 0000000000000000 x14: ffff8000f602df90
x13: 0000000000000006 x12: 0000000000000040
x11: 0000000000000228 x10: 0000000000000000
x9 : 0000000000000000 x8 : 0000000000000000
x7 : 00000000000535f2 x6 : ffff8000ebff3440
x5 : ffff8000ebff3440 x4 : ffff8000ebe3a6c8
x3 : 00000000ffffffff x2 : 0000000000000020
x1 : 0000000000000000 x0 : ffff8000eb9e5800
Process nvim (pid: 957, stack limit = 0x0000000063a78320)
Call trace:
update_sit_entry+0x304/0x4b0
f2fs_invalidate_blocks+0x98/0x140
truncate_node+0x90/0x400
f2fs_remove_inode_page+0xe8/0x340
f2fs_evict_inode+0x2b0/0x408
evict+0xe0/0x1e0
iput+0x160/0x260
do_unlinkat+0x214/0x298
__arm64_sys_unlinkat+0x3c/0x68
el0_svc_handler+0x94/0x118
el0_svc+0x8/0xc
Code: f9400800 b9488400 36080140 f9400f01 (387c4820)
---[ end trace a0f21a307118c477 ]---
The reason is it is possible to enable discard flag on block queue via
UDEV, but during mount, f2fs will initialize se->discard_map only if
this flag is set, once the flag is set after mount, f2fs may dereference
NULL pointer on se->discard_map.
So this patch does below changes to fix this issue:
- initialize and update se->discard_map all the time.
- don't clear DISCARD option if device has no QUEUE_FLAG_DISCARD flag
during mount.
- don't issue small discard on zoned block device.
- introduce some functions to enhance the readability.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Tested-by: Vicente Bergas <vicencb@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-04 03:52:17 +08:00
|
|
|
if (!f2fs_realtime_discard_enable(sbi))
|
2013-11-11 08:24:37 +08:00
|
|
|
continue;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
2016-12-22 11:46:24 +08:00
|
|
|
if (force && start >= cpc->trim_start &&
|
|
|
|
(end - 1) <= cpc->trim_end)
|
2023-04-18 08:12:52 +08:00
|
|
|
continue;
|
2016-12-22 11:46:24 +08:00
|
|
|
|
2023-03-13 17:48:25 +08:00
|
|
|
/* Should cover 2MB zoned device for zone-based reset */
|
|
|
|
if (!f2fs_sb_has_blkzoned(sbi) &&
|
|
|
|
(!f2fs_lfs_mode(sbi) || !__is_large_section(sbi))) {
|
2016-06-04 10:29:38 +08:00
|
|
|
f2fs_issue_discard(sbi, START_BLOCK(sbi, start),
|
2013-11-12 15:55:17 +08:00
|
|
|
(end - start) << sbi->log_blocks_per_seg);
|
2016-06-04 10:29:38 +08:00
|
|
|
continue;
|
|
|
|
}
|
|
|
|
next:
|
2017-04-08 06:08:17 +08:00
|
|
|
secno = GET_SEC_FROM_SEG(sbi, start);
|
|
|
|
start_segno = GET_SEG_FROM_SEC(sbi, secno);
|
2016-06-04 10:29:38 +08:00
|
|
|
if (!IS_CURSEC(sbi, secno) &&
|
2017-04-08 05:33:22 +08:00
|
|
|
!get_valid_blocks(sbi, start, true))
|
2016-06-04 10:29:38 +08:00
|
|
|
f2fs_issue_discard(sbi, START_BLOCK(sbi, start_segno),
|
|
|
|
sbi->segs_per_sec << sbi->log_blocks_per_seg);
|
|
|
|
|
|
|
|
start = start_segno + sbi->segs_per_sec;
|
|
|
|
if (start < end)
|
|
|
|
goto next;
|
2017-02-28 03:57:11 +08:00
|
|
|
else
|
|
|
|
end = start - 1;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
mutex_unlock(&dirty_i->seglist_lock);
|
2013-11-12 13:49:56 +08:00
|
|
|
|
f2fs: introduce discard_unit mount option
As James Z reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=213877
[1.] One-line summary of the problem:
Mount multiple SMR block devices exceed certain number cause system non-response
[2.] Full description of the problem/report:
Created some F2FS on SMR devices (mkfs.f2fs -m), then mounted in sequence. Each device is the same Model: HGST HSH721414AL (Size 14TB).
Empirically, found that when the amount of SMR device * 1.5Gb > System RAM, the system ran out of memory and hung. No dmesg output. For example, 24 SMR Disk need 24*1.5GB = 36GB. A system with 32G RAM can only mount 21 devices, the 22nd device will be a reproducible cause of system hang.
The number of SMR devices with other FS mounted on this system does not interfere with the result above.
[3.] Keywords (i.e., modules, networking, kernel):
F2FS, SMR, Memory
[4.] Kernel information
[4.1.] Kernel version (uname -a):
Linux 5.13.4-200.fc34.x86_64 #1 SMP Tue Jul 20 20:27:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
[4.2.] Kernel .config file:
Default Fedora 34 with f2fs-tools-1.14.0-2.fc34.x86_64
[5.] Most recent kernel version which did not have the bug:
None
[6.] Output of Oops.. message (if applicable) with symbolic information
resolved (see Documentation/admin-guide/oops-tracing.rst)
None
[7.] A small shell script or example program which triggers the
problem (if possible)
mount /dev/sdX /mnt/0X
[8.] Memory consumption
With 24 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 46 36 0 0 10 10
Swap: 0 0 0
With 3 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 7 5 0 0 1 1
Swap: 7 0 7
The root cause is, there are three bitmaps:
- cur_valid_map
- ckpt_valid_map
- discard_map
and each of them will cost ~500MB memory, {cur, ckpt}_valid_map are
necessary, but discard_map is optional, since this bitmap will only be
useful in mountpoint that small discard is enabled.
For a blkzoned device such as SMR or ZNS devices, f2fs will only issue
discard for a section(zone) when all blocks of that section are invalid,
so, for such device, we don't need small discard functionality at all.
This patch introduces a new mountoption "discard_unit=block|segment|
section" to support issuing discard with different basic unit which is
aligned to block, segment or section, so that user can specify
"discard_unit=segment" or "discard_unit=section" to disable small
discard functionality.
Note that this mount option can not be changed by remount() due to
related metadata need to be initialized during mount().
In order to save memory, let's use "discard_unit=section" for blkzoned
device by default.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-08-03 08:15:43 +08:00
|
|
|
if (!f2fs_block_unit_discard(sbi))
|
|
|
|
goto wakeup;
|
|
|
|
|
2013-11-12 13:49:56 +08:00
|
|
|
/* send small discards */
|
2014-03-29 11:33:17 +08:00
|
|
|
list_for_each_entry_safe(entry, this, head, list) {
|
2017-03-28 18:18:50 +08:00
|
|
|
unsigned int cur_pos = 0, next_pos, len, total_len = 0;
|
|
|
|
bool is_valid = test_bit_le(0, entry->discard_map);
|
|
|
|
|
|
|
|
find_next:
|
|
|
|
if (is_valid) {
|
|
|
|
next_pos = find_next_zero_bit_le(entry->discard_map,
|
|
|
|
sbi->blocks_per_seg, cur_pos);
|
|
|
|
len = next_pos - cur_pos;
|
|
|
|
|
2018-10-24 18:34:26 +08:00
|
|
|
if (f2fs_sb_has_blkzoned(sbi) ||
|
2023-06-14 04:35:31 +08:00
|
|
|
!force || len < cpc->trim_minlen)
|
2017-03-28 18:18:50 +08:00
|
|
|
goto skip;
|
|
|
|
|
|
|
|
f2fs_issue_discard(sbi, entry->start_blkaddr + cur_pos,
|
|
|
|
len);
|
|
|
|
total_len += len;
|
|
|
|
} else {
|
|
|
|
next_pos = find_next_bit_le(entry->discard_map,
|
|
|
|
sbi->blocks_per_seg, cur_pos);
|
|
|
|
}
|
2015-05-01 13:50:06 +08:00
|
|
|
skip:
|
2017-03-28 18:18:50 +08:00
|
|
|
cur_pos = next_pos;
|
|
|
|
is_valid = !is_valid;
|
|
|
|
|
|
|
|
if (cur_pos < sbi->blocks_per_seg)
|
|
|
|
goto find_next;
|
|
|
|
|
2018-04-25 17:38:29 +08:00
|
|
|
release_discard_addr(entry);
|
f2fs: introduce discard_granularity sysfs entry
Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables
f2fs to issue 4K size discard in real-time discard mode. However, issuing
smaller discard may cost more lifetime but releasing less free space in
flash device. Since f2fs has ability of separating hot/cold data and
garbage collection, we can expect that small-sized invalid region would
expand soon with OPU, deletion or garbage collection on valid datas, so
it's better to delay or skip issuing smaller size discards, it could help
to reduce overmuch consumption of IO bandwidth and lifetime of flash
storage.
This patch makes f2fs selectng 64K size as its default minimal
granularity, and issue discard with the size which is not smaller than
minimal granularity. Also it exposes discard granularity as sysfs entry
for configuration in different scenario.
Jaegeuk Kim:
We must issue all the accumulated discard commands when fstrim is called.
So, I've added pend_list_tag[] to indicate whether we should issue the
commands or not. If tag sets P_ACTIVE or P_TRIM, we have to issue them.
P_TRIM is set once at a time, given fstrim trigger.
In addition, issue_discard_thread is calling too much due to the number of
discard commands remaining in the pending list. I added a timer to control
it likewise gc_thread.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-07 23:09:56 +08:00
|
|
|
dcc->nr_discards -= total_len;
|
2013-11-12 13:49:56 +08:00
|
|
|
}
|
2017-04-25 00:21:34 +08:00
|
|
|
|
f2fs: introduce discard_unit mount option
As James Z reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=213877
[1.] One-line summary of the problem:
Mount multiple SMR block devices exceed certain number cause system non-response
[2.] Full description of the problem/report:
Created some F2FS on SMR devices (mkfs.f2fs -m), then mounted in sequence. Each device is the same Model: HGST HSH721414AL (Size 14TB).
Empirically, found that when the amount of SMR device * 1.5Gb > System RAM, the system ran out of memory and hung. No dmesg output. For example, 24 SMR Disk need 24*1.5GB = 36GB. A system with 32G RAM can only mount 21 devices, the 22nd device will be a reproducible cause of system hang.
The number of SMR devices with other FS mounted on this system does not interfere with the result above.
[3.] Keywords (i.e., modules, networking, kernel):
F2FS, SMR, Memory
[4.] Kernel information
[4.1.] Kernel version (uname -a):
Linux 5.13.4-200.fc34.x86_64 #1 SMP Tue Jul 20 20:27:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
[4.2.] Kernel .config file:
Default Fedora 34 with f2fs-tools-1.14.0-2.fc34.x86_64
[5.] Most recent kernel version which did not have the bug:
None
[6.] Output of Oops.. message (if applicable) with symbolic information
resolved (see Documentation/admin-guide/oops-tracing.rst)
None
[7.] A small shell script or example program which triggers the
problem (if possible)
mount /dev/sdX /mnt/0X
[8.] Memory consumption
With 24 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 46 36 0 0 10 10
Swap: 0 0 0
With 3 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 7 5 0 0 1 1
Swap: 7 0 7
The root cause is, there are three bitmaps:
- cur_valid_map
- ckpt_valid_map
- discard_map
and each of them will cost ~500MB memory, {cur, ckpt}_valid_map are
necessary, but discard_map is optional, since this bitmap will only be
useful in mountpoint that small discard is enabled.
For a blkzoned device such as SMR or ZNS devices, f2fs will only issue
discard for a section(zone) when all blocks of that section are invalid,
so, for such device, we don't need small discard functionality at all.
This patch introduces a new mountoption "discard_unit=block|segment|
section" to support issuing discard with different basic unit which is
aligned to block, segment or section, so that user can specify
"discard_unit=segment" or "discard_unit=section" to disable small
discard functionality.
Note that this mount option can not be changed by remount() due to
related metadata need to be initialized during mount().
In order to save memory, let's use "discard_unit=section" for blkzoned
device by default.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-08-03 08:15:43 +08:00
|
|
|
wakeup:
|
2017-08-23 12:15:43 +08:00
|
|
|
wake_up_discard_thread(sbi, false);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
2021-08-19 16:02:37 +08:00
|
|
|
int f2fs_start_discard_thread(struct f2fs_sb_info *sbi)
|
2017-01-12 06:40:24 +08:00
|
|
|
{
|
2017-01-10 12:32:07 +08:00
|
|
|
dev_t dev = sbi->sb->s_bdev->bd_dev;
|
2021-08-19 16:02:37 +08:00
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
|
|
|
int err = 0;
|
|
|
|
|
|
|
|
if (!f2fs_realtime_discard_enable(sbi))
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
dcc->f2fs_issue_discard = kthread_run(issue_discard_thread, sbi,
|
|
|
|
"f2fs_discard-%u:%u", MAJOR(dev), MINOR(dev));
|
2022-10-21 10:34:22 +08:00
|
|
|
if (IS_ERR(dcc->f2fs_issue_discard)) {
|
2021-08-19 16:02:37 +08:00
|
|
|
err = PTR_ERR(dcc->f2fs_issue_discard);
|
2022-10-21 10:34:22 +08:00
|
|
|
dcc->f2fs_issue_discard = NULL;
|
|
|
|
}
|
2021-08-19 16:02:37 +08:00
|
|
|
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int create_discard_cmd_control(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
2017-01-12 06:40:24 +08:00
|
|
|
struct discard_cmd_control *dcc;
|
2017-04-15 14:09:37 +08:00
|
|
|
int err = 0, i;
|
2017-01-12 06:40:24 +08:00
|
|
|
|
|
|
|
if (SM_I(sbi)->dcc_info) {
|
|
|
|
dcc = SM_I(sbi)->dcc_info;
|
|
|
|
goto init_thread;
|
|
|
|
}
|
|
|
|
|
2017-11-30 19:28:17 +08:00
|
|
|
dcc = f2fs_kzalloc(sbi, sizeof(struct discard_cmd_control), GFP_KERNEL);
|
2017-01-12 06:40:24 +08:00
|
|
|
if (!dcc)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
2023-01-04 19:40:29 +08:00
|
|
|
dcc->discard_io_aware_gran = MAX_PLIST_NUM;
|
f2fs: introduce discard_granularity sysfs entry
Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables
f2fs to issue 4K size discard in real-time discard mode. However, issuing
smaller discard may cost more lifetime but releasing less free space in
flash device. Since f2fs has ability of separating hot/cold data and
garbage collection, we can expect that small-sized invalid region would
expand soon with OPU, deletion or garbage collection on valid datas, so
it's better to delay or skip issuing smaller size discards, it could help
to reduce overmuch consumption of IO bandwidth and lifetime of flash
storage.
This patch makes f2fs selectng 64K size as its default minimal
granularity, and issue discard with the size which is not smaller than
minimal granularity. Also it exposes discard granularity as sysfs entry
for configuration in different scenario.
Jaegeuk Kim:
We must issue all the accumulated discard commands when fstrim is called.
So, I've added pend_list_tag[] to indicate whether we should issue the
commands or not. If tag sets P_ACTIVE or P_TRIM, we have to issue them.
P_TRIM is set once at a time, given fstrim trigger.
In addition, issue_discard_thread is calling too much due to the number of
discard commands remaining in the pending list. I added a timer to control
it likewise gc_thread.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-07 23:09:56 +08:00
|
|
|
dcc->discard_granularity = DEFAULT_DISCARD_GRANULARITY;
|
2022-10-25 16:32:26 +08:00
|
|
|
dcc->max_ordered_discard = DEFAULT_MAX_ORDERED_DISCARD_GRANULARITY;
|
f2fs: introduce discard_unit mount option
As James Z reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=213877
[1.] One-line summary of the problem:
Mount multiple SMR block devices exceed certain number cause system non-response
[2.] Full description of the problem/report:
Created some F2FS on SMR devices (mkfs.f2fs -m), then mounted in sequence. Each device is the same Model: HGST HSH721414AL (Size 14TB).
Empirically, found that when the amount of SMR device * 1.5Gb > System RAM, the system ran out of memory and hung. No dmesg output. For example, 24 SMR Disk need 24*1.5GB = 36GB. A system with 32G RAM can only mount 21 devices, the 22nd device will be a reproducible cause of system hang.
The number of SMR devices with other FS mounted on this system does not interfere with the result above.
[3.] Keywords (i.e., modules, networking, kernel):
F2FS, SMR, Memory
[4.] Kernel information
[4.1.] Kernel version (uname -a):
Linux 5.13.4-200.fc34.x86_64 #1 SMP Tue Jul 20 20:27:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
[4.2.] Kernel .config file:
Default Fedora 34 with f2fs-tools-1.14.0-2.fc34.x86_64
[5.] Most recent kernel version which did not have the bug:
None
[6.] Output of Oops.. message (if applicable) with symbolic information
resolved (see Documentation/admin-guide/oops-tracing.rst)
None
[7.] A small shell script or example program which triggers the
problem (if possible)
mount /dev/sdX /mnt/0X
[8.] Memory consumption
With 24 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 46 36 0 0 10 10
Swap: 0 0 0
With 3 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 7 5 0 0 1 1
Swap: 7 0 7
The root cause is, there are three bitmaps:
- cur_valid_map
- ckpt_valid_map
- discard_map
and each of them will cost ~500MB memory, {cur, ckpt}_valid_map are
necessary, but discard_map is optional, since this bitmap will only be
useful in mountpoint that small discard is enabled.
For a blkzoned device such as SMR or ZNS devices, f2fs will only issue
discard for a section(zone) when all blocks of that section are invalid,
so, for such device, we don't need small discard functionality at all.
This patch introduces a new mountoption "discard_unit=block|segment|
section" to support issuing discard with different basic unit which is
aligned to block, segment or section, so that user can specify
"discard_unit=segment" or "discard_unit=section" to disable small
discard functionality.
Note that this mount option can not be changed by remount() due to
related metadata need to be initialized during mount().
In order to save memory, let's use "discard_unit=section" for blkzoned
device by default.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-08-03 08:15:43 +08:00
|
|
|
if (F2FS_OPTION(sbi).discard_unit == DISCARD_UNIT_SEGMENT)
|
|
|
|
dcc->discard_granularity = sbi->blocks_per_seg;
|
|
|
|
else if (F2FS_OPTION(sbi).discard_unit == DISCARD_UNIT_SECTION)
|
|
|
|
dcc->discard_granularity = BLKS_PER_SEC(sbi);
|
|
|
|
|
2017-04-15 14:09:36 +08:00
|
|
|
INIT_LIST_HEAD(&dcc->entry_list);
|
2017-10-04 09:08:34 +08:00
|
|
|
for (i = 0; i < MAX_PLIST_NUM; i++)
|
2017-04-15 14:09:37 +08:00
|
|
|
INIT_LIST_HEAD(&dcc->pend_list[i]);
|
2017-04-15 14:09:36 +08:00
|
|
|
INIT_LIST_HEAD(&dcc->wait_list);
|
2017-10-04 09:08:32 +08:00
|
|
|
INIT_LIST_HEAD(&dcc->fstrim_list);
|
2017-01-10 12:32:07 +08:00
|
|
|
mutex_init(&dcc->cmd_lock);
|
2017-03-25 17:19:58 +08:00
|
|
|
atomic_set(&dcc->issued_discard, 0);
|
2018-12-14 08:53:57 +08:00
|
|
|
atomic_set(&dcc->queued_discard, 0);
|
2017-03-25 17:19:59 +08:00
|
|
|
atomic_set(&dcc->discard_cmd_cnt, 0);
|
2017-01-12 06:40:24 +08:00
|
|
|
dcc->nr_discards = 0;
|
2017-04-25 00:21:35 +08:00
|
|
|
dcc->max_discards = MAIN_SEGS(sbi) << sbi->log_blocks_per_seg;
|
2021-12-14 09:12:03 +08:00
|
|
|
dcc->max_discard_request = DEF_MAX_DISCARD_REQUEST;
|
|
|
|
dcc->min_discard_issue_time = DEF_MIN_DISCARD_ISSUE_TIME;
|
|
|
|
dcc->mid_discard_issue_time = DEF_MID_DISCARD_ISSUE_TIME;
|
|
|
|
dcc->max_discard_issue_time = DEF_MAX_DISCARD_ISSUE_TIME;
|
2022-11-24 00:44:02 +08:00
|
|
|
dcc->discard_urgent_util = DEF_DISCARD_URGENT_UTIL;
|
2017-04-18 19:27:39 +08:00
|
|
|
dcc->undiscard_blks = 0;
|
2018-07-08 22:11:01 +08:00
|
|
|
dcc->next_pos = 0;
|
2018-10-04 11:18:30 +08:00
|
|
|
dcc->root = RB_ROOT_CACHED;
|
2018-06-22 16:06:59 +08:00
|
|
|
dcc->rbtree_check = false;
|
2017-01-12 06:40:24 +08:00
|
|
|
|
2017-01-10 12:32:07 +08:00
|
|
|
init_waitqueue_head(&dcc->discard_wait_queue);
|
2017-01-12 06:40:24 +08:00
|
|
|
SM_I(sbi)->dcc_info = dcc;
|
|
|
|
init_thread:
|
2021-08-19 16:02:37 +08:00
|
|
|
err = f2fs_start_discard_thread(sbi);
|
|
|
|
if (err) {
|
2020-09-14 16:47:00 +08:00
|
|
|
kfree(dcc);
|
2017-01-10 12:32:07 +08:00
|
|
|
SM_I(sbi)->dcc_info = NULL;
|
|
|
|
}
|
|
|
|
|
2017-01-12 06:40:24 +08:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2017-03-27 18:14:04 +08:00
|
|
|
static void destroy_discard_cmd_control(struct f2fs_sb_info *sbi)
|
2017-01-12 06:40:24 +08:00
|
|
|
{
|
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
|
|
|
|
2017-03-27 18:14:04 +08:00
|
|
|
if (!dcc)
|
|
|
|
return;
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
f2fs_stop_discard_thread(sbi);
|
2017-03-27 18:14:04 +08:00
|
|
|
|
2019-07-19 15:18:44 +08:00
|
|
|
/*
|
|
|
|
* Recovery can cache discard commands, so in error path of
|
|
|
|
* fill_super(), it needs to give a chance to handle them.
|
|
|
|
*/
|
2022-12-02 12:58:41 +08:00
|
|
|
f2fs_issue_discard_timeout(sbi);
|
2019-07-19 15:18:44 +08:00
|
|
|
|
2020-09-14 16:47:00 +08:00
|
|
|
kfree(dcc);
|
2017-03-27 18:14:04 +08:00
|
|
|
SM_I(sbi)->dcc_info = NULL;
|
2017-01-12 06:40:24 +08:00
|
|
|
}
|
|
|
|
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
static bool __mark_sit_entry_dirty(struct f2fs_sb_info *sbi, unsigned int segno)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct sit_info *sit_i = SIT_I(sbi);
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
|
|
|
|
if (!__test_and_set_bit(segno, sit_i->dirty_sentries_bitmap)) {
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
sit_i->dirty_sentries++;
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
return true;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void __set_sit_entry_type(struct f2fs_sb_info *sbi, int type,
|
|
|
|
unsigned int segno, int modified)
|
|
|
|
{
|
|
|
|
struct seg_entry *se = get_seg_entry(sbi, segno);
|
2021-04-06 09:47:35 +08:00
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
se->type = type;
|
|
|
|
if (modified)
|
|
|
|
__mark_sit_entry_dirty(sbi, segno);
|
|
|
|
}
|
|
|
|
|
2020-08-04 21:14:47 +08:00
|
|
|
static inline unsigned long long get_segment_mtime(struct f2fs_sb_info *sbi,
|
|
|
|
block_t blkaddr)
|
2020-08-04 21:14:46 +08:00
|
|
|
{
|
|
|
|
unsigned int segno = GET_SEGNO(sbi, blkaddr);
|
2020-08-04 21:14:47 +08:00
|
|
|
|
|
|
|
if (segno == NULL_SEGNO)
|
|
|
|
return 0;
|
|
|
|
return get_seg_entry(sbi, segno)->mtime;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void update_segment_mtime(struct f2fs_sb_info *sbi, block_t blkaddr,
|
|
|
|
unsigned long long old_mtime)
|
|
|
|
{
|
|
|
|
struct seg_entry *se;
|
|
|
|
unsigned int segno = GET_SEGNO(sbi, blkaddr);
|
|
|
|
unsigned long long ctime = get_mtime(sbi, false);
|
|
|
|
unsigned long long mtime = old_mtime ? old_mtime : ctime;
|
|
|
|
|
|
|
|
if (segno == NULL_SEGNO)
|
|
|
|
return;
|
|
|
|
|
|
|
|
se = get_seg_entry(sbi, segno);
|
2020-08-04 21:14:46 +08:00
|
|
|
|
|
|
|
if (!se->mtime)
|
|
|
|
se->mtime = mtime;
|
|
|
|
else
|
|
|
|
se->mtime = div_u64(se->mtime * se->valid_blocks + mtime,
|
|
|
|
se->valid_blocks + 1);
|
|
|
|
|
2020-08-04 21:14:47 +08:00
|
|
|
if (ctime > SIT_I(sbi)->max_mtime)
|
|
|
|
SIT_I(sbi)->max_mtime = ctime;
|
2020-08-04 21:14:46 +08:00
|
|
|
}
|
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
static void update_sit_entry(struct f2fs_sb_info *sbi, block_t blkaddr, int del)
|
|
|
|
{
|
|
|
|
struct seg_entry *se;
|
|
|
|
unsigned int segno, offset;
|
|
|
|
long int new_vblocks;
|
2017-08-02 21:20:13 +08:00
|
|
|
bool exist;
|
|
|
|
#ifdef CONFIG_F2FS_CHECK_FS
|
|
|
|
bool mir_exist;
|
|
|
|
#endif
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
segno = GET_SEGNO(sbi, blkaddr);
|
|
|
|
|
|
|
|
se = get_seg_entry(sbi, segno);
|
|
|
|
new_vblocks = se->valid_blocks + del;
|
2014-02-04 12:01:10 +08:00
|
|
|
offset = GET_BLKOFF_FROM_SEG0(sbi, blkaddr);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
2020-08-01 11:24:50 +08:00
|
|
|
f2fs_bug_on(sbi, (new_vblocks < 0 ||
|
f2fs: support zone capacity less than zone size
NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
Zone-capacity indicates the maximum number of sectors that are usable in
a zone beginning from the first sector of the zone. This makes the sectors
sectors after the zone-capacity till zone-size to be unusable.
This patch set tracks zone-size and zone-capacity in zoned devices and
calculate the usable blocks per segment and usable segments per section.
If zone-capacity is less than zone-size mark only those segments which
start before zone-capacity as free segments. All segments at and beyond
zone-capacity are treated as permanently used segments. In cases where
zone-capacity does not align with segment size the last segment will start
before zone-capacity and end beyond the zone-capacity of the zone. For
such spanning segments only sectors within the zone-capacity are used.
During writes and GC manage the usable segments in a section and usable
blocks per segment. Segments which are beyond zone-capacity are never
allocated, and do not need to be garbage collected, only the segments
which are before zone-capacity needs to garbage collected.
For spanning segments based on the number of usable blocks in that
segment, write to blocks only up to zone-capacity.
Zone-capacity is device specific and cannot be configured by the user.
Since NVMe ZNS device zones are sequentially write only, a block device
with conventional zones or any normal block device is needed along with
the ZNS device for the metadata operations of F2fs.
A typical nvme-cli output of a zoned device shows zone start and capacity
and write pointer as below:
SLBA: 0x0 WP: 0x0 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
Here zone size is 64MB, capacity is 49MB, WP is at zone start as the zones
are in EMPTY state. For each zone, only zone start + 49MB is usable area,
any lba/sector after 49MB cannot be read or written to, the drive will fail
any attempts to read/write. So, the second zone starts at 64MB and is
usable till 113MB (64 + 49) and the range between 113 and 128MB is
again unusable. The next zone starts at 128MB, and so on.
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-16 20:56:56 +08:00
|
|
|
(new_vblocks > f2fs_usable_blks_in_seg(sbi, segno))));
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
se->valid_blocks = new_vblocks;
|
|
|
|
|
|
|
|
/* Update valid block bitmap */
|
|
|
|
if (del > 0) {
|
2017-08-02 21:20:13 +08:00
|
|
|
exist = f2fs_test_and_set_bit(offset, se->cur_valid_map);
|
2017-01-07 18:51:01 +08:00
|
|
|
#ifdef CONFIG_F2FS_CHECK_FS
|
2017-08-02 21:20:13 +08:00
|
|
|
mir_exist = f2fs_test_and_set_bit(offset,
|
|
|
|
se->cur_valid_map_mir);
|
|
|
|
if (unlikely(exist != mir_exist)) {
|
2019-06-18 17:48:42 +08:00
|
|
|
f2fs_err(sbi, "Inconsistent error when setting bitmap, blk:%u, old bit:%d",
|
|
|
|
blkaddr, exist);
|
2014-09-03 07:05:00 +08:00
|
|
|
f2fs_bug_on(sbi, 1);
|
2017-08-02 21:20:13 +08:00
|
|
|
}
|
2017-01-07 18:51:01 +08:00
|
|
|
#endif
|
2017-08-02 21:20:13 +08:00
|
|
|
if (unlikely(exist)) {
|
2019-06-18 17:48:42 +08:00
|
|
|
f2fs_err(sbi, "Bitmap was wrongly set, blk:%u",
|
|
|
|
blkaddr);
|
2017-08-02 21:20:13 +08:00
|
|
|
f2fs_bug_on(sbi, 1);
|
2017-08-02 22:16:54 +08:00
|
|
|
se->valid_blocks--;
|
|
|
|
del = 0;
|
2017-01-07 18:51:01 +08:00
|
|
|
}
|
2017-08-02 21:20:13 +08:00
|
|
|
|
f2fs: introduce discard_unit mount option
As James Z reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=213877
[1.] One-line summary of the problem:
Mount multiple SMR block devices exceed certain number cause system non-response
[2.] Full description of the problem/report:
Created some F2FS on SMR devices (mkfs.f2fs -m), then mounted in sequence. Each device is the same Model: HGST HSH721414AL (Size 14TB).
Empirically, found that when the amount of SMR device * 1.5Gb > System RAM, the system ran out of memory and hung. No dmesg output. For example, 24 SMR Disk need 24*1.5GB = 36GB. A system with 32G RAM can only mount 21 devices, the 22nd device will be a reproducible cause of system hang.
The number of SMR devices with other FS mounted on this system does not interfere with the result above.
[3.] Keywords (i.e., modules, networking, kernel):
F2FS, SMR, Memory
[4.] Kernel information
[4.1.] Kernel version (uname -a):
Linux 5.13.4-200.fc34.x86_64 #1 SMP Tue Jul 20 20:27:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
[4.2.] Kernel .config file:
Default Fedora 34 with f2fs-tools-1.14.0-2.fc34.x86_64
[5.] Most recent kernel version which did not have the bug:
None
[6.] Output of Oops.. message (if applicable) with symbolic information
resolved (see Documentation/admin-guide/oops-tracing.rst)
None
[7.] A small shell script or example program which triggers the
problem (if possible)
mount /dev/sdX /mnt/0X
[8.] Memory consumption
With 24 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 46 36 0 0 10 10
Swap: 0 0 0
With 3 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 7 5 0 0 1 1
Swap: 7 0 7
The root cause is, there are three bitmaps:
- cur_valid_map
- ckpt_valid_map
- discard_map
and each of them will cost ~500MB memory, {cur, ckpt}_valid_map are
necessary, but discard_map is optional, since this bitmap will only be
useful in mountpoint that small discard is enabled.
For a blkzoned device such as SMR or ZNS devices, f2fs will only issue
discard for a section(zone) when all blocks of that section are invalid,
so, for such device, we don't need small discard functionality at all.
This patch introduces a new mountoption "discard_unit=block|segment|
section" to support issuing discard with different basic unit which is
aligned to block, segment or section, so that user can specify
"discard_unit=segment" or "discard_unit=section" to disable small
discard functionality.
Note that this mount option can not be changed by remount() due to
related metadata need to be initialized during mount().
In order to save memory, let's use "discard_unit=section" for blkzoned
device by default.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-08-03 08:15:43 +08:00
|
|
|
if (f2fs_block_unit_discard(sbi) &&
|
|
|
|
!f2fs_test_and_set_bit(offset, se->discard_map))
|
2015-05-01 13:37:50 +08:00
|
|
|
sbi->discard_blks--;
|
2017-03-07 03:59:56 +08:00
|
|
|
|
2019-08-16 11:03:34 +08:00
|
|
|
/*
|
|
|
|
* SSR should never reuse block which is checkpointed
|
|
|
|
* or newly invalidated.
|
|
|
|
*/
|
|
|
|
if (!is_sbi_flag_set(sbi, SBI_CP_DISABLED)) {
|
2017-03-07 03:59:56 +08:00
|
|
|
if (!f2fs_test_and_set_bit(offset, se->ckpt_valid_map))
|
|
|
|
se->ckpt_valid_blocks++;
|
|
|
|
}
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
} else {
|
2017-08-02 21:20:13 +08:00
|
|
|
exist = f2fs_test_and_clear_bit(offset, se->cur_valid_map);
|
2017-01-07 18:51:01 +08:00
|
|
|
#ifdef CONFIG_F2FS_CHECK_FS
|
2017-08-02 21:20:13 +08:00
|
|
|
mir_exist = f2fs_test_and_clear_bit(offset,
|
|
|
|
se->cur_valid_map_mir);
|
|
|
|
if (unlikely(exist != mir_exist)) {
|
2019-06-18 17:48:42 +08:00
|
|
|
f2fs_err(sbi, "Inconsistent error when clearing bitmap, blk:%u, old bit:%d",
|
|
|
|
blkaddr, exist);
|
2014-09-03 07:05:00 +08:00
|
|
|
f2fs_bug_on(sbi, 1);
|
2017-08-02 21:20:13 +08:00
|
|
|
}
|
2017-01-07 18:51:01 +08:00
|
|
|
#endif
|
2017-08-02 21:20:13 +08:00
|
|
|
if (unlikely(!exist)) {
|
2019-06-18 17:48:42 +08:00
|
|
|
f2fs_err(sbi, "Bitmap was wrongly cleared, blk:%u",
|
|
|
|
blkaddr);
|
2017-08-02 21:20:13 +08:00
|
|
|
f2fs_bug_on(sbi, 1);
|
2017-08-02 22:16:54 +08:00
|
|
|
se->valid_blocks++;
|
|
|
|
del = 0;
|
2018-08-21 10:21:43 +08:00
|
|
|
} else if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED))) {
|
|
|
|
/*
|
|
|
|
* If checkpoints are off, we must not reuse data that
|
|
|
|
* was used in the previous checkpoint. If it was used
|
|
|
|
* before, we must track that to know how much space we
|
|
|
|
* really have.
|
|
|
|
*/
|
2019-05-05 11:40:46 +08:00
|
|
|
if (f2fs_test_bit(offset, se->ckpt_valid_map)) {
|
|
|
|
spin_lock(&sbi->stat_lock);
|
2018-08-21 10:21:43 +08:00
|
|
|
sbi->unusable_block_count++;
|
2019-05-05 11:40:46 +08:00
|
|
|
spin_unlock(&sbi->stat_lock);
|
|
|
|
}
|
2017-01-07 18:51:01 +08:00
|
|
|
}
|
2017-08-02 21:20:13 +08:00
|
|
|
|
f2fs: introduce discard_unit mount option
As James Z reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=213877
[1.] One-line summary of the problem:
Mount multiple SMR block devices exceed certain number cause system non-response
[2.] Full description of the problem/report:
Created some F2FS on SMR devices (mkfs.f2fs -m), then mounted in sequence. Each device is the same Model: HGST HSH721414AL (Size 14TB).
Empirically, found that when the amount of SMR device * 1.5Gb > System RAM, the system ran out of memory and hung. No dmesg output. For example, 24 SMR Disk need 24*1.5GB = 36GB. A system with 32G RAM can only mount 21 devices, the 22nd device will be a reproducible cause of system hang.
The number of SMR devices with other FS mounted on this system does not interfere with the result above.
[3.] Keywords (i.e., modules, networking, kernel):
F2FS, SMR, Memory
[4.] Kernel information
[4.1.] Kernel version (uname -a):
Linux 5.13.4-200.fc34.x86_64 #1 SMP Tue Jul 20 20:27:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
[4.2.] Kernel .config file:
Default Fedora 34 with f2fs-tools-1.14.0-2.fc34.x86_64
[5.] Most recent kernel version which did not have the bug:
None
[6.] Output of Oops.. message (if applicable) with symbolic information
resolved (see Documentation/admin-guide/oops-tracing.rst)
None
[7.] A small shell script or example program which triggers the
problem (if possible)
mount /dev/sdX /mnt/0X
[8.] Memory consumption
With 24 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 46 36 0 0 10 10
Swap: 0 0 0
With 3 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 7 5 0 0 1 1
Swap: 7 0 7
The root cause is, there are three bitmaps:
- cur_valid_map
- ckpt_valid_map
- discard_map
and each of them will cost ~500MB memory, {cur, ckpt}_valid_map are
necessary, but discard_map is optional, since this bitmap will only be
useful in mountpoint that small discard is enabled.
For a blkzoned device such as SMR or ZNS devices, f2fs will only issue
discard for a section(zone) when all blocks of that section are invalid,
so, for such device, we don't need small discard functionality at all.
This patch introduces a new mountoption "discard_unit=block|segment|
section" to support issuing discard with different basic unit which is
aligned to block, segment or section, so that user can specify
"discard_unit=segment" or "discard_unit=section" to disable small
discard functionality.
Note that this mount option can not be changed by remount() due to
related metadata need to be initialized during mount().
In order to save memory, let's use "discard_unit=section" for blkzoned
device by default.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-08-03 08:15:43 +08:00
|
|
|
if (f2fs_block_unit_discard(sbi) &&
|
|
|
|
f2fs_test_and_clear_bit(offset, se->discard_map))
|
2015-05-01 13:37:50 +08:00
|
|
|
sbi->discard_blks++;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
if (!f2fs_test_bit(offset, se->ckpt_valid_map))
|
|
|
|
se->ckpt_valid_blocks += del;
|
|
|
|
|
|
|
|
__mark_sit_entry_dirty(sbi, segno);
|
|
|
|
|
|
|
|
/* update total number of valid blocks to be written in ckpt area */
|
|
|
|
SIT_I(sbi)->written_valid_blocks += del;
|
|
|
|
|
2018-10-24 18:37:26 +08:00
|
|
|
if (__is_large_section(sbi))
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
get_sec_entry(sbi, segno)->valid_blocks += del;
|
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_invalidate_blocks(struct f2fs_sb_info *sbi, block_t addr)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
unsigned int segno = GET_SEGNO(sbi, addr);
|
|
|
|
struct sit_info *sit_i = SIT_I(sbi);
|
|
|
|
|
2014-09-03 06:52:58 +08:00
|
|
|
f2fs_bug_on(sbi, addr == NULL_ADDR);
|
f2fs: support data compression
This patch tries to support compression in f2fs.
- New term named cluster is defined as basic unit of compression, file can
be divided into multiple clusters logically. One cluster includes 4 << n
(n >= 0) logical pages, compression size is also cluster size, each of
cluster can be compressed or not.
- In cluster metadata layout, one special flag is used to indicate cluster
is compressed one or normal one, for compressed cluster, following metadata
maps cluster to [1, 4 << n - 1] physical blocks, in where f2fs stores
data including compress header and compressed data.
- In order to eliminate write amplification during overwrite, F2FS only
support compression on write-once file, data can be compressed only when
all logical blocks in file are valid and cluster compress ratio is lower
than specified threshold.
- To enable compression on regular inode, there are three ways:
* chattr +c file
* chattr +c dir; touch dir/file
* mount w/ -o compress_extension=ext; touch file.ext
Compress metadata layout:
[Dnode Structure]
+-----------------------------------------------+
| cluster 1 | cluster 2 | ......... | cluster N |
+-----------------------------------------------+
. . . .
. . . .
. Compressed Cluster . . Normal Cluster .
+----------+---------+---------+---------+ +---------+---------+---------+---------+
|compr flag| block 1 | block 2 | block 3 | | block 1 | block 2 | block 3 | block 4 |
+----------+---------+---------+---------+ +---------+---------+---------+---------+
. .
. .
. .
+-------------+-------------+----------+----------------------------+
| data length | data chksum | reserved | compressed data |
+-------------+-------------+----------+----------------------------+
Changelog:
20190326:
- fix error handling of read_end_io().
- remove unneeded comments in f2fs_encrypt_one_page().
20190327:
- fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
- don't jump into loop directly to avoid uninitialized variables.
- add TODO tag in error path of f2fs_write_cache_pages().
20190328:
- fix wrong merge condition in f2fs_read_multi_pages().
- check compressed file in f2fs_post_read_required().
20190401
- allow overwrite on non-compressed cluster.
- check cluster meta before writing compressed data.
20190402
- don't preallocate blocks for compressed file.
- add lz4 compress algorithm
- process multiple post read works in one workqueue
Now f2fs supports processing post read work in multiple workqueue,
it shows low performance due to schedule overhead of multiple
workqueue executing orderly.
20190921
- compress: support buffered overwrite
C: compress cluster flag
V: valid block address
N: NEW_ADDR
One cluster contain 4 blocks
before overwrite after overwrite
- VVVV -> CVNN
- CVNN -> VVVV
- CVNN -> CVNN
- CVNN -> CVVV
- CVVV -> CVNN
- CVVV -> CVVV
20191029
- add kconfig F2FS_FS_COMPRESSION to isolate compression related
codes, add kconfig F2FS_FS_{LZO,LZ4} to cover backend algorithm.
note that: will remove lzo backend if Jaegeuk agreed that too.
- update codes according to Eric's comments.
20191101
- apply fixes from Jaegeuk
20191113
- apply fixes from Jaegeuk
- split workqueue for fsverity
20191216
- apply fixes from Jaegeuk
20200117
- fix to avoid NULL pointer dereference
[Jaegeuk Kim]
- add tracepoint for f2fs_{,de}compress_pages()
- fix many bugs and add some compression stats
- fix overwrite/mmap bugs
- address 32bit build error, reported by Geert.
- bug fixes when handling errors and i_compressed_blocks
Reported-by: <noreply@ellerman.id.au>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-11-01 18:07:14 +08:00
|
|
|
if (addr == NEW_ADDR || addr == COMPRESS_ADDR)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
return;
|
|
|
|
|
f2fs: readahead encrypted block during GC
During GC, for each encrypted block, we will read block synchronously
into meta page, and then submit it into current cold data log area.
So this block read model with 4k granularity can make poor performance,
like migrating non-encrypted block, let's readahead encrypted block
as well to improve migration performance.
To implement this, we choose meta page that its index is old block
address of the encrypted block, and readahead ciphertext into this
page, later, if readaheaded page is still updated, we will load its
data into target meta page, and submit the write IO.
Note that for OPU, truncation, deletion, we need to invalid meta
page after we invalid old block address, to make sure we won't load
invalid data from target meta page during encrypted block migration.
for ((i = 0; i < 1000; i++))
do {
xfs_io -f /mnt/f2fs/dir/$i -c "pwrite 0 128k" -c "fsync";
} done
for ((i = 0; i < 1000; i+=2))
do {
rm /mnt/f2fs/dir/$i;
} done
ret = ioctl(fd, F2FS_IOC_GARBAGE_COLLECT, 0);
Before:
gc-6549 [001] d..1 214682.212797: block_rq_insert: 8,32 RA 32768 () 786400 + 64 [gc]
gc-6549 [001] d..1 214682.212802: block_unplug: [gc] 1
gc-6549 [001] .... 214682.213892: block_bio_queue: 8,32 R 67494144 + 8 [gc]
gc-6549 [001] .... 214682.213899: block_getrq: 8,32 R 67494144 + 8 [gc]
gc-6549 [001] .... 214682.213902: block_plug: [gc]
gc-6549 [001] d..1 214682.213905: block_rq_insert: 8,32 R 4096 () 67494144 + 8 [gc]
gc-6549 [001] d..1 214682.213908: block_unplug: [gc] 1
gc-6549 [001] .... 214682.226405: block_bio_queue: 8,32 R 67494152 + 8 [gc]
gc-6549 [001] .... 214682.226412: block_getrq: 8,32 R 67494152 + 8 [gc]
gc-6549 [001] .... 214682.226414: block_plug: [gc]
gc-6549 [001] d..1 214682.226417: block_rq_insert: 8,32 R 4096 () 67494152 + 8 [gc]
gc-6549 [001] d..1 214682.226420: block_unplug: [gc] 1
gc-6549 [001] .... 214682.226904: block_bio_queue: 8,32 R 67494160 + 8 [gc]
gc-6549 [001] .... 214682.226910: block_getrq: 8,32 R 67494160 + 8 [gc]
gc-6549 [001] .... 214682.226911: block_plug: [gc]
gc-6549 [001] d..1 214682.226914: block_rq_insert: 8,32 R 4096 () 67494160 + 8 [gc]
gc-6549 [001] d..1 214682.226916: block_unplug: [gc] 1
After:
gc-5678 [003] .... 214327.025906: block_bio_queue: 8,32 R 67493824 + 8 [gc]
gc-5678 [003] .... 214327.025908: block_bio_backmerge: 8,32 R 67493824 + 8 [gc]
gc-5678 [003] .... 214327.025915: block_bio_queue: 8,32 R 67493832 + 8 [gc]
gc-5678 [003] .... 214327.025917: block_bio_backmerge: 8,32 R 67493832 + 8 [gc]
gc-5678 [003] .... 214327.025923: block_bio_queue: 8,32 R 67493840 + 8 [gc]
gc-5678 [003] .... 214327.025925: block_bio_backmerge: 8,32 R 67493840 + 8 [gc]
gc-5678 [003] .... 214327.025932: block_bio_queue: 8,32 R 67493848 + 8 [gc]
gc-5678 [003] .... 214327.025934: block_bio_backmerge: 8,32 R 67493848 + 8 [gc]
gc-5678 [003] .... 214327.025941: block_bio_queue: 8,32 R 67493856 + 8 [gc]
gc-5678 [003] .... 214327.025943: block_bio_backmerge: 8,32 R 67493856 + 8 [gc]
gc-5678 [003] .... 214327.025953: block_bio_queue: 8,32 R 67493864 + 8 [gc]
gc-5678 [003] .... 214327.025955: block_bio_backmerge: 8,32 R 67493864 + 8 [gc]
gc-5678 [003] .... 214327.025962: block_bio_queue: 8,32 R 67493872 + 8 [gc]
gc-5678 [003] .... 214327.025964: block_bio_backmerge: 8,32 R 67493872 + 8 [gc]
gc-5678 [003] .... 214327.025970: block_bio_queue: 8,32 R 67493880 + 8 [gc]
gc-5678 [003] .... 214327.025972: block_bio_backmerge: 8,32 R 67493880 + 8 [gc]
gc-5678 [003] .... 214327.026000: block_bio_queue: 8,32 WS 34123776 + 2048 [gc]
gc-5678 [003] .... 214327.026019: block_getrq: 8,32 WS 34123776 + 2048 [gc]
gc-5678 [003] d..1 214327.026021: block_rq_insert: 8,32 R 131072 () 67493632 + 256 [gc]
gc-5678 [003] d..1 214327.026023: block_unplug: [gc] 1
gc-5678 [003] d..1 214327.026026: block_rq_issue: 8,32 R 131072 () 67493632 + 256 [gc]
gc-5678 [003] .... 214327.026046: block_plug: [gc]
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-14 22:37:25 +08:00
|
|
|
invalidate_mapping_pages(META_MAPPING(sbi), addr, addr);
|
2021-05-20 19:51:50 +08:00
|
|
|
f2fs_invalidate_compress_page(sbi, addr);
|
f2fs: readahead encrypted block during GC
During GC, for each encrypted block, we will read block synchronously
into meta page, and then submit it into current cold data log area.
So this block read model with 4k granularity can make poor performance,
like migrating non-encrypted block, let's readahead encrypted block
as well to improve migration performance.
To implement this, we choose meta page that its index is old block
address of the encrypted block, and readahead ciphertext into this
page, later, if readaheaded page is still updated, we will load its
data into target meta page, and submit the write IO.
Note that for OPU, truncation, deletion, we need to invalid meta
page after we invalid old block address, to make sure we won't load
invalid data from target meta page during encrypted block migration.
for ((i = 0; i < 1000; i++))
do {
xfs_io -f /mnt/f2fs/dir/$i -c "pwrite 0 128k" -c "fsync";
} done
for ((i = 0; i < 1000; i+=2))
do {
rm /mnt/f2fs/dir/$i;
} done
ret = ioctl(fd, F2FS_IOC_GARBAGE_COLLECT, 0);
Before:
gc-6549 [001] d..1 214682.212797: block_rq_insert: 8,32 RA 32768 () 786400 + 64 [gc]
gc-6549 [001] d..1 214682.212802: block_unplug: [gc] 1
gc-6549 [001] .... 214682.213892: block_bio_queue: 8,32 R 67494144 + 8 [gc]
gc-6549 [001] .... 214682.213899: block_getrq: 8,32 R 67494144 + 8 [gc]
gc-6549 [001] .... 214682.213902: block_plug: [gc]
gc-6549 [001] d..1 214682.213905: block_rq_insert: 8,32 R 4096 () 67494144 + 8 [gc]
gc-6549 [001] d..1 214682.213908: block_unplug: [gc] 1
gc-6549 [001] .... 214682.226405: block_bio_queue: 8,32 R 67494152 + 8 [gc]
gc-6549 [001] .... 214682.226412: block_getrq: 8,32 R 67494152 + 8 [gc]
gc-6549 [001] .... 214682.226414: block_plug: [gc]
gc-6549 [001] d..1 214682.226417: block_rq_insert: 8,32 R 4096 () 67494152 + 8 [gc]
gc-6549 [001] d..1 214682.226420: block_unplug: [gc] 1
gc-6549 [001] .... 214682.226904: block_bio_queue: 8,32 R 67494160 + 8 [gc]
gc-6549 [001] .... 214682.226910: block_getrq: 8,32 R 67494160 + 8 [gc]
gc-6549 [001] .... 214682.226911: block_plug: [gc]
gc-6549 [001] d..1 214682.226914: block_rq_insert: 8,32 R 4096 () 67494160 + 8 [gc]
gc-6549 [001] d..1 214682.226916: block_unplug: [gc] 1
After:
gc-5678 [003] .... 214327.025906: block_bio_queue: 8,32 R 67493824 + 8 [gc]
gc-5678 [003] .... 214327.025908: block_bio_backmerge: 8,32 R 67493824 + 8 [gc]
gc-5678 [003] .... 214327.025915: block_bio_queue: 8,32 R 67493832 + 8 [gc]
gc-5678 [003] .... 214327.025917: block_bio_backmerge: 8,32 R 67493832 + 8 [gc]
gc-5678 [003] .... 214327.025923: block_bio_queue: 8,32 R 67493840 + 8 [gc]
gc-5678 [003] .... 214327.025925: block_bio_backmerge: 8,32 R 67493840 + 8 [gc]
gc-5678 [003] .... 214327.025932: block_bio_queue: 8,32 R 67493848 + 8 [gc]
gc-5678 [003] .... 214327.025934: block_bio_backmerge: 8,32 R 67493848 + 8 [gc]
gc-5678 [003] .... 214327.025941: block_bio_queue: 8,32 R 67493856 + 8 [gc]
gc-5678 [003] .... 214327.025943: block_bio_backmerge: 8,32 R 67493856 + 8 [gc]
gc-5678 [003] .... 214327.025953: block_bio_queue: 8,32 R 67493864 + 8 [gc]
gc-5678 [003] .... 214327.025955: block_bio_backmerge: 8,32 R 67493864 + 8 [gc]
gc-5678 [003] .... 214327.025962: block_bio_queue: 8,32 R 67493872 + 8 [gc]
gc-5678 [003] .... 214327.025964: block_bio_backmerge: 8,32 R 67493872 + 8 [gc]
gc-5678 [003] .... 214327.025970: block_bio_queue: 8,32 R 67493880 + 8 [gc]
gc-5678 [003] .... 214327.025972: block_bio_backmerge: 8,32 R 67493880 + 8 [gc]
gc-5678 [003] .... 214327.026000: block_bio_queue: 8,32 WS 34123776 + 2048 [gc]
gc-5678 [003] .... 214327.026019: block_getrq: 8,32 WS 34123776 + 2048 [gc]
gc-5678 [003] d..1 214327.026021: block_rq_insert: 8,32 R 131072 () 67493632 + 256 [gc]
gc-5678 [003] d..1 214327.026023: block_unplug: [gc] 1
gc-5678 [003] d..1 214327.026026: block_rq_issue: 8,32 R 131072 () 67493632 + 256 [gc]
gc-5678 [003] .... 214327.026046: block_plug: [gc]
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-14 22:37:25 +08:00
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
/* add it into sit main buffer */
|
2017-10-30 17:49:53 +08:00
|
|
|
down_write(&sit_i->sentry_lock);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
2020-08-04 21:14:47 +08:00
|
|
|
update_segment_mtime(sbi, addr, 0);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
update_sit_entry(sbi, addr, -1);
|
|
|
|
|
|
|
|
/* add it into dirty seglist */
|
|
|
|
locate_dirty_segment(sbi, segno);
|
|
|
|
|
2017-10-30 17:49:53 +08:00
|
|
|
up_write(&sit_i->sentry_lock);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
bool f2fs_is_checkpointed_data(struct f2fs_sb_info *sbi, block_t blkaddr)
|
2015-10-08 03:28:41 +08:00
|
|
|
{
|
|
|
|
struct sit_info *sit_i = SIT_I(sbi);
|
|
|
|
unsigned int segno, offset;
|
|
|
|
struct seg_entry *se;
|
|
|
|
bool is_cp = false;
|
|
|
|
|
f2fs: introduce DATA_GENERIC_ENHANCE
Previously, f2fs_is_valid_blkaddr(, blkaddr, DATA_GENERIC) will check
whether @blkaddr locates in main area or not.
That check is weak, since the block address in range of main area can
point to the address which is not valid in segment info table, and we
can not detect such condition, we may suffer worse corruption as system
continues running.
So this patch introduce DATA_GENERIC_ENHANCE to enhance the sanity check
which trigger SIT bitmap check rather than only range check.
This patch did below changes as wel:
- set SBI_NEED_FSCK in f2fs_is_valid_blkaddr().
- get rid of is_valid_data_blkaddr() to avoid panic if blkaddr is invalid.
- introduce verify_fio_blkaddr() to wrap fio {new,old}_blkaddr validation check.
- spread blkaddr check in:
* f2fs_get_node_info()
* __read_out_blkaddrs()
* f2fs_submit_page_read()
* ra_data_block()
* do_recover_data()
This patch can fix bug reported from bugzilla below:
https://bugzilla.kernel.org/show_bug.cgi?id=203215
https://bugzilla.kernel.org/show_bug.cgi?id=203223
https://bugzilla.kernel.org/show_bug.cgi?id=203231
https://bugzilla.kernel.org/show_bug.cgi?id=203235
https://bugzilla.kernel.org/show_bug.cgi?id=203241
= Update by Jaegeuk Kim =
DATA_GENERIC_ENHANCE enhanced to validate block addresses on read/write paths.
But, xfstest/generic/446 compalins some generated kernel messages saying invalid
bitmap was detected when reading a block. The reaons is, when we get the
block addresses from extent_cache, there is no lock to synchronize it from
truncating the blocks in parallel.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-04-15 15:26:32 +08:00
|
|
|
if (!__is_valid_data_blkaddr(blkaddr))
|
2015-10-08 03:28:41 +08:00
|
|
|
return true;
|
|
|
|
|
2017-10-30 17:49:53 +08:00
|
|
|
down_read(&sit_i->sentry_lock);
|
2015-10-08 03:28:41 +08:00
|
|
|
|
|
|
|
segno = GET_SEGNO(sbi, blkaddr);
|
|
|
|
se = get_seg_entry(sbi, segno);
|
|
|
|
offset = GET_BLKOFF_FROM_SEG0(sbi, blkaddr);
|
|
|
|
|
|
|
|
if (f2fs_test_bit(offset, se->ckpt_valid_map))
|
|
|
|
is_cp = true;
|
|
|
|
|
2017-10-30 17:49:53 +08:00
|
|
|
up_read(&sit_i->sentry_lock);
|
2015-10-08 03:28:41 +08:00
|
|
|
|
|
|
|
return is_cp;
|
|
|
|
}
|
|
|
|
|
2023-01-19 14:36:20 +08:00
|
|
|
static unsigned short f2fs_curseg_valid_blocks(struct f2fs_sb_info *sbi, int type)
|
|
|
|
{
|
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, type);
|
|
|
|
|
|
|
|
if (sbi->ckpt->alloc_type[type] == SSR)
|
|
|
|
return sbi->blocks_per_seg;
|
|
|
|
return curseg->next_blkoff;
|
|
|
|
}
|
|
|
|
|
2012-11-29 12:28:09 +08:00
|
|
|
/*
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
* Calculate the number of current summary pages for writing
|
|
|
|
*/
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
int f2fs_npages_for_summary_flush(struct f2fs_sb_info *sbi, bool for_ra)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
int valid_sum_count = 0;
|
2013-10-29 16:21:47 +08:00
|
|
|
int i, sum_in_page;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
for (i = CURSEG_HOT_DATA; i <= CURSEG_COLD_DATA; i++) {
|
2023-01-19 14:36:20 +08:00
|
|
|
if (sbi->ckpt->alloc_type[i] != SSR && for_ra)
|
|
|
|
valid_sum_count +=
|
|
|
|
le16_to_cpu(F2FS_CKPT(sbi)->cur_data_blkoff[i]);
|
|
|
|
else
|
|
|
|
valid_sum_count += f2fs_curseg_valid_blocks(sbi, i);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.
There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-01 20:29:47 +08:00
|
|
|
sum_in_page = (PAGE_SIZE - 2 * SUM_JOURNAL_SIZE -
|
2013-10-29 16:21:47 +08:00
|
|
|
SUM_FOOTER_SIZE) / SUMMARY_SIZE;
|
|
|
|
if (valid_sum_count <= sum_in_page)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
return 1;
|
2013-10-29 16:21:47 +08:00
|
|
|
else if ((valid_sum_count - sum_in_page) <=
|
mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.
There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-01 20:29:47 +08:00
|
|
|
(PAGE_SIZE - SUM_FOOTER_SIZE) / SUMMARY_SIZE)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
return 2;
|
|
|
|
return 3;
|
|
|
|
}
|
|
|
|
|
2012-11-29 12:28:09 +08:00
|
|
|
/*
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
* Caller should put this summary page
|
|
|
|
*/
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
struct page *f2fs_get_sum_page(struct f2fs_sb_info *sbi, unsigned int segno)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
2020-10-03 05:17:35 +08:00
|
|
|
if (unlikely(f2fs_cp_error(sbi)))
|
|
|
|
return ERR_PTR(-EIO);
|
|
|
|
return f2fs_get_meta_page_retry(sbi, GET_SUM_BLOCK(sbi, segno));
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_update_meta_page(struct f2fs_sb_info *sbi,
|
|
|
|
void *src, block_t blk_addr)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
struct page *page = f2fs_grab_meta_page(sbi, blk_addr);
|
2015-05-19 17:40:04 +08:00
|
|
|
|
2017-11-02 20:41:02 +08:00
|
|
|
memcpy(page_address(page), src, PAGE_SIZE);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
set_page_dirty(page);
|
|
|
|
f2fs_put_page(page, 1);
|
|
|
|
}
|
|
|
|
|
2015-05-19 17:40:04 +08:00
|
|
|
static void write_sum_page(struct f2fs_sb_info *sbi,
|
|
|
|
struct f2fs_summary_block *sum_blk, block_t blk_addr)
|
|
|
|
{
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
f2fs_update_meta_page(sbi, (void *)sum_blk, blk_addr);
|
2015-05-19 17:40:04 +08:00
|
|
|
}
|
|
|
|
|
f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- datas of current summay block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.
So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
static void write_current_sum_page(struct f2fs_sb_info *sbi,
|
|
|
|
int type, block_t blk_addr)
|
|
|
|
{
|
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, type);
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
struct page *page = f2fs_grab_meta_page(sbi, blk_addr);
|
f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- datas of current summay block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.
So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
struct f2fs_summary_block *src = curseg->sum_blk;
|
|
|
|
struct f2fs_summary_block *dst;
|
|
|
|
|
|
|
|
dst = (struct f2fs_summary_block *)page_address(page);
|
2018-04-09 20:25:06 +08:00
|
|
|
memset(dst, 0, PAGE_SIZE);
|
f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- datas of current summay block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.
So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
|
|
|
|
mutex_lock(&curseg->curseg_mutex);
|
|
|
|
|
|
|
|
down_read(&curseg->journal_rwsem);
|
|
|
|
memcpy(&dst->journal, curseg->journal, SUM_JOURNAL_SIZE);
|
|
|
|
up_read(&curseg->journal_rwsem);
|
|
|
|
|
|
|
|
memcpy(dst->entries, src->entries, SUM_ENTRY_SIZE);
|
|
|
|
memcpy(&dst->footer, &src->footer, SUM_FOOTER_SIZE);
|
|
|
|
|
|
|
|
mutex_unlock(&curseg->curseg_mutex);
|
|
|
|
|
|
|
|
set_page_dirty(page);
|
|
|
|
f2fs_put_page(page, 1);
|
|
|
|
}
|
|
|
|
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
static int is_next_segment_free(struct f2fs_sb_info *sbi,
|
|
|
|
struct curseg_info *curseg, int type)
|
2017-04-21 04:51:57 +08:00
|
|
|
{
|
|
|
|
unsigned int segno = curseg->segno + 1;
|
|
|
|
struct free_segmap_info *free_i = FREE_I(sbi);
|
|
|
|
|
|
|
|
if (segno < MAIN_SEGS(sbi) && segno % sbi->segs_per_sec)
|
|
|
|
return !test_bit(segno, free_i->free_segmap);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2012-11-29 12:28:09 +08:00
|
|
|
/*
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
* Find a new segment from the free segments bitmap to right order
|
|
|
|
* This function should be returned with success, otherwise BUG
|
|
|
|
*/
|
|
|
|
static void get_new_segment(struct f2fs_sb_info *sbi,
|
|
|
|
unsigned int *newseg, bool new_sec, int dir)
|
|
|
|
{
|
|
|
|
struct free_segmap_info *free_i = FREE_I(sbi);
|
|
|
|
unsigned int segno, secno, zoneno;
|
2014-09-24 02:23:01 +08:00
|
|
|
unsigned int total_zones = MAIN_SECS(sbi) / sbi->secs_per_zone;
|
2017-04-08 06:08:17 +08:00
|
|
|
unsigned int hint = GET_SEC_FROM_SEG(sbi, *newseg);
|
|
|
|
unsigned int old_zoneno = GET_ZONE_FROM_SEG(sbi, *newseg);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
unsigned int left_start = hint;
|
|
|
|
bool init = true;
|
|
|
|
int go_left = 0;
|
|
|
|
int i;
|
|
|
|
|
2015-02-11 18:20:38 +08:00
|
|
|
spin_lock(&free_i->segmap_lock);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
if (!new_sec && ((*newseg + 1) % sbi->segs_per_sec)) {
|
|
|
|
segno = find_next_zero_bit(free_i->free_segmap,
|
2017-04-08 06:08:17 +08:00
|
|
|
GET_SEG_FROM_SEC(sbi, hint + 1), *newseg + 1);
|
|
|
|
if (segno < GET_SEG_FROM_SEC(sbi, hint + 1))
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
goto got_it;
|
|
|
|
}
|
|
|
|
find_other_zone:
|
2014-09-24 02:23:01 +08:00
|
|
|
secno = find_next_zero_bit(free_i->free_secmap, MAIN_SECS(sbi), hint);
|
|
|
|
if (secno >= MAIN_SECS(sbi)) {
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
if (dir == ALLOC_RIGHT) {
|
2021-08-15 05:17:03 +08:00
|
|
|
secno = find_first_zero_bit(free_i->free_secmap,
|
|
|
|
MAIN_SECS(sbi));
|
2014-09-24 02:23:01 +08:00
|
|
|
f2fs_bug_on(sbi, secno >= MAIN_SECS(sbi));
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
} else {
|
|
|
|
go_left = 1;
|
|
|
|
left_start = hint - 1;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
if (go_left == 0)
|
|
|
|
goto skip_left;
|
|
|
|
|
|
|
|
while (test_bit(left_start, free_i->free_secmap)) {
|
|
|
|
if (left_start > 0) {
|
|
|
|
left_start--;
|
|
|
|
continue;
|
|
|
|
}
|
2021-08-15 05:17:03 +08:00
|
|
|
left_start = find_first_zero_bit(free_i->free_secmap,
|
|
|
|
MAIN_SECS(sbi));
|
2014-09-24 02:23:01 +08:00
|
|
|
f2fs_bug_on(sbi, left_start >= MAIN_SECS(sbi));
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
secno = left_start;
|
|
|
|
skip_left:
|
2017-04-08 06:08:17 +08:00
|
|
|
segno = GET_SEG_FROM_SEC(sbi, secno);
|
|
|
|
zoneno = GET_ZONE_FROM_SEC(sbi, secno);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
/* give up on finding another zone */
|
|
|
|
if (!init)
|
|
|
|
goto got_it;
|
|
|
|
if (sbi->secs_per_zone == 1)
|
|
|
|
goto got_it;
|
|
|
|
if (zoneno == old_zoneno)
|
|
|
|
goto got_it;
|
|
|
|
if (dir == ALLOC_LEFT) {
|
|
|
|
if (!go_left && zoneno + 1 >= total_zones)
|
|
|
|
goto got_it;
|
|
|
|
if (go_left && zoneno == 0)
|
|
|
|
goto got_it;
|
|
|
|
}
|
|
|
|
for (i = 0; i < NR_CURSEG_TYPE; i++)
|
|
|
|
if (CURSEG_I(sbi, i)->zone == zoneno)
|
|
|
|
break;
|
|
|
|
|
|
|
|
if (i < NR_CURSEG_TYPE) {
|
|
|
|
/* zone is in user, try another */
|
|
|
|
if (go_left)
|
|
|
|
hint = zoneno * sbi->secs_per_zone - 1;
|
|
|
|
else if (zoneno + 1 >= total_zones)
|
|
|
|
hint = 0;
|
|
|
|
else
|
|
|
|
hint = (zoneno + 1) * sbi->secs_per_zone;
|
|
|
|
init = false;
|
|
|
|
goto find_other_zone;
|
|
|
|
}
|
|
|
|
got_it:
|
|
|
|
/* set it as dirty segment in free segmap */
|
2014-09-03 06:52:58 +08:00
|
|
|
f2fs_bug_on(sbi, test_bit(segno, free_i->free_segmap));
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
__set_inuse(sbi, segno);
|
|
|
|
*newseg = segno;
|
2015-02-11 18:20:38 +08:00
|
|
|
spin_unlock(&free_i->segmap_lock);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void reset_curseg(struct f2fs_sb_info *sbi, int type, int modified)
|
|
|
|
{
|
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, type);
|
|
|
|
struct summary_footer *sum_footer;
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
unsigned short seg_type = curseg->seg_type;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
f2fs: introduce inmem curseg
Previous implementation of aligned pinfile allocation will:
- allocate new segment on cold data log no matter whether last used
segment is partially used or not, it makes IOs more random;
- force concurrent cold data/GCed IO going into warm data area, it
can make a bad effect on hot/cold data separation;
In this patch, we introduce a new type of log named 'inmem curseg',
the differents from normal curseg is:
- it reuses existed segment type (CURSEG_XXX_NODE/DATA);
- it only exists in memory, its segno, blkofs, summary will not b
persisted into checkpoint area;
With this new feature, we can enhance scalability of log, special
allocators can be created for purposes:
- pure lfs allocator for aligned pinfile allocation or file
defragmentation
- pure ssr allocator for later feature
So that, let's update aligned pinfile allocation to use this new
inmem curseg fwk.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:45 +08:00
|
|
|
curseg->inited = true;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
curseg->segno = curseg->next_segno;
|
2017-04-08 06:08:17 +08:00
|
|
|
curseg->zone = GET_ZONE_FROM_SEG(sbi, curseg->segno);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
curseg->next_blkoff = 0;
|
|
|
|
curseg->next_segno = NULL_SEGNO;
|
|
|
|
|
|
|
|
sum_footer = &(curseg->sum_blk->footer);
|
|
|
|
memset(sum_footer, 0, sizeof(struct summary_footer));
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
|
|
|
|
sanity_check_seg_type(sbi, seg_type);
|
|
|
|
|
|
|
|
if (IS_DATASEG(seg_type))
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
SET_SUM_TYPE(sum_footer, SUM_TYPE_DATA);
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
if (IS_NODESEG(seg_type))
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
SET_SUM_TYPE(sum_footer, SUM_TYPE_NODE);
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
__set_sit_entry_type(sbi, seg_type, curseg->segno, modified);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
2017-03-25 08:41:45 +08:00
|
|
|
static unsigned int __get_next_segno(struct f2fs_sb_info *sbi, int type)
|
|
|
|
{
|
f2fs: introduce inmem curseg
Previous implementation of aligned pinfile allocation will:
- allocate new segment on cold data log no matter whether last used
segment is partially used or not, it makes IOs more random;
- force concurrent cold data/GCed IO going into warm data area, it
can make a bad effect on hot/cold data separation;
In this patch, we introduce a new type of log named 'inmem curseg',
the differents from normal curseg is:
- it reuses existed segment type (CURSEG_XXX_NODE/DATA);
- it only exists in memory, its segno, blkofs, summary will not b
persisted into checkpoint area;
With this new feature, we can enhance scalability of log, special
allocators can be created for purposes:
- pure lfs allocator for aligned pinfile allocation or file
defragmentation
- pure ssr allocator for later feature
So that, let's update aligned pinfile allocation to use this new
inmem curseg fwk.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:45 +08:00
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, type);
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
unsigned short seg_type = curseg->seg_type;
|
|
|
|
|
|
|
|
sanity_check_seg_type(sbi, seg_type);
|
2021-09-30 02:12:03 +08:00
|
|
|
if (f2fs_need_rand_seg(sbi))
|
2022-10-10 10:44:02 +08:00
|
|
|
return get_random_u32_below(MAIN_SECS(sbi) * sbi->segs_per_sec);
|
f2fs: introduce inmem curseg
Previous implementation of aligned pinfile allocation will:
- allocate new segment on cold data log no matter whether last used
segment is partially used or not, it makes IOs more random;
- force concurrent cold data/GCed IO going into warm data area, it
can make a bad effect on hot/cold data separation;
In this patch, we introduce a new type of log named 'inmem curseg',
the differents from normal curseg is:
- it reuses existed segment type (CURSEG_XXX_NODE/DATA);
- it only exists in memory, its segno, blkofs, summary will not b
persisted into checkpoint area;
With this new feature, we can enhance scalability of log, special
allocators can be created for purposes:
- pure lfs allocator for aligned pinfile allocation or file
defragmentation
- pure ssr allocator for later feature
So that, let's update aligned pinfile allocation to use this new
inmem curseg fwk.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:45 +08:00
|
|
|
|
2017-04-21 04:51:57 +08:00
|
|
|
/* if segs_per_sec is large than 1, we need to keep original policy. */
|
2018-10-24 18:37:26 +08:00
|
|
|
if (__is_large_section(sbi))
|
f2fs: introduce inmem curseg
Previous implementation of aligned pinfile allocation will:
- allocate new segment on cold data log no matter whether last used
segment is partially used or not, it makes IOs more random;
- force concurrent cold data/GCed IO going into warm data area, it
can make a bad effect on hot/cold data separation;
In this patch, we introduce a new type of log named 'inmem curseg',
the differents from normal curseg is:
- it reuses existed segment type (CURSEG_XXX_NODE/DATA);
- it only exists in memory, its segno, blkofs, summary will not b
persisted into checkpoint area;
With this new feature, we can enhance scalability of log, special
allocators can be created for purposes:
- pure lfs allocator for aligned pinfile allocation or file
defragmentation
- pure ssr allocator for later feature
So that, let's update aligned pinfile allocation to use this new
inmem curseg fwk.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:45 +08:00
|
|
|
return curseg->segno;
|
|
|
|
|
|
|
|
/* inmem log may not locate on any segment after mount */
|
|
|
|
if (!curseg->inited)
|
|
|
|
return 0;
|
2017-04-21 04:51:57 +08:00
|
|
|
|
2018-08-21 10:21:43 +08:00
|
|
|
if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED)))
|
|
|
|
return 0;
|
|
|
|
|
2018-01-29 11:37:45 +08:00
|
|
|
if (test_opt(sbi, NOHEAP) &&
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
(seg_type == CURSEG_HOT_DATA || IS_NODESEG(seg_type)))
|
2017-03-25 08:41:45 +08:00
|
|
|
return 0;
|
|
|
|
|
2017-04-14 06:17:00 +08:00
|
|
|
if (SIT_I(sbi)->last_victim[ALLOC_NEXT])
|
|
|
|
return SIT_I(sbi)->last_victim[ALLOC_NEXT];
|
2018-02-19 00:50:49 +08:00
|
|
|
|
|
|
|
/* find segments from 0 to reuse freed segments */
|
2018-03-08 14:22:56 +08:00
|
|
|
if (F2FS_OPTION(sbi).alloc_mode == ALLOC_MODE_REUSE)
|
2018-02-19 00:50:49 +08:00
|
|
|
return 0;
|
|
|
|
|
f2fs: introduce inmem curseg
Previous implementation of aligned pinfile allocation will:
- allocate new segment on cold data log no matter whether last used
segment is partially used or not, it makes IOs more random;
- force concurrent cold data/GCed IO going into warm data area, it
can make a bad effect on hot/cold data separation;
In this patch, we introduce a new type of log named 'inmem curseg',
the differents from normal curseg is:
- it reuses existed segment type (CURSEG_XXX_NODE/DATA);
- it only exists in memory, its segno, blkofs, summary will not b
persisted into checkpoint area;
With this new feature, we can enhance scalability of log, special
allocators can be created for purposes:
- pure lfs allocator for aligned pinfile allocation or file
defragmentation
- pure ssr allocator for later feature
So that, let's update aligned pinfile allocation to use this new
inmem curseg fwk.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:45 +08:00
|
|
|
return curseg->segno;
|
2017-03-25 08:41:45 +08:00
|
|
|
}
|
|
|
|
|
2012-11-29 12:28:09 +08:00
|
|
|
/*
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
* Allocate a current working segment.
|
|
|
|
* This function always allocates a free segment in LFS manner.
|
|
|
|
*/
|
|
|
|
static void new_curseg(struct f2fs_sb_info *sbi, int type, bool new_sec)
|
|
|
|
{
|
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, type);
|
f2fs: introduce inmem curseg
Previous implementation of aligned pinfile allocation will:
- allocate new segment on cold data log no matter whether last used
segment is partially used or not, it makes IOs more random;
- force concurrent cold data/GCed IO going into warm data area, it
can make a bad effect on hot/cold data separation;
In this patch, we introduce a new type of log named 'inmem curseg',
the differents from normal curseg is:
- it reuses existed segment type (CURSEG_XXX_NODE/DATA);
- it only exists in memory, its segno, blkofs, summary will not b
persisted into checkpoint area;
With this new feature, we can enhance scalability of log, special
allocators can be created for purposes:
- pure lfs allocator for aligned pinfile allocation or file
defragmentation
- pure ssr allocator for later feature
So that, let's update aligned pinfile allocation to use this new
inmem curseg fwk.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:45 +08:00
|
|
|
unsigned short seg_type = curseg->seg_type;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
unsigned int segno = curseg->segno;
|
|
|
|
int dir = ALLOC_LEFT;
|
|
|
|
|
f2fs: introduce inmem curseg
Previous implementation of aligned pinfile allocation will:
- allocate new segment on cold data log no matter whether last used
segment is partially used or not, it makes IOs more random;
- force concurrent cold data/GCed IO going into warm data area, it
can make a bad effect on hot/cold data separation;
In this patch, we introduce a new type of log named 'inmem curseg',
the differents from normal curseg is:
- it reuses existed segment type (CURSEG_XXX_NODE/DATA);
- it only exists in memory, its segno, blkofs, summary will not b
persisted into checkpoint area;
With this new feature, we can enhance scalability of log, special
allocators can be created for purposes:
- pure lfs allocator for aligned pinfile allocation or file
defragmentation
- pure ssr allocator for later feature
So that, let's update aligned pinfile allocation to use this new
inmem curseg fwk.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:45 +08:00
|
|
|
if (curseg->inited)
|
|
|
|
write_sum_page(sbi, curseg->sum_blk,
|
2013-05-14 18:20:28 +08:00
|
|
|
GET_SUM_BLOCK(sbi, segno));
|
f2fs: introduce inmem curseg
Previous implementation of aligned pinfile allocation will:
- allocate new segment on cold data log no matter whether last used
segment is partially used or not, it makes IOs more random;
- force concurrent cold data/GCed IO going into warm data area, it
can make a bad effect on hot/cold data separation;
In this patch, we introduce a new type of log named 'inmem curseg',
the differents from normal curseg is:
- it reuses existed segment type (CURSEG_XXX_NODE/DATA);
- it only exists in memory, its segno, blkofs, summary will not b
persisted into checkpoint area;
With this new feature, we can enhance scalability of log, special
allocators can be created for purposes:
- pure lfs allocator for aligned pinfile allocation or file
defragmentation
- pure ssr allocator for later feature
So that, let's update aligned pinfile allocation to use this new
inmem curseg fwk.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:45 +08:00
|
|
|
if (seg_type == CURSEG_WARM_DATA || seg_type == CURSEG_COLD_DATA)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
dir = ALLOC_RIGHT;
|
|
|
|
|
|
|
|
if (test_opt(sbi, NOHEAP))
|
|
|
|
dir = ALLOC_RIGHT;
|
|
|
|
|
2017-03-25 08:41:45 +08:00
|
|
|
segno = __get_next_segno(sbi, type);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
get_new_segment(sbi, &segno, new_sec, dir);
|
|
|
|
curseg->next_segno = segno;
|
|
|
|
reset_curseg(sbi, type, 1);
|
|
|
|
curseg->alloc_type = LFS;
|
2021-09-30 02:12:03 +08:00
|
|
|
if (F2FS_OPTION(sbi).fs_mode == FS_MODE_FRAGMENT_BLK)
|
|
|
|
curseg->fragment_remained_chunk =
|
2022-10-10 10:44:02 +08:00
|
|
|
get_random_u32_inclusive(1, sbi->max_fragment_chunk);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
2021-04-13 17:56:18 +08:00
|
|
|
static int __next_free_blkoff(struct f2fs_sb_info *sbi,
|
|
|
|
int segno, block_t start)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
2021-04-13 17:56:18 +08:00
|
|
|
struct seg_entry *se = get_seg_entry(sbi, segno);
|
2013-11-15 12:21:16 +08:00
|
|
|
int entries = SIT_VBLOCK_MAP_SIZE / sizeof(unsigned long);
|
2015-02-11 08:44:29 +08:00
|
|
|
unsigned long *target_map = SIT_I(sbi)->tmp_map;
|
2013-11-15 12:21:16 +08:00
|
|
|
unsigned long *ckpt_map = (unsigned long *)se->ckpt_valid_map;
|
|
|
|
unsigned long *cur_map = (unsigned long *)se->cur_valid_map;
|
2021-04-13 17:56:18 +08:00
|
|
|
int i;
|
2013-11-15 12:21:16 +08:00
|
|
|
|
|
|
|
for (i = 0; i < entries; i++)
|
|
|
|
target_map[i] = ckpt_map[i] | cur_map[i];
|
|
|
|
|
2021-04-13 17:56:18 +08:00
|
|
|
return __find_rev_next_zero_bit(target_map, sbi->blocks_per_seg, start);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
2023-01-19 14:36:24 +08:00
|
|
|
static int f2fs_find_next_ssr_block(struct f2fs_sb_info *sbi,
|
|
|
|
struct curseg_info *seg)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
2023-01-19 14:36:24 +08:00
|
|
|
return __next_free_blkoff(sbi, seg->segno, seg->next_blkoff + 1);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
f2fs: fix to avoid touching checkpointed data in get_victim()
In CP disabling mode, there are two issues when using LFS or SSR | AT_SSR
mode to select victim:
1. LFS is set to find source section during GC, the victim should have
no checkpointed data, since after GC, section could not be set free for
reuse.
Previously, we only check valid chpt blocks in current segment rather
than section, fix it.
2. SSR | AT_SSR are set to find target segment for writes which can be
fully filled by checkpointed and newly written blocks, we should never
select such segment, otherwise it can cause panic or data corruption
during allocation, potential case is described as below:
a) target segment has 'n' (n < 512) ckpt valid blocks
b) GC migrates 'n' valid blocks to other segment (segment is still
in dirty list)
c) GC migrates '512 - n' blocks to target segment (segment has 'n'
cp_vblocks and '512 - n' vblocks)
d) If GC selects target segment via {AT,}SSR allocator, however there
is no free space in targe segment.
Fixes: 4354994f097d ("f2fs: checkpoint disabling")
Fixes: 093749e296e2 ("f2fs: support age threshold based garbage collection")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-03-24 11:18:28 +08:00
|
|
|
bool f2fs_segment_has_free_slot(struct f2fs_sb_info *sbi, int segno)
|
|
|
|
{
|
2021-04-13 17:56:18 +08:00
|
|
|
return __next_free_blkoff(sbi, segno, 0) < sbi->blocks_per_seg;
|
f2fs: fix to avoid touching checkpointed data in get_victim()
In CP disabling mode, there are two issues when using LFS or SSR | AT_SSR
mode to select victim:
1. LFS is set to find source section during GC, the victim should have
no checkpointed data, since after GC, section could not be set free for
reuse.
Previously, we only check valid chpt blocks in current segment rather
than section, fix it.
2. SSR | AT_SSR are set to find target segment for writes which can be
fully filled by checkpointed and newly written blocks, we should never
select such segment, otherwise it can cause panic or data corruption
during allocation, potential case is described as below:
a) target segment has 'n' (n < 512) ckpt valid blocks
b) GC migrates 'n' valid blocks to other segment (segment is still
in dirty list)
c) GC migrates '512 - n' blocks to target segment (segment has 'n'
cp_vblocks and '512 - n' vblocks)
d) If GC selects target segment via {AT,}SSR allocator, however there
is no free space in targe segment.
Fixes: 4354994f097d ("f2fs: checkpoint disabling")
Fixes: 093749e296e2 ("f2fs: support age threshold based garbage collection")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-03-24 11:18:28 +08:00
|
|
|
}
|
|
|
|
|
2012-11-29 12:28:09 +08:00
|
|
|
/*
|
2014-08-06 22:22:50 +08:00
|
|
|
* This function always allocates a used segment(from dirty seglist) by SSR
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
* manner, so it should recover the existing segment information of valid blocks
|
|
|
|
*/
|
2022-11-28 17:43:46 +08:00
|
|
|
static void change_curseg(struct f2fs_sb_info *sbi, int type)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
|
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, type);
|
|
|
|
unsigned int new_segno = curseg->next_segno;
|
|
|
|
struct f2fs_summary_block *sum_node;
|
|
|
|
struct page *sum_page;
|
|
|
|
|
2022-11-28 17:43:46 +08:00
|
|
|
write_sum_page(sbi, curseg->sum_blk, GET_SUM_BLOCK(sbi, curseg->segno));
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
__set_test_and_inuse(sbi, new_segno);
|
|
|
|
|
|
|
|
mutex_lock(&dirty_i->seglist_lock);
|
|
|
|
__remove_dirty_segment(sbi, new_segno, PRE);
|
|
|
|
__remove_dirty_segment(sbi, new_segno, DIRTY);
|
|
|
|
mutex_unlock(&dirty_i->seglist_lock);
|
|
|
|
|
|
|
|
reset_curseg(sbi, type, 1);
|
|
|
|
curseg->alloc_type = SSR;
|
2021-04-13 17:56:18 +08:00
|
|
|
curseg->next_blkoff = __next_free_blkoff(sbi, curseg->segno, 0);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
sum_page = f2fs_get_sum_page(sbi, new_segno);
|
2020-10-03 05:17:35 +08:00
|
|
|
if (IS_ERR(sum_page)) {
|
|
|
|
/* GC won't be able to use stale summary pages by cp_error */
|
|
|
|
memset(curseg->sum_blk, 0, SUM_ENTRY_SIZE);
|
|
|
|
return;
|
|
|
|
}
|
2017-08-30 18:04:48 +08:00
|
|
|
sum_node = (struct f2fs_summary_block *)page_address(sum_page);
|
|
|
|
memcpy(curseg->sum_blk, sum_node, SUM_ENTRY_SIZE);
|
|
|
|
f2fs_put_page(sum_page, 1);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
static int get_ssr_segment(struct f2fs_sb_info *sbi, int type,
|
|
|
|
int alloc_mode, unsigned long long age);
|
|
|
|
|
|
|
|
static void get_atssr_segment(struct f2fs_sb_info *sbi, int type,
|
|
|
|
int target_type, int alloc_mode,
|
|
|
|
unsigned long long age)
|
|
|
|
{
|
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, type);
|
|
|
|
|
|
|
|
curseg->seg_type = target_type;
|
|
|
|
|
|
|
|
if (get_ssr_segment(sbi, type, alloc_mode, age)) {
|
|
|
|
struct seg_entry *se = get_seg_entry(sbi, curseg->next_segno);
|
|
|
|
|
|
|
|
curseg->seg_type = se->type;
|
2022-11-28 17:43:46 +08:00
|
|
|
change_curseg(sbi, type);
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
} else {
|
|
|
|
/* allocate cold segment by default */
|
|
|
|
curseg->seg_type = CURSEG_COLD_DATA;
|
|
|
|
new_curseg(sbi, type, true);
|
|
|
|
}
|
|
|
|
stat_inc_seg_type(sbi, curseg);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void __f2fs_init_atgc_curseg(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_ALL_DATA_ATGC);
|
|
|
|
|
|
|
|
if (!sbi->am.atgc_enabled)
|
|
|
|
return;
|
|
|
|
|
2022-01-08 04:48:44 +08:00
|
|
|
f2fs_down_read(&SM_I(sbi)->curseg_lock);
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
|
|
|
|
mutex_lock(&curseg->curseg_mutex);
|
|
|
|
down_write(&SIT_I(sbi)->sentry_lock);
|
|
|
|
|
|
|
|
get_atssr_segment(sbi, CURSEG_ALL_DATA_ATGC, CURSEG_COLD_DATA, SSR, 0);
|
|
|
|
|
|
|
|
up_write(&SIT_I(sbi)->sentry_lock);
|
|
|
|
mutex_unlock(&curseg->curseg_mutex);
|
|
|
|
|
2022-01-08 04:48:44 +08:00
|
|
|
f2fs_up_read(&SM_I(sbi)->curseg_lock);
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
|
|
|
|
}
|
|
|
|
void f2fs_init_inmem_curseg(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
__f2fs_init_atgc_curseg(sbi);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void __f2fs_save_inmem_curseg(struct f2fs_sb_info *sbi, int type)
|
f2fs: introduce inmem curseg
Previous implementation of aligned pinfile allocation will:
- allocate new segment on cold data log no matter whether last used
segment is partially used or not, it makes IOs more random;
- force concurrent cold data/GCed IO going into warm data area, it
can make a bad effect on hot/cold data separation;
In this patch, we introduce a new type of log named 'inmem curseg',
the differents from normal curseg is:
- it reuses existed segment type (CURSEG_XXX_NODE/DATA);
- it only exists in memory, its segno, blkofs, summary will not b
persisted into checkpoint area;
With this new feature, we can enhance scalability of log, special
allocators can be created for purposes:
- pure lfs allocator for aligned pinfile allocation or file
defragmentation
- pure ssr allocator for later feature
So that, let's update aligned pinfile allocation to use this new
inmem curseg fwk.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:45 +08:00
|
|
|
{
|
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, type);
|
|
|
|
|
|
|
|
mutex_lock(&curseg->curseg_mutex);
|
|
|
|
if (!curseg->inited)
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
if (get_valid_blocks(sbi, curseg->segno, false)) {
|
|
|
|
write_sum_page(sbi, curseg->sum_blk,
|
|
|
|
GET_SUM_BLOCK(sbi, curseg->segno));
|
|
|
|
} else {
|
|
|
|
mutex_lock(&DIRTY_I(sbi)->seglist_lock);
|
|
|
|
__set_test_and_free(sbi, curseg->segno, true);
|
|
|
|
mutex_unlock(&DIRTY_I(sbi)->seglist_lock);
|
|
|
|
}
|
|
|
|
out:
|
|
|
|
mutex_unlock(&curseg->curseg_mutex);
|
|
|
|
}
|
|
|
|
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
void f2fs_save_inmem_curseg(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
__f2fs_save_inmem_curseg(sbi, CURSEG_COLD_DATA_PINNED);
|
|
|
|
|
|
|
|
if (sbi->am.atgc_enabled)
|
|
|
|
__f2fs_save_inmem_curseg(sbi, CURSEG_ALL_DATA_ATGC);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void __f2fs_restore_inmem_curseg(struct f2fs_sb_info *sbi, int type)
|
f2fs: introduce inmem curseg
Previous implementation of aligned pinfile allocation will:
- allocate new segment on cold data log no matter whether last used
segment is partially used or not, it makes IOs more random;
- force concurrent cold data/GCed IO going into warm data area, it
can make a bad effect on hot/cold data separation;
In this patch, we introduce a new type of log named 'inmem curseg',
the differents from normal curseg is:
- it reuses existed segment type (CURSEG_XXX_NODE/DATA);
- it only exists in memory, its segno, blkofs, summary will not b
persisted into checkpoint area;
With this new feature, we can enhance scalability of log, special
allocators can be created for purposes:
- pure lfs allocator for aligned pinfile allocation or file
defragmentation
- pure ssr allocator for later feature
So that, let's update aligned pinfile allocation to use this new
inmem curseg fwk.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:45 +08:00
|
|
|
{
|
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, type);
|
|
|
|
|
|
|
|
mutex_lock(&curseg->curseg_mutex);
|
|
|
|
if (!curseg->inited)
|
|
|
|
goto out;
|
|
|
|
if (get_valid_blocks(sbi, curseg->segno, false))
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
mutex_lock(&DIRTY_I(sbi)->seglist_lock);
|
|
|
|
__set_test_and_inuse(sbi, curseg->segno);
|
|
|
|
mutex_unlock(&DIRTY_I(sbi)->seglist_lock);
|
|
|
|
out:
|
|
|
|
mutex_unlock(&curseg->curseg_mutex);
|
|
|
|
}
|
|
|
|
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
void f2fs_restore_inmem_curseg(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
__f2fs_restore_inmem_curseg(sbi, CURSEG_COLD_DATA_PINNED);
|
|
|
|
|
|
|
|
if (sbi->am.atgc_enabled)
|
|
|
|
__f2fs_restore_inmem_curseg(sbi, CURSEG_ALL_DATA_ATGC);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int get_ssr_segment(struct f2fs_sb_info *sbi, int type,
|
|
|
|
int alloc_mode, unsigned long long age)
|
2013-02-04 14:11:17 +08:00
|
|
|
{
|
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, type);
|
2017-04-14 06:17:00 +08:00
|
|
|
unsigned segno = NULL_SEGNO;
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
unsigned short seg_type = curseg->seg_type;
|
2017-02-24 18:46:00 +08:00
|
|
|
int i, cnt;
|
|
|
|
bool reversed = false;
|
2017-02-23 09:10:18 +08:00
|
|
|
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
sanity_check_seg_type(sbi, seg_type);
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
/* f2fs_need_SSR() already forces to do this */
|
2023-04-04 12:00:51 +08:00
|
|
|
if (!f2fs_get_victim(sbi, &segno, BG_GC, seg_type, alloc_mode, age)) {
|
2017-04-14 06:17:00 +08:00
|
|
|
curseg->next_segno = segno;
|
2017-02-23 09:10:18 +08:00
|
|
|
return 1;
|
2017-04-14 06:17:00 +08:00
|
|
|
}
|
2013-02-04 14:11:17 +08:00
|
|
|
|
2017-02-23 09:02:32 +08:00
|
|
|
/* For node segments, let's do SSR more intensively */
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
if (IS_NODESEG(seg_type)) {
|
|
|
|
if (seg_type >= CURSEG_WARM_NODE) {
|
2017-02-24 18:46:00 +08:00
|
|
|
reversed = true;
|
|
|
|
i = CURSEG_COLD_NODE;
|
|
|
|
} else {
|
|
|
|
i = CURSEG_HOT_NODE;
|
|
|
|
}
|
|
|
|
cnt = NR_CURSEG_NODE_TYPE;
|
2017-02-23 09:02:32 +08:00
|
|
|
} else {
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
if (seg_type >= CURSEG_WARM_DATA) {
|
2017-02-24 18:46:00 +08:00
|
|
|
reversed = true;
|
|
|
|
i = CURSEG_COLD_DATA;
|
|
|
|
} else {
|
|
|
|
i = CURSEG_HOT_DATA;
|
|
|
|
}
|
|
|
|
cnt = NR_CURSEG_DATA_TYPE;
|
2017-02-23 09:02:32 +08:00
|
|
|
}
|
2013-02-04 14:11:17 +08:00
|
|
|
|
2017-02-24 18:46:00 +08:00
|
|
|
for (; cnt-- > 0; reversed ? i-- : i++) {
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
if (i == seg_type)
|
2017-02-23 09:10:18 +08:00
|
|
|
continue;
|
2023-04-04 12:00:51 +08:00
|
|
|
if (!f2fs_get_victim(sbi, &segno, BG_GC, i, alloc_mode, age)) {
|
2017-04-14 06:17:00 +08:00
|
|
|
curseg->next_segno = segno;
|
2013-02-04 14:11:17 +08:00
|
|
|
return 1;
|
2017-04-14 06:17:00 +08:00
|
|
|
}
|
2017-02-23 09:10:18 +08:00
|
|
|
}
|
2018-08-21 10:21:43 +08:00
|
|
|
|
|
|
|
/* find valid_blocks=0 in dirty list */
|
|
|
|
if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED))) {
|
|
|
|
segno = get_free_segment(sbi);
|
|
|
|
if (segno != NULL_SEGNO) {
|
|
|
|
curseg->next_segno = segno;
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
}
|
2013-02-04 14:11:17 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2022-11-28 17:43:45 +08:00
|
|
|
static bool need_new_seg(struct f2fs_sb_info *sbi, int type)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
2017-04-21 04:51:57 +08:00
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, type);
|
|
|
|
|
2022-11-28 17:43:45 +08:00
|
|
|
if (!is_set_ckpt_flags(sbi, CP_CRC_RECOVERY_FLAG) &&
|
|
|
|
curseg->seg_type == CURSEG_WARM_NODE)
|
|
|
|
return true;
|
|
|
|
if (curseg->alloc_type == LFS &&
|
|
|
|
is_next_segment_free(sbi, curseg, type) &&
|
|
|
|
likely(!is_sbi_flag_set(sbi, SBI_CP_DISABLED)))
|
|
|
|
return true;
|
|
|
|
if (!f2fs_need_SSR(sbi) || !get_ssr_segment(sbi, type, SSR, 0))
|
|
|
|
return true;
|
|
|
|
return false;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
2020-06-18 14:36:22 +08:00
|
|
|
void f2fs_allocate_segment_for_resize(struct f2fs_sb_info *sbi, int type,
|
2019-06-05 11:33:25 +08:00
|
|
|
unsigned int start, unsigned int end)
|
|
|
|
{
|
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, type);
|
|
|
|
unsigned int segno;
|
|
|
|
|
2022-01-08 04:48:44 +08:00
|
|
|
f2fs_down_read(&SM_I(sbi)->curseg_lock);
|
2019-06-05 11:33:25 +08:00
|
|
|
mutex_lock(&curseg->curseg_mutex);
|
|
|
|
down_write(&SIT_I(sbi)->sentry_lock);
|
|
|
|
|
|
|
|
segno = CURSEG_I(sbi, type)->segno;
|
|
|
|
if (segno < start || segno > end)
|
|
|
|
goto unlock;
|
|
|
|
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
if (f2fs_need_SSR(sbi) && get_ssr_segment(sbi, type, SSR, 0))
|
2022-11-28 17:43:46 +08:00
|
|
|
change_curseg(sbi, type);
|
2019-06-05 11:33:25 +08:00
|
|
|
else
|
|
|
|
new_curseg(sbi, type, true);
|
|
|
|
|
|
|
|
stat_inc_seg_type(sbi, curseg);
|
|
|
|
|
|
|
|
locate_dirty_segment(sbi, segno);
|
|
|
|
unlock:
|
|
|
|
up_write(&SIT_I(sbi)->sentry_lock);
|
|
|
|
|
|
|
|
if (segno != curseg->segno)
|
2019-06-18 17:48:42 +08:00
|
|
|
f2fs_notice(sbi, "For resize: curseg of type %d: %u ==> %u",
|
|
|
|
type, segno, curseg->segno);
|
2019-06-05 11:33:25 +08:00
|
|
|
|
|
|
|
mutex_unlock(&curseg->curseg_mutex);
|
2022-01-08 04:48:44 +08:00
|
|
|
f2fs_up_read(&SM_I(sbi)->curseg_lock);
|
2019-06-05 11:33:25 +08:00
|
|
|
}
|
|
|
|
|
2021-03-05 17:56:01 +08:00
|
|
|
static void __allocate_new_segment(struct f2fs_sb_info *sbi, int type,
|
2021-04-21 09:54:55 +08:00
|
|
|
bool new_sec, bool force)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
2020-06-22 17:38:48 +08:00
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, type);
|
2016-11-12 04:31:40 +08:00
|
|
|
unsigned int old_segno;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
2023-01-19 14:36:22 +08:00
|
|
|
if (!force && curseg->inited &&
|
|
|
|
!curseg->next_blkoff &&
|
|
|
|
!get_valid_blocks(sbi, curseg->segno, new_sec) &&
|
|
|
|
!get_ckpt_valid_blocks(sbi, curseg->segno, new_sec))
|
f2fs: fix to avoid touching checkpointed data in get_victim()
In CP disabling mode, there are two issues when using LFS or SSR | AT_SSR
mode to select victim:
1. LFS is set to find source section during GC, the victim should have
no checkpointed data, since after GC, section could not be set free for
reuse.
Previously, we only check valid chpt blocks in current segment rather
than section, fix it.
2. SSR | AT_SSR are set to find target segment for writes which can be
fully filled by checkpointed and newly written blocks, we should never
select such segment, otherwise it can cause panic or data corruption
during allocation, potential case is described as below:
a) target segment has 'n' (n < 512) ckpt valid blocks
b) GC migrates 'n' valid blocks to other segment (segment is still
in dirty list)
c) GC migrates '512 - n' blocks to target segment (segment has 'n'
cp_vblocks and '512 - n' vblocks)
d) If GC selects target segment via {AT,}SSR allocator, however there
is no free space in targe segment.
Fixes: 4354994f097d ("f2fs: checkpoint disabling")
Fixes: 093749e296e2 ("f2fs: support age threshold based garbage collection")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-03-24 11:18:28 +08:00
|
|
|
return;
|
2023-01-19 14:36:22 +08:00
|
|
|
|
2020-06-22 17:38:48 +08:00
|
|
|
old_segno = curseg->segno;
|
2022-11-28 17:43:45 +08:00
|
|
|
new_curseg(sbi, type, true);
|
|
|
|
stat_inc_seg_type(sbi, curseg);
|
2020-06-22 17:38:48 +08:00
|
|
|
locate_dirty_segment(sbi, old_segno);
|
|
|
|
}
|
2019-10-19 01:06:40 +08:00
|
|
|
|
2021-04-21 09:54:55 +08:00
|
|
|
void f2fs_allocate_new_section(struct f2fs_sb_info *sbi, int type, bool force)
|
2020-06-22 17:38:48 +08:00
|
|
|
{
|
2022-01-08 04:48:44 +08:00
|
|
|
f2fs_down_read(&SM_I(sbi)->curseg_lock);
|
2020-06-22 17:38:48 +08:00
|
|
|
down_write(&SIT_I(sbi)->sentry_lock);
|
2023-01-19 14:36:23 +08:00
|
|
|
__allocate_new_segment(sbi, type, true, force);
|
2020-06-22 17:38:48 +08:00
|
|
|
up_write(&SIT_I(sbi)->sentry_lock);
|
2022-01-08 04:48:44 +08:00
|
|
|
f2fs_up_read(&SM_I(sbi)->curseg_lock);
|
2020-06-22 17:38:48 +08:00
|
|
|
}
|
2017-10-30 17:49:53 +08:00
|
|
|
|
2020-06-22 17:38:48 +08:00
|
|
|
void f2fs_allocate_new_segments(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
|
2022-01-08 04:48:44 +08:00
|
|
|
f2fs_down_read(&SM_I(sbi)->curseg_lock);
|
2020-06-22 17:38:48 +08:00
|
|
|
down_write(&SIT_I(sbi)->sentry_lock);
|
|
|
|
for (i = CURSEG_HOT_DATA; i <= CURSEG_COLD_DATA; i++)
|
2021-04-21 09:54:55 +08:00
|
|
|
__allocate_new_segment(sbi, i, false, false);
|
2017-10-30 17:49:53 +08:00
|
|
|
up_write(&SIT_I(sbi)->sentry_lock);
|
2022-01-08 04:48:44 +08:00
|
|
|
f2fs_up_read(&SM_I(sbi)->curseg_lock);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
bool f2fs_exist_trim_candidates(struct f2fs_sb_info *sbi,
|
|
|
|
struct cp_control *cpc)
|
2016-12-30 14:06:15 +08:00
|
|
|
{
|
|
|
|
__u64 trim_start = cpc->trim_start;
|
|
|
|
bool has_candidate = false;
|
|
|
|
|
2017-10-30 17:49:53 +08:00
|
|
|
down_write(&SIT_I(sbi)->sentry_lock);
|
2016-12-30 14:06:15 +08:00
|
|
|
for (; cpc->trim_start <= cpc->trim_end; cpc->trim_start++) {
|
|
|
|
if (add_discard_addrs(sbi, cpc, true)) {
|
|
|
|
has_candidate = true;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
2017-10-30 17:49:53 +08:00
|
|
|
up_write(&SIT_I(sbi)->sentry_lock);
|
2016-12-30 14:06:15 +08:00
|
|
|
|
|
|
|
cpc->trim_start = trim_start;
|
|
|
|
return has_candidate;
|
|
|
|
}
|
|
|
|
|
2018-06-25 20:33:24 +08:00
|
|
|
static unsigned int __issue_discard_cmd_range(struct f2fs_sb_info *sbi,
|
2018-05-25 04:57:26 +08:00
|
|
|
struct discard_policy *dpolicy,
|
|
|
|
unsigned int start, unsigned int end)
|
|
|
|
{
|
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
|
|
|
struct discard_cmd *prev_dc = NULL, *next_dc = NULL;
|
|
|
|
struct rb_node **insert_p = NULL, *insert_parent = NULL;
|
|
|
|
struct discard_cmd *dc;
|
|
|
|
struct blk_plug plug;
|
|
|
|
int issued;
|
2018-06-25 20:33:24 +08:00
|
|
|
unsigned int trimmed = 0;
|
2018-05-25 04:57:26 +08:00
|
|
|
|
|
|
|
next:
|
|
|
|
issued = 0;
|
|
|
|
|
|
|
|
mutex_lock(&dcc->cmd_lock);
|
2018-06-22 16:06:59 +08:00
|
|
|
if (unlikely(dcc->rbtree_check))
|
2023-03-11 03:12:35 +08:00
|
|
|
f2fs_bug_on(sbi, !f2fs_check_discard_tree(sbi));
|
|
|
|
|
|
|
|
dc = __lookup_discard_cmd_ret(&dcc->root, start,
|
|
|
|
&prev_dc, &next_dc, &insert_p, &insert_parent);
|
2018-05-25 04:57:26 +08:00
|
|
|
if (!dc)
|
|
|
|
dc = next_dc;
|
|
|
|
|
|
|
|
blk_start_plug(&plug);
|
|
|
|
|
2023-03-11 03:12:35 +08:00
|
|
|
while (dc && dc->di.lstart <= end) {
|
2018-05-25 04:57:26 +08:00
|
|
|
struct rb_node *node;
|
2018-08-08 10:14:55 +08:00
|
|
|
int err = 0;
|
2018-05-25 04:57:26 +08:00
|
|
|
|
2023-03-11 03:12:35 +08:00
|
|
|
if (dc->di.len < dpolicy->granularity)
|
2018-05-25 04:57:26 +08:00
|
|
|
goto skip;
|
|
|
|
|
|
|
|
if (dc->state != D_PREP) {
|
|
|
|
list_move_tail(&dc->list, &dcc->fstrim_list);
|
|
|
|
goto skip;
|
|
|
|
}
|
|
|
|
|
2018-08-08 10:14:55 +08:00
|
|
|
err = __submit_discard_cmd(sbi, dpolicy, dc, &issued);
|
2018-05-25 04:57:26 +08:00
|
|
|
|
2018-08-06 22:43:50 +08:00
|
|
|
if (issued >= dpolicy->max_requests) {
|
2023-03-11 03:12:35 +08:00
|
|
|
start = dc->di.lstart + dc->di.len;
|
2018-05-25 04:57:26 +08:00
|
|
|
|
2018-08-08 10:14:55 +08:00
|
|
|
if (err)
|
|
|
|
__remove_discard_cmd(sbi, dc);
|
|
|
|
|
2018-05-25 04:57:26 +08:00
|
|
|
blk_finish_plug(&plug);
|
|
|
|
mutex_unlock(&dcc->cmd_lock);
|
2018-06-25 20:33:24 +08:00
|
|
|
trimmed += __wait_all_discard_cmd(sbi, NULL);
|
2022-03-23 05:39:13 +08:00
|
|
|
f2fs_io_schedule_timeout(DEFAULT_IO_TIMEOUT);
|
2018-05-25 04:57:26 +08:00
|
|
|
goto next;
|
|
|
|
}
|
|
|
|
skip:
|
|
|
|
node = rb_next(&dc->rb_node);
|
2018-08-08 10:14:55 +08:00
|
|
|
if (err)
|
|
|
|
__remove_discard_cmd(sbi, dc);
|
2018-05-25 04:57:26 +08:00
|
|
|
dc = rb_entry_safe(node, struct discard_cmd, rb_node);
|
|
|
|
|
|
|
|
if (fatal_signal_pending(current))
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
blk_finish_plug(&plug);
|
|
|
|
mutex_unlock(&dcc->cmd_lock);
|
2018-06-25 20:33:24 +08:00
|
|
|
|
|
|
|
return trimmed;
|
2018-05-25 04:57:26 +08:00
|
|
|
}
|
|
|
|
|
2014-09-21 13:06:39 +08:00
|
|
|
int f2fs_trim_fs(struct f2fs_sb_info *sbi, struct fstrim_range *range)
|
|
|
|
{
|
2015-02-10 04:02:44 +08:00
|
|
|
__u64 start = F2FS_BYTES_TO_BLK(range->start);
|
|
|
|
__u64 end = start + F2FS_BYTES_TO_BLK(range->len) - 1;
|
2018-04-09 10:25:23 +08:00
|
|
|
unsigned int start_segno, end_segno;
|
2017-10-04 09:08:32 +08:00
|
|
|
block_t start_block, end_block;
|
2014-09-21 13:06:39 +08:00
|
|
|
struct cp_control cpc;
|
2017-10-04 09:08:34 +08:00
|
|
|
struct discard_policy dpolicy;
|
2017-10-28 16:52:32 +08:00
|
|
|
unsigned long long trimmed = 0;
|
2015-12-23 17:50:30 +08:00
|
|
|
int err = 0;
|
2020-02-14 17:44:12 +08:00
|
|
|
bool need_align = f2fs_lfs_mode(sbi) && __is_large_section(sbi);
|
2014-09-21 13:06:39 +08:00
|
|
|
|
2015-05-01 13:50:06 +08:00
|
|
|
if (start >= MAX_BLKADDR(sbi) || range->len < sbi->blocksize)
|
2014-09-21 13:06:39 +08:00
|
|
|
return -EINVAL;
|
|
|
|
|
2018-08-08 17:36:29 +08:00
|
|
|
if (end < MAIN_BLKADDR(sbi))
|
|
|
|
goto out;
|
2014-09-21 13:06:39 +08:00
|
|
|
|
2016-09-01 10:14:39 +08:00
|
|
|
if (is_sbi_flag_set(sbi, SBI_NEED_FSCK)) {
|
2019-06-18 17:48:42 +08:00
|
|
|
f2fs_warn(sbi, "Found FS corruption, run fsck to fix.");
|
2019-06-20 11:36:14 +08:00
|
|
|
return -EFSCORRUPTED;
|
2016-09-01 10:14:39 +08:00
|
|
|
}
|
|
|
|
|
2014-09-21 13:06:39 +08:00
|
|
|
/* start/end segment number in main_area */
|
2014-09-24 02:23:01 +08:00
|
|
|
start_segno = (start <= MAIN_BLKADDR(sbi)) ? 0 : GET_SEGNO(sbi, start);
|
|
|
|
end_segno = (end >= MAX_BLKADDR(sbi)) ? MAIN_SEGS(sbi) - 1 :
|
|
|
|
GET_SEGNO(sbi, end);
|
f2fs: issue discard align to section in LFS mode
For the case when sbi->segs_per_sec > 1 with lfs mode, take
section:segment = 5 for example, if the section prefree_map is
...previous section | current section (1 1 0 1 1) | next section...,
then the start = x, end = x + 1, after start = start_segno +
sbi->segs_per_sec, start = x + 5, then it will skip x + 3 and x + 4, but
their bitmap is still set, which will cause duplicated
f2fs_issue_discard of this same section in the next write_checkpoint:
round 1: section bitmap : 1 1 1 1 1, all valid, prefree_map: 0 0 0 0 0
then rm data block NO.2, block NO.2 becomes invalid, prefree_map: 0 0 1 0 0
write_checkpoint: section bitmap: 1 1 0 1 1, prefree_map: 0 0 0 0 0,
prefree of NO.2 is cleared, and no discard issued
round 2: rm data block NO.0, NO.1, NO.3, NO.4
all invalid, but prefree bit of NO.2 is set and cleared in round 1, then
prefree_map: 1 1 0 1 1
write_checkpoint: section bitmap: 0 0 0 0 0, prefree_map: 0 0 0 1 1, no
valid blocks of this section, so discard issued, but this time prefree
bit of NO.3 and NO.4 is skipped due to start = start_segno + sbi->segs_per_sec;
round 3:
write_checkpoint: section bitmap: 0 0 0 0 0, prefree_map: 0 0 0 1 1 ->
0 0 0 0 0, no valid blocks of this section, so discard issued,
this time prefree bit of NO.3 and NO.4 is cleared, but the discard of
this section is sent again...
To fix this problem, we can align the start and end value to section
boundary for fstrim and real-time discard operation, and decide to issue
discard only when the whole section is invalid, which can issue discard
aligned to section size as much as possible and avoid redundant discard.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-07-19 20:58:15 +08:00
|
|
|
if (need_align) {
|
|
|
|
start_segno = rounddown(start_segno, sbi->segs_per_sec);
|
|
|
|
end_segno = roundup(end_segno + 1, sbi->segs_per_sec) - 1;
|
|
|
|
}
|
2017-10-04 09:08:32 +08:00
|
|
|
|
2014-09-21 13:06:39 +08:00
|
|
|
cpc.reason = CP_DISCARD;
|
2015-05-01 13:50:06 +08:00
|
|
|
cpc.trim_minlen = max_t(__u64, 1, F2FS_BYTES_TO_BLK(range->minlen));
|
2018-04-09 10:25:23 +08:00
|
|
|
cpc.trim_start = start_segno;
|
|
|
|
cpc.trim_end = end_segno;
|
2014-09-21 13:06:39 +08:00
|
|
|
|
2018-04-09 10:25:23 +08:00
|
|
|
if (sbi->discard_blks == 0)
|
|
|
|
goto out;
|
2016-08-21 23:21:30 +08:00
|
|
|
|
2022-01-08 04:48:44 +08:00
|
|
|
f2fs_down_write(&sbi->gc_lock);
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
err = f2fs_write_checkpoint(sbi, &cpc);
|
2022-01-08 04:48:44 +08:00
|
|
|
f2fs_up_write(&sbi->gc_lock);
|
2018-04-09 10:25:23 +08:00
|
|
|
if (err)
|
|
|
|
goto out;
|
2017-10-04 09:08:32 +08:00
|
|
|
|
2018-06-01 01:20:48 +08:00
|
|
|
/*
|
|
|
|
* We filed discard candidates, but actually we don't need to wait for
|
|
|
|
* all of them, since they'll be issued in idle time along with runtime
|
|
|
|
* discard option. User configuration looks like using runtime discard
|
|
|
|
* or periodic fstrim instead of it.
|
|
|
|
*/
|
f2fs: fix to avoid NULL pointer dereference on se->discard_map
https://bugzilla.kernel.org/show_bug.cgi?id=200951
These is a NULL pointer dereference issue reported in bugzilla:
Hi,
in the setup there is a SATA SSD connected to a SATA-to-USB bridge.
The disc is "Samsung SSD 850 PRO 256G" which supports TRIM.
There are four partitions:
sda1: FAT /boot
sda2: F2FS /
sda3: F2FS /home
sda4: F2FS
The bridge is ASMT1153e which uses the "uas" driver.
There is no TRIM pass-through, so, when mounting it reports:
mounting with "discard" option, but the device does not support discard
The USB host is USB3.0 and UASP capable. It is the one on RK3399.
Given this everything works fine, except there is no TRIM support.
In order to enable TRIM a new UDEV rule is added [1]:
/etc/udev/rules.d/10-sata-bridge-trim.rules:
ACTION=="add|change", ATTRS{idVendor}=="174c", ATTRS{idProduct}=="55aa", SUBSYSTEM=="scsi_disk", ATTR{provisioning_mode}="unmap"
After reboot any F2FS write hangs forever and dmesg reports:
Unable to handle kernel NULL pointer dereference
Also tested on a x86_64 system: works fine even with TRIM enabled.
same disc
same bridge
different usb host controller
different cpu architecture
not root filesystem
Regards,
Vicenç.
[1] Post #5 in https://bbs.archlinux.org/viewtopic.php?id=236280
Unable to handle kernel NULL pointer dereference at virtual address 000000000000003e
Mem abort info:
ESR = 0x96000004
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000626e3122
[000000000000003e] pgd=0000000000000000
Internal error: Oops: 96000004 [#1] SMP
Modules linked in: overlay snd_soc_hdmi_codec rc_cec dw_hdmi_i2s_audio dw_hdmi_cec snd_soc_simple_card snd_soc_simple_card_utils snd_soc_rockchip_i2s rockchip_rga snd_soc_rockchip_pcm rockchipdrm videobuf2_dma_sg v4l2_mem2mem rtc_rk808 videobuf2_memops analogix_dp videobuf2_v4l2 videobuf2_common dw_hdmi dw_wdt cec rc_core videodev drm_kms_helper media drm rockchip_thermal rockchip_saradc realtek drm_panel_orientation_quirks syscopyarea sysfillrect sysimgblt fb_sys_fops dwmac_rk stmmac_platform stmmac pwm_bl squashfs loop crypto_user gpio_keys hid_kensington
CPU: 5 PID: 957 Comm: nvim Not tainted 4.19.0-rc1-1-ARCH #1
Hardware name: Sapphire-RK3399 Board (DT)
pstate: 00000005 (nzcv daif -PAN -UAO)
pc : update_sit_entry+0x304/0x4b0
lr : update_sit_entry+0x108/0x4b0
sp : ffff00000ca13bd0
x29: ffff00000ca13bd0 x28: 000000000000003e
x27: 0000000000000020 x26: 0000000000080000
x25: 0000000000000048 x24: ffff8000ebb85cf8
x23: 0000000000000253 x22: 00000000ffffffff
x21: 00000000000535f2 x20: 00000000ffffffdf
x19: ffff8000eb9e6800 x18: ffff8000eb9e6be8
x17: 0000000007ce6926 x16: 000000001c83ffa8
x15: 0000000000000000 x14: ffff8000f602df90
x13: 0000000000000006 x12: 0000000000000040
x11: 0000000000000228 x10: 0000000000000000
x9 : 0000000000000000 x8 : 0000000000000000
x7 : 00000000000535f2 x6 : ffff8000ebff3440
x5 : ffff8000ebff3440 x4 : ffff8000ebe3a6c8
x3 : 00000000ffffffff x2 : 0000000000000020
x1 : 0000000000000000 x0 : ffff8000eb9e5800
Process nvim (pid: 957, stack limit = 0x0000000063a78320)
Call trace:
update_sit_entry+0x304/0x4b0
f2fs_invalidate_blocks+0x98/0x140
truncate_node+0x90/0x400
f2fs_remove_inode_page+0xe8/0x340
f2fs_evict_inode+0x2b0/0x408
evict+0xe0/0x1e0
iput+0x160/0x260
do_unlinkat+0x214/0x298
__arm64_sys_unlinkat+0x3c/0x68
el0_svc_handler+0x94/0x118
el0_svc+0x8/0xc
Code: f9400800 b9488400 36080140 f9400f01 (387c4820)
---[ end trace a0f21a307118c477 ]---
The reason is it is possible to enable discard flag on block queue via
UDEV, but during mount, f2fs will initialize se->discard_map only if
this flag is set, once the flag is set after mount, f2fs may dereference
NULL pointer on se->discard_map.
So this patch does below changes to fix this issue:
- initialize and update se->discard_map all the time.
- don't clear DISCARD option if device has no QUEUE_FLAG_DISCARD flag
during mount.
- don't issue small discard on zoned block device.
- introduce some functions to enhance the readability.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Tested-by: Vicente Bergas <vicencb@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-04 03:52:17 +08:00
|
|
|
if (f2fs_realtime_discard_enable(sbi))
|
2018-06-21 12:27:21 +08:00
|
|
|
goto out;
|
|
|
|
|
|
|
|
start_block = START_BLOCK(sbi, start_segno);
|
|
|
|
end_block = START_BLOCK(sbi, end_segno + 1);
|
|
|
|
|
|
|
|
__init_discard_policy(sbi, &dpolicy, DPOLICY_FSTRIM, cpc.trim_minlen);
|
2018-06-25 20:33:24 +08:00
|
|
|
trimmed = __issue_discard_cmd_range(sbi, &dpolicy,
|
|
|
|
start_block, end_block);
|
2018-06-21 12:27:21 +08:00
|
|
|
|
2018-06-25 20:33:24 +08:00
|
|
|
trimmed += __wait_discard_cmd_range(sbi, &dpolicy,
|
2017-10-28 16:52:32 +08:00
|
|
|
start_block, end_block);
|
2018-04-09 10:25:23 +08:00
|
|
|
out:
|
2018-08-05 23:09:00 +08:00
|
|
|
if (!err)
|
|
|
|
range->len = F2FS_BLK_TO_BYTES(trimmed);
|
2015-12-23 17:50:30 +08:00
|
|
|
return err;
|
2014-09-21 13:06:39 +08:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
int f2fs_rw_hint_to_seg_type(enum rw_hint hint)
|
2017-11-09 13:51:27 +08:00
|
|
|
{
|
|
|
|
switch (hint) {
|
|
|
|
case WRITE_LIFE_SHORT:
|
|
|
|
return CURSEG_HOT_DATA;
|
|
|
|
case WRITE_LIFE_EXTREME:
|
|
|
|
return CURSEG_COLD_DATA;
|
|
|
|
default:
|
|
|
|
return CURSEG_WARM_DATA;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2017-05-11 05:19:54 +08:00
|
|
|
static int __get_segment_type_2(struct f2fs_io_info *fio)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
2017-05-11 05:19:54 +08:00
|
|
|
if (fio->type == DATA)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
return CURSEG_HOT_DATA;
|
|
|
|
else
|
|
|
|
return CURSEG_HOT_NODE;
|
|
|
|
}
|
|
|
|
|
2017-05-11 05:19:54 +08:00
|
|
|
static int __get_segment_type_4(struct f2fs_io_info *fio)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
2017-05-11 05:19:54 +08:00
|
|
|
if (fio->type == DATA) {
|
|
|
|
struct inode *inode = fio->page->mapping->host;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
if (S_ISDIR(inode->i_mode))
|
|
|
|
return CURSEG_HOT_DATA;
|
|
|
|
else
|
|
|
|
return CURSEG_COLD_DATA;
|
|
|
|
} else {
|
2017-05-11 05:19:54 +08:00
|
|
|
if (IS_DNODE(fio->page) && is_cold_node(fio->page))
|
2014-11-06 12:05:53 +08:00
|
|
|
return CURSEG_WARM_NODE;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
else
|
|
|
|
return CURSEG_COLD_NODE;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2022-12-02 09:37:15 +08:00
|
|
|
static int __get_age_segment_type(struct inode *inode, pgoff_t pgofs)
|
|
|
|
{
|
|
|
|
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
|
2022-12-17 06:05:44 +08:00
|
|
|
struct extent_info ei = {};
|
2022-12-02 09:37:15 +08:00
|
|
|
|
|
|
|
if (f2fs_lookup_age_extent_cache(inode, pgofs, &ei)) {
|
|
|
|
if (!ei.age)
|
|
|
|
return NO_CHECK_TYPE;
|
|
|
|
if (ei.age <= sbi->hot_data_age_threshold)
|
|
|
|
return CURSEG_HOT_DATA;
|
|
|
|
if (ei.age <= sbi->warm_data_age_threshold)
|
|
|
|
return CURSEG_WARM_DATA;
|
|
|
|
return CURSEG_COLD_DATA;
|
|
|
|
}
|
|
|
|
return NO_CHECK_TYPE;
|
|
|
|
}
|
|
|
|
|
2017-05-11 05:19:54 +08:00
|
|
|
static int __get_segment_type_6(struct f2fs_io_info *fio)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
2017-05-11 05:19:54 +08:00
|
|
|
if (fio->type == DATA) {
|
|
|
|
struct inode *inode = fio->page->mapping->host;
|
2022-12-02 09:37:15 +08:00
|
|
|
int type;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
2021-05-26 14:29:27 +08:00
|
|
|
if (is_inode_flag_set(inode, FI_ALIGNED_WRITE))
|
|
|
|
return CURSEG_COLD_DATA_PINNED;
|
|
|
|
|
2021-04-28 17:20:31 +08:00
|
|
|
if (page_private_gcing(fio->page)) {
|
2021-03-17 17:27:23 +08:00
|
|
|
if (fio->sbi->am.atgc_enabled &&
|
|
|
|
(fio->io_type == FS_DATA_IO) &&
|
|
|
|
(fio->sbi->gc_mode != GC_URGENT_HIGH))
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
return CURSEG_ALL_DATA_ATGC;
|
|
|
|
else
|
|
|
|
return CURSEG_COLD_DATA;
|
|
|
|
}
|
2020-12-01 12:08:02 +08:00
|
|
|
if (file_is_cold(inode) || f2fs_need_compress_data(inode))
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
return CURSEG_COLD_DATA;
|
2022-12-02 09:37:15 +08:00
|
|
|
|
|
|
|
type = __get_age_segment_type(inode, fio->page->index);
|
|
|
|
if (type != NO_CHECK_TYPE)
|
|
|
|
return type;
|
|
|
|
|
2018-02-28 17:07:27 +08:00
|
|
|
if (file_is_hot(inode) ||
|
2018-04-26 17:05:50 +08:00
|
|
|
is_inode_flag_set(inode, FI_HOT_DATA) ||
|
2022-08-01 19:26:04 +08:00
|
|
|
f2fs_is_cow_file(inode))
|
2017-03-25 08:05:13 +08:00
|
|
|
return CURSEG_HOT_DATA;
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
return f2fs_rw_hint_to_seg_type(inode->i_write_hint);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
} else {
|
2017-05-11 05:19:54 +08:00
|
|
|
if (IS_DNODE(fio->page))
|
|
|
|
return is_cold_node(fio->page) ? CURSEG_WARM_NODE :
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
CURSEG_HOT_NODE;
|
2017-03-25 08:05:13 +08:00
|
|
|
return CURSEG_COLD_NODE;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2017-05-11 05:19:54 +08:00
|
|
|
static int __get_segment_type(struct f2fs_io_info *fio)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
2017-05-11 02:18:25 +08:00
|
|
|
int type = 0;
|
|
|
|
|
2018-03-08 14:22:56 +08:00
|
|
|
switch (F2FS_OPTION(fio->sbi).active_logs) {
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
case 2:
|
2017-05-11 02:18:25 +08:00
|
|
|
type = __get_segment_type_2(fio);
|
|
|
|
break;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
case 4:
|
2017-05-11 02:18:25 +08:00
|
|
|
type = __get_segment_type_4(fio);
|
|
|
|
break;
|
|
|
|
case 6:
|
|
|
|
type = __get_segment_type_6(fio);
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
f2fs_bug_on(fio->sbi, true);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
2017-05-11 05:19:54 +08:00
|
|
|
|
2017-05-11 02:18:25 +08:00
|
|
|
if (IS_HOT(type))
|
|
|
|
fio->temp = HOT;
|
|
|
|
else if (IS_WARM(type))
|
|
|
|
fio->temp = WARM;
|
|
|
|
else
|
|
|
|
fio->temp = COLD;
|
|
|
|
return type;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
2023-01-19 14:36:24 +08:00
|
|
|
static void f2fs_randomize_chunk(struct f2fs_sb_info *sbi,
|
|
|
|
struct curseg_info *seg)
|
|
|
|
{
|
|
|
|
/* To allocate block chunks in different sizes, use random number */
|
|
|
|
if (--seg->fragment_remained_chunk > 0)
|
|
|
|
return;
|
|
|
|
|
|
|
|
seg->fragment_remained_chunk =
|
|
|
|
get_random_u32_inclusive(1, sbi->max_fragment_chunk);
|
|
|
|
seg->next_blkoff +=
|
|
|
|
get_random_u32_inclusive(1, sbi->max_fragment_hole);
|
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_allocate_data_block(struct f2fs_sb_info *sbi, struct page *page,
|
2013-12-16 18:04:05 +08:00
|
|
|
block_t old_blkaddr, block_t *new_blkaddr,
|
2017-05-19 23:37:01 +08:00
|
|
|
struct f2fs_summary *sum, int type,
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
struct f2fs_io_info *fio)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct sit_info *sit_i = SIT_I(sbi);
|
2016-11-12 04:31:40 +08:00
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, type);
|
2020-08-04 21:14:47 +08:00
|
|
|
unsigned long long old_mtime;
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
bool from_gc = (type == CURSEG_ALL_DATA_ATGC);
|
|
|
|
struct seg_entry *se = NULL;
|
2023-01-19 14:36:25 +08:00
|
|
|
bool segment_full = false;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
2022-01-08 04:48:44 +08:00
|
|
|
f2fs_down_read(&SM_I(sbi)->curseg_lock);
|
f2fs: fix summary info corruption
Sometimes, after running generic/270 of fstest, fsck reports summary
info and actual position of block address in direct node becoming
inconsistent.
The root cause is race in between __f2fs_replace_block and change_curseg
as below:
Thread A Thread B
- __clone_blkaddrs
- f2fs_replace_block
- __f2fs_replace_block
- segnoA = GET_SEGNO(sbi, blkaddrA);
- type = se->type:=CURSEG_HOT_DATA
- if (!IS_CURSEG(sbi, segnoA))
type = CURSEG_WARM_DATA
- allocate_data_block
- allocate_segment
- get_ssr_segment
- change_curseg(segnoA, CURSEG_HOT_DATA)
- change_curseg(segnoA, CURSEG_WARM_DATA)
- reset_curseg
- __set_sit_entry_type
- change se->type from CURSEG_HOT_DATA to CURSEG_WARM_DATA
So finally, hot curseg locates in segnoA, but type of segnoA becomes
CURSEG_WARM_DATA.
Then if we invoke __f2fs_replace_block(blkaddrB, blkaddrA, true, false),
as blkaddrA locates in segnoA, so we will move warm type curseg to segnoA,
then change its summary cache and writeback it to summary block.
But segnoA is used by hot type curseg too, once it moves or persist, it
will cover summary block content with inner old summary cache, result in
inconsistent status.
This patch tries to fix this issue by introduce global curseg lock to avoid
race in between __f2fs_replace_block and change_curseg.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-11-02 20:41:03 +08:00
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
mutex_lock(&curseg->curseg_mutex);
|
2017-10-30 17:49:53 +08:00
|
|
|
down_write(&sit_i->sentry_lock);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
if (from_gc) {
|
|
|
|
f2fs_bug_on(sbi, GET_SEGNO(sbi, old_blkaddr) == NULL_SEGNO);
|
|
|
|
se = get_seg_entry(sbi, GET_SEGNO(sbi, old_blkaddr));
|
|
|
|
sanity_check_seg_type(sbi, se->type);
|
|
|
|
f2fs_bug_on(sbi, IS_NODESEG(se->type));
|
|
|
|
}
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
*new_blkaddr = NEXT_FREE_BLKADDR(sbi, curseg);
|
|
|
|
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
f2fs_bug_on(sbi, curseg->next_blkoff >= sbi->blocks_per_seg);
|
|
|
|
|
2016-12-30 06:07:53 +08:00
|
|
|
f2fs_wait_discard_bio(sbi, *new_blkaddr);
|
|
|
|
|
2023-01-19 14:36:18 +08:00
|
|
|
curseg->sum_blk->entries[curseg->next_blkoff] = *sum;
|
2023-01-19 14:36:24 +08:00
|
|
|
if (curseg->alloc_type == SSR) {
|
|
|
|
curseg->next_blkoff = f2fs_find_next_ssr_block(sbi, curseg);
|
|
|
|
} else {
|
|
|
|
curseg->next_blkoff++;
|
|
|
|
if (F2FS_OPTION(sbi).fs_mode == FS_MODE_FRAGMENT_BLK)
|
|
|
|
f2fs_randomize_chunk(sbi, curseg);
|
|
|
|
}
|
2023-01-19 14:36:25 +08:00
|
|
|
if (curseg->next_blkoff >= f2fs_usable_blks_in_seg(sbi, curseg->segno))
|
|
|
|
segment_full = true;
|
2013-10-22 19:56:10 +08:00
|
|
|
stat_inc_block_count(sbi, curseg);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
2020-08-04 21:14:47 +08:00
|
|
|
if (from_gc) {
|
|
|
|
old_mtime = get_segment_mtime(sbi, old_blkaddr);
|
|
|
|
} else {
|
|
|
|
update_segment_mtime(sbi, old_blkaddr, 0);
|
|
|
|
old_mtime = 0;
|
|
|
|
}
|
|
|
|
update_segment_mtime(sbi, *new_blkaddr, old_mtime);
|
|
|
|
|
2017-10-30 09:33:41 +08:00
|
|
|
/*
|
|
|
|
* SIT information should be updated before segment allocation,
|
|
|
|
* since SSR needs latest valid block information.
|
|
|
|
*/
|
|
|
|
update_sit_entry(sbi, *new_blkaddr, 1);
|
|
|
|
if (GET_SEGNO(sbi, old_blkaddr) != NULL_SEGNO)
|
|
|
|
update_sit_entry(sbi, old_blkaddr, -1);
|
|
|
|
|
2023-01-19 14:36:25 +08:00
|
|
|
/*
|
|
|
|
* If the current segment is full, flush it out and replace it with a
|
|
|
|
* new segment.
|
|
|
|
*/
|
|
|
|
if (segment_full) {
|
2022-11-28 17:43:45 +08:00
|
|
|
if (from_gc) {
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
get_atssr_segment(sbi, type, se->type,
|
|
|
|
AT_SSR, se->mtime);
|
2022-11-28 17:43:45 +08:00
|
|
|
} else {
|
|
|
|
if (need_new_seg(sbi, type))
|
|
|
|
new_curseg(sbi, type, false);
|
|
|
|
else
|
2022-11-28 17:43:46 +08:00
|
|
|
change_curseg(sbi, type);
|
2022-11-28 17:43:45 +08:00
|
|
|
stat_inc_seg_type(sbi, curseg);
|
|
|
|
}
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
}
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
/*
|
2017-10-30 09:33:41 +08:00
|
|
|
* segment dirty status should be updated after segment allocation,
|
|
|
|
* so we just need to update status only one time after previous
|
|
|
|
* segment being closed.
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
*/
|
2017-10-30 09:33:41 +08:00
|
|
|
locate_dirty_segment(sbi, GET_SEGNO(sbi, old_blkaddr));
|
|
|
|
locate_dirty_segment(sbi, GET_SEGNO(sbi, *new_blkaddr));
|
2014-01-28 11:22:14 +08:00
|
|
|
|
2022-12-02 09:37:15 +08:00
|
|
|
if (IS_DATASEG(type))
|
|
|
|
atomic64_inc(&sbi->allocated_data_blocks);
|
|
|
|
|
2017-10-30 17:49:53 +08:00
|
|
|
up_write(&sit_i->sentry_lock);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
2017-07-31 20:19:09 +08:00
|
|
|
if (page && IS_NODESEG(type)) {
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
fill_node_footer_blkaddr(page, NEXT_FREE_BLKADDR(sbi, curseg));
|
|
|
|
|
2017-07-31 20:19:09 +08:00
|
|
|
f2fs_inode_chksum_set(sbi, page);
|
|
|
|
}
|
|
|
|
|
2020-06-18 14:36:24 +08:00
|
|
|
if (fio) {
|
2017-05-19 23:37:01 +08:00
|
|
|
struct f2fs_bio_info *io;
|
|
|
|
|
2021-04-02 17:22:23 +08:00
|
|
|
if (F2FS_IO_ALIGNED(sbi))
|
2023-02-02 15:04:56 +08:00
|
|
|
fio->retry = 0;
|
2021-04-02 17:22:23 +08:00
|
|
|
|
2017-05-19 23:37:01 +08:00
|
|
|
INIT_LIST_HEAD(&fio->list);
|
2023-02-02 15:04:56 +08:00
|
|
|
fio->in_list = 1;
|
2017-05-19 23:37:01 +08:00
|
|
|
io = sbi->write_io[fio->type] + fio->temp;
|
|
|
|
spin_lock(&io->io_lock);
|
|
|
|
list_add_tail(&fio->list, &io->io_list);
|
|
|
|
spin_unlock(&io->io_lock);
|
|
|
|
}
|
|
|
|
|
2013-12-16 18:04:05 +08:00
|
|
|
mutex_unlock(&curseg->curseg_mutex);
|
f2fs: fix summary info corruption
Sometimes, after running generic/270 of fstest, fsck reports summary
info and actual position of block address in direct node becoming
inconsistent.
The root cause is race in between __f2fs_replace_block and change_curseg
as below:
Thread A Thread B
- __clone_blkaddrs
- f2fs_replace_block
- __f2fs_replace_block
- segnoA = GET_SEGNO(sbi, blkaddrA);
- type = se->type:=CURSEG_HOT_DATA
- if (!IS_CURSEG(sbi, segnoA))
type = CURSEG_WARM_DATA
- allocate_data_block
- allocate_segment
- get_ssr_segment
- change_curseg(segnoA, CURSEG_HOT_DATA)
- change_curseg(segnoA, CURSEG_WARM_DATA)
- reset_curseg
- __set_sit_entry_type
- change se->type from CURSEG_HOT_DATA to CURSEG_WARM_DATA
So finally, hot curseg locates in segnoA, but type of segnoA becomes
CURSEG_WARM_DATA.
Then if we invoke __f2fs_replace_block(blkaddrB, blkaddrA, true, false),
as blkaddrA locates in segnoA, so we will move warm type curseg to segnoA,
then change its summary cache and writeback it to summary block.
But segnoA is used by hot type curseg too, once it moves or persist, it
will cover summary block content with inner old summary cache, result in
inconsistent status.
This patch tries to fix this issue by introduce global curseg lock to avoid
race in between __f2fs_replace_block and change_curseg.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-11-02 20:41:03 +08:00
|
|
|
|
2022-01-08 04:48:44 +08:00
|
|
|
f2fs_up_read(&SM_I(sbi)->curseg_lock);
|
2013-12-16 18:04:05 +08:00
|
|
|
}
|
|
|
|
|
2021-09-01 14:39:20 +08:00
|
|
|
void f2fs_update_device_state(struct f2fs_sb_info *sbi, nid_t ino,
|
|
|
|
block_t blkaddr, unsigned int blkcnt)
|
2017-09-29 13:59:38 +08:00
|
|
|
{
|
2019-03-16 08:13:06 +08:00
|
|
|
if (!f2fs_is_multi_device(sbi))
|
2017-09-29 13:59:38 +08:00
|
|
|
return;
|
|
|
|
|
2021-09-01 14:39:20 +08:00
|
|
|
while (1) {
|
|
|
|
unsigned int devidx = f2fs_target_device_index(sbi, blkaddr);
|
|
|
|
unsigned int blks = FDEV(devidx).end_blk - blkaddr + 1;
|
2017-09-29 13:59:38 +08:00
|
|
|
|
2021-09-01 14:39:20 +08:00
|
|
|
/* update device state for fsync */
|
|
|
|
f2fs_set_dirty_device(sbi, ino, devidx, FLUSH_INO);
|
2017-09-29 13:59:39 +08:00
|
|
|
|
2021-09-01 14:39:20 +08:00
|
|
|
/* update device state for checkpoint */
|
|
|
|
if (!f2fs_test_bit(devidx, (char *)&sbi->dirty_device)) {
|
|
|
|
spin_lock(&sbi->dev_lock);
|
|
|
|
f2fs_set_bit(devidx, (char *)&sbi->dirty_device);
|
|
|
|
spin_unlock(&sbi->dev_lock);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (blkcnt <= blks)
|
|
|
|
break;
|
|
|
|
blkcnt -= blks;
|
|
|
|
blkaddr += blks;
|
2017-09-29 13:59:39 +08:00
|
|
|
}
|
2017-09-29 13:59:38 +08:00
|
|
|
}
|
|
|
|
|
2015-04-24 05:38:15 +08:00
|
|
|
static void do_write_page(struct f2fs_summary *sum, struct f2fs_io_info *fio)
|
2013-12-16 18:04:05 +08:00
|
|
|
{
|
2017-05-11 05:19:54 +08:00
|
|
|
int type = __get_segment_type(fio);
|
2020-02-14 17:44:12 +08:00
|
|
|
bool keep_order = (f2fs_lfs_mode(fio->sbi) && type == CURSEG_COLD_DATA);
|
2013-12-16 18:04:05 +08:00
|
|
|
|
2018-05-26 09:00:13 +08:00
|
|
|
if (keep_order)
|
2022-01-08 04:48:44 +08:00
|
|
|
f2fs_down_read(&fio->sbi->io_order_lock);
|
2016-12-15 02:12:56 +08:00
|
|
|
reallocate:
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
f2fs_allocate_data_block(fio->sbi, fio->page, fio->old_blkaddr,
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
&fio->new_blkaddr, sum, type, fio);
|
2021-05-20 19:51:50 +08:00
|
|
|
if (GET_SEGNO(fio->sbi, fio->old_blkaddr) != NULL_SEGNO) {
|
f2fs: readahead encrypted block during GC
During GC, for each encrypted block, we will read block synchronously
into meta page, and then submit it into current cold data log area.
So this block read model with 4k granularity can make poor performance,
like migrating non-encrypted block, let's readahead encrypted block
as well to improve migration performance.
To implement this, we choose meta page that its index is old block
address of the encrypted block, and readahead ciphertext into this
page, later, if readaheaded page is still updated, we will load its
data into target meta page, and submit the write IO.
Note that for OPU, truncation, deletion, we need to invalid meta
page after we invalid old block address, to make sure we won't load
invalid data from target meta page during encrypted block migration.
for ((i = 0; i < 1000; i++))
do {
xfs_io -f /mnt/f2fs/dir/$i -c "pwrite 0 128k" -c "fsync";
} done
for ((i = 0; i < 1000; i+=2))
do {
rm /mnt/f2fs/dir/$i;
} done
ret = ioctl(fd, F2FS_IOC_GARBAGE_COLLECT, 0);
Before:
gc-6549 [001] d..1 214682.212797: block_rq_insert: 8,32 RA 32768 () 786400 + 64 [gc]
gc-6549 [001] d..1 214682.212802: block_unplug: [gc] 1
gc-6549 [001] .... 214682.213892: block_bio_queue: 8,32 R 67494144 + 8 [gc]
gc-6549 [001] .... 214682.213899: block_getrq: 8,32 R 67494144 + 8 [gc]
gc-6549 [001] .... 214682.213902: block_plug: [gc]
gc-6549 [001] d..1 214682.213905: block_rq_insert: 8,32 R 4096 () 67494144 + 8 [gc]
gc-6549 [001] d..1 214682.213908: block_unplug: [gc] 1
gc-6549 [001] .... 214682.226405: block_bio_queue: 8,32 R 67494152 + 8 [gc]
gc-6549 [001] .... 214682.226412: block_getrq: 8,32 R 67494152 + 8 [gc]
gc-6549 [001] .... 214682.226414: block_plug: [gc]
gc-6549 [001] d..1 214682.226417: block_rq_insert: 8,32 R 4096 () 67494152 + 8 [gc]
gc-6549 [001] d..1 214682.226420: block_unplug: [gc] 1
gc-6549 [001] .... 214682.226904: block_bio_queue: 8,32 R 67494160 + 8 [gc]
gc-6549 [001] .... 214682.226910: block_getrq: 8,32 R 67494160 + 8 [gc]
gc-6549 [001] .... 214682.226911: block_plug: [gc]
gc-6549 [001] d..1 214682.226914: block_rq_insert: 8,32 R 4096 () 67494160 + 8 [gc]
gc-6549 [001] d..1 214682.226916: block_unplug: [gc] 1
After:
gc-5678 [003] .... 214327.025906: block_bio_queue: 8,32 R 67493824 + 8 [gc]
gc-5678 [003] .... 214327.025908: block_bio_backmerge: 8,32 R 67493824 + 8 [gc]
gc-5678 [003] .... 214327.025915: block_bio_queue: 8,32 R 67493832 + 8 [gc]
gc-5678 [003] .... 214327.025917: block_bio_backmerge: 8,32 R 67493832 + 8 [gc]
gc-5678 [003] .... 214327.025923: block_bio_queue: 8,32 R 67493840 + 8 [gc]
gc-5678 [003] .... 214327.025925: block_bio_backmerge: 8,32 R 67493840 + 8 [gc]
gc-5678 [003] .... 214327.025932: block_bio_queue: 8,32 R 67493848 + 8 [gc]
gc-5678 [003] .... 214327.025934: block_bio_backmerge: 8,32 R 67493848 + 8 [gc]
gc-5678 [003] .... 214327.025941: block_bio_queue: 8,32 R 67493856 + 8 [gc]
gc-5678 [003] .... 214327.025943: block_bio_backmerge: 8,32 R 67493856 + 8 [gc]
gc-5678 [003] .... 214327.025953: block_bio_queue: 8,32 R 67493864 + 8 [gc]
gc-5678 [003] .... 214327.025955: block_bio_backmerge: 8,32 R 67493864 + 8 [gc]
gc-5678 [003] .... 214327.025962: block_bio_queue: 8,32 R 67493872 + 8 [gc]
gc-5678 [003] .... 214327.025964: block_bio_backmerge: 8,32 R 67493872 + 8 [gc]
gc-5678 [003] .... 214327.025970: block_bio_queue: 8,32 R 67493880 + 8 [gc]
gc-5678 [003] .... 214327.025972: block_bio_backmerge: 8,32 R 67493880 + 8 [gc]
gc-5678 [003] .... 214327.026000: block_bio_queue: 8,32 WS 34123776 + 2048 [gc]
gc-5678 [003] .... 214327.026019: block_getrq: 8,32 WS 34123776 + 2048 [gc]
gc-5678 [003] d..1 214327.026021: block_rq_insert: 8,32 R 131072 () 67493632 + 256 [gc]
gc-5678 [003] d..1 214327.026023: block_unplug: [gc] 1
gc-5678 [003] d..1 214327.026026: block_rq_issue: 8,32 R 131072 () 67493632 + 256 [gc]
gc-5678 [003] .... 214327.026046: block_plug: [gc]
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-14 22:37:25 +08:00
|
|
|
invalidate_mapping_pages(META_MAPPING(fio->sbi),
|
|
|
|
fio->old_blkaddr, fio->old_blkaddr);
|
2021-05-20 19:51:50 +08:00
|
|
|
f2fs_invalidate_compress_page(fio->sbi, fio->old_blkaddr);
|
|
|
|
}
|
2013-12-16 18:04:05 +08:00
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
/* writeout dirty page into bdev */
|
2018-05-28 23:47:18 +08:00
|
|
|
f2fs_submit_page_write(fio);
|
|
|
|
if (fio->retry) {
|
2016-12-15 02:12:56 +08:00
|
|
|
fio->old_blkaddr = fio->new_blkaddr;
|
|
|
|
goto reallocate;
|
|
|
|
}
|
2018-05-28 23:47:18 +08:00
|
|
|
|
2021-09-01 14:39:20 +08:00
|
|
|
f2fs_update_device_state(fio->sbi, fio->ino, fio->new_blkaddr, 1);
|
2018-05-28 23:47:18 +08:00
|
|
|
|
2018-05-26 09:00:13 +08:00
|
|
|
if (keep_order)
|
2022-01-08 04:48:44 +08:00
|
|
|
f2fs_up_read(&fio->sbi->io_order_lock);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_do_write_meta_page(struct f2fs_sb_info *sbi, struct page *page,
|
2017-08-02 23:21:48 +08:00
|
|
|
enum iostat_type io_type)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
2013-12-11 12:54:01 +08:00
|
|
|
struct f2fs_io_info fio = {
|
2015-04-24 05:38:15 +08:00
|
|
|
.sbi = sbi,
|
2013-12-11 12:54:01 +08:00
|
|
|
.type = META,
|
2018-01-31 10:36:57 +08:00
|
|
|
.temp = HOT,
|
2016-06-06 03:31:55 +08:00
|
|
|
.op = REQ_OP_WRITE,
|
2016-11-01 21:40:10 +08:00
|
|
|
.op_flags = REQ_SYNC | REQ_META | REQ_PRIO,
|
f2fs: trace old block address for CoWed page
This patch enables to trace old block address of CoWed page for better
debugging.
f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f0, oldaddr = 0xfe8ab, newaddr = 0xfee90 rw = WRITE_SYNC, type = NODE
f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f8, oldaddr = 0xfe8b0, newaddr = 0xfee91 rw = WRITE_SYNC, type = NODE
f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4fa, oldaddr = 0xfe8ae, newaddr = 0xfee92 rw = WRITE_SYNC, type = NODE
f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x96, oldaddr = 0xf049b, newaddr = 0x2bbe rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x97, oldaddr = 0xf049c, newaddr = 0x2bbf rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x98, oldaddr = 0xf049d, newaddr = 0x2bc0 rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x47, oldaddr = 0xffffffff, newaddr = 0xf2631 rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x48, oldaddr = 0xffffffff, newaddr = 0xf2632 rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x49, oldaddr = 0xffffffff, newaddr = 0xf2633 rw = WRITE, type = DATA
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 18:36:38 +08:00
|
|
|
.old_blkaddr = page->index,
|
|
|
|
.new_blkaddr = page->index,
|
2015-04-24 05:38:15 +08:00
|
|
|
.page = page,
|
2015-04-24 03:04:33 +08:00
|
|
|
.encrypted_page = NULL,
|
2023-02-02 15:04:56 +08:00
|
|
|
.in_list = 0,
|
2013-12-11 12:54:01 +08:00
|
|
|
};
|
|
|
|
|
2015-10-12 17:04:21 +08:00
|
|
|
if (unlikely(page->index >= MAIN_BLKADDR(sbi)))
|
2016-06-06 03:31:55 +08:00
|
|
|
fio.op_flags &= ~REQ_META;
|
2015-10-12 17:04:21 +08:00
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
set_page_writeback(page);
|
2017-05-11 02:28:38 +08:00
|
|
|
f2fs_submit_page_write(&fio);
|
2017-08-02 23:21:48 +08:00
|
|
|
|
2018-09-29 18:31:27 +08:00
|
|
|
stat_inc_meta_count(sbi, page->index);
|
2022-08-20 11:04:41 +08:00
|
|
|
f2fs_update_iostat(sbi, NULL, io_type, F2FS_BLKSIZE);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_do_write_node_page(unsigned int nid, struct f2fs_io_info *fio)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct f2fs_summary sum;
|
2015-04-24 05:38:15 +08:00
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
set_summary(&sum, nid, 0, 0);
|
2015-04-24 05:38:15 +08:00
|
|
|
do_write_page(&sum, fio);
|
2017-08-02 23:21:48 +08:00
|
|
|
|
2022-08-20 11:04:41 +08:00
|
|
|
f2fs_update_iostat(fio->sbi, NULL, fio->io_type, F2FS_BLKSIZE);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_outplace_write_data(struct dnode_of_data *dn,
|
|
|
|
struct f2fs_io_info *fio)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
2015-04-24 05:38:15 +08:00
|
|
|
struct f2fs_sb_info *sbi = fio->sbi;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
struct f2fs_summary sum;
|
|
|
|
|
2014-09-03 06:52:58 +08:00
|
|
|
f2fs_bug_on(sbi, dn->data_blkaddr == NULL_ADDR);
|
2022-12-02 09:37:15 +08:00
|
|
|
if (fio->io_type == FS_DATA_IO || fio->io_type == FS_CP_DATA_IO)
|
|
|
|
f2fs_update_age_extent_cache(dn);
|
2018-07-17 00:02:17 +08:00
|
|
|
set_summary(&sum, dn->nid, dn->ofs_in_node, fio->version);
|
2015-04-24 05:38:15 +08:00
|
|
|
do_write_page(&sum, fio);
|
2016-02-24 17:16:47 +08:00
|
|
|
f2fs_update_data_blkaddr(dn, fio->new_blkaddr);
|
2017-08-02 23:21:48 +08:00
|
|
|
|
2022-08-20 11:04:41 +08:00
|
|
|
f2fs_update_iostat(sbi, dn->inode, fio->io_type, F2FS_BLKSIZE);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
int f2fs_inplace_write_data(struct f2fs_io_info *fio)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
2017-08-02 23:21:48 +08:00
|
|
|
int err;
|
2018-03-26 17:32:23 +08:00
|
|
|
struct f2fs_sb_info *sbi = fio->sbi;
|
2019-04-15 15:30:52 +08:00
|
|
|
unsigned int segno;
|
2017-08-02 23:21:48 +08:00
|
|
|
|
f2fs: trace old block address for CoWed page
This patch enables to trace old block address of CoWed page for better
debugging.
f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f0, oldaddr = 0xfe8ab, newaddr = 0xfee90 rw = WRITE_SYNC, type = NODE
f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f8, oldaddr = 0xfe8b0, newaddr = 0xfee91 rw = WRITE_SYNC, type = NODE
f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4fa, oldaddr = 0xfe8ae, newaddr = 0xfee92 rw = WRITE_SYNC, type = NODE
f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x96, oldaddr = 0xf049b, newaddr = 0x2bbe rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x97, oldaddr = 0xf049c, newaddr = 0x2bbf rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x98, oldaddr = 0xf049d, newaddr = 0x2bc0 rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x47, oldaddr = 0xffffffff, newaddr = 0xf2631 rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x48, oldaddr = 0xffffffff, newaddr = 0xf2632 rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x49, oldaddr = 0xffffffff, newaddr = 0xf2633 rw = WRITE, type = DATA
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 18:36:38 +08:00
|
|
|
fio->new_blkaddr = fio->old_blkaddr;
|
2018-01-31 10:36:57 +08:00
|
|
|
/* i/o temperature is needed for passing down write hints */
|
|
|
|
__get_segment_type(fio);
|
2018-03-26 17:32:23 +08:00
|
|
|
|
2019-04-15 15:30:52 +08:00
|
|
|
segno = GET_SEGNO(sbi, fio->new_blkaddr);
|
|
|
|
|
|
|
|
if (!IS_DATASEG(get_seg_entry(sbi, segno)->type)) {
|
|
|
|
set_sbi_flag(sbi, SBI_NEED_FSCK);
|
2019-06-18 17:59:03 +08:00
|
|
|
f2fs_warn(sbi, "%s: incorrect segment(%u) type, run fsck to fix.",
|
|
|
|
__func__, segno);
|
2021-04-22 18:19:25 +08:00
|
|
|
err = -EFSCORRUPTED;
|
2022-09-28 23:38:54 +08:00
|
|
|
f2fs_handle_error(sbi, ERROR_INCONSISTENT_SUM_TYPE);
|
2021-04-22 18:19:25 +08:00
|
|
|
goto drop_bio;
|
|
|
|
}
|
|
|
|
|
2021-07-15 07:14:02 +08:00
|
|
|
if (f2fs_cp_error(sbi)) {
|
2021-04-22 18:19:25 +08:00
|
|
|
err = -EIO;
|
|
|
|
goto drop_bio;
|
2019-04-15 15:30:52 +08:00
|
|
|
}
|
2018-03-26 17:32:23 +08:00
|
|
|
|
2022-07-12 23:26:43 +08:00
|
|
|
if (fio->post_read)
|
|
|
|
invalidate_mapping_pages(META_MAPPING(sbi),
|
2021-11-02 15:10:02 +08:00
|
|
|
fio->new_blkaddr, fio->new_blkaddr);
|
|
|
|
|
2015-04-24 05:38:15 +08:00
|
|
|
stat_inc_inplace_blocks(fio->sbi);
|
2017-08-02 23:21:48 +08:00
|
|
|
|
2022-11-19 03:18:39 +08:00
|
|
|
if (fio->bio && !IS_F2FS_IPU_NOCACHE(sbi))
|
f2fs: add bio cache for IPU
SQLite in Wal mode may trigger sequential IPU write in db-wal file, after
commit d1b3e72d5490 ("f2fs: submit bio of in-place-update pages"), we
lost the chance of merging page in inner managed bio cache, result in
submitting more small-sized IO.
So let's add temporary bio in writepages() to cache mergeable write IO as
much as possible.
Test case:
1. xfs_io -f /mnt/f2fs/file -c "pwrite 0 65536" -c "fsync"
2. xfs_io -f /mnt/f2fs/file -c "pwrite 0 65536" -c "fsync"
Before:
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65544, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65552, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65560, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65568, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65576, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65584, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65592, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65600, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65608, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65616, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65624, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65632, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65640, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65648, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65656, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65664, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), NODE, sector = 57352, size = 4096
After:
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65544, size = 65536
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), NODE, sector = 57368, size = 4096
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-19 16:15:29 +08:00
|
|
|
err = f2fs_merge_page_bio(fio);
|
|
|
|
else
|
|
|
|
err = f2fs_submit_page_bio(fio);
|
2019-02-21 20:40:13 +08:00
|
|
|
if (!err) {
|
2021-09-01 14:39:20 +08:00
|
|
|
f2fs_update_device_state(fio->sbi, fio->ino,
|
|
|
|
fio->new_blkaddr, 1);
|
2022-08-20 11:04:41 +08:00
|
|
|
f2fs_update_iostat(fio->sbi, fio->page->mapping->host,
|
|
|
|
fio->io_type, F2FS_BLKSIZE);
|
2019-02-21 20:40:13 +08:00
|
|
|
}
|
2017-08-02 23:21:48 +08:00
|
|
|
|
2021-04-22 18:19:25 +08:00
|
|
|
return err;
|
|
|
|
drop_bio:
|
2021-05-10 12:53:03 +08:00
|
|
|
if (fio->bio && *(fio->bio)) {
|
2021-04-22 18:19:25 +08:00
|
|
|
struct bio *bio = *(fio->bio);
|
|
|
|
|
|
|
|
bio->bi_status = BLK_STS_IOERR;
|
|
|
|
bio_endio(bio);
|
2021-05-10 12:53:03 +08:00
|
|
|
*(fio->bio) = NULL;
|
2021-04-22 18:19:25 +08:00
|
|
|
}
|
2017-08-02 23:21:48 +08:00
|
|
|
return err;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
f2fs: fix summary info corruption
Sometimes, after running generic/270 of fstest, fsck reports summary
info and actual position of block address in direct node becoming
inconsistent.
The root cause is race in between __f2fs_replace_block and change_curseg
as below:
Thread A Thread B
- __clone_blkaddrs
- f2fs_replace_block
- __f2fs_replace_block
- segnoA = GET_SEGNO(sbi, blkaddrA);
- type = se->type:=CURSEG_HOT_DATA
- if (!IS_CURSEG(sbi, segnoA))
type = CURSEG_WARM_DATA
- allocate_data_block
- allocate_segment
- get_ssr_segment
- change_curseg(segnoA, CURSEG_HOT_DATA)
- change_curseg(segnoA, CURSEG_WARM_DATA)
- reset_curseg
- __set_sit_entry_type
- change se->type from CURSEG_HOT_DATA to CURSEG_WARM_DATA
So finally, hot curseg locates in segnoA, but type of segnoA becomes
CURSEG_WARM_DATA.
Then if we invoke __f2fs_replace_block(blkaddrB, blkaddrA, true, false),
as blkaddrA locates in segnoA, so we will move warm type curseg to segnoA,
then change its summary cache and writeback it to summary block.
But segnoA is used by hot type curseg too, once it moves or persist, it
will cover summary block content with inner old summary cache, result in
inconsistent status.
This patch tries to fix this issue by introduce global curseg lock to avoid
race in between __f2fs_replace_block and change_curseg.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-11-02 20:41:03 +08:00
|
|
|
static inline int __f2fs_get_curseg(struct f2fs_sb_info *sbi,
|
|
|
|
unsigned int segno)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
|
|
|
|
for (i = CURSEG_HOT_DATA; i < NO_CHECK_TYPE; i++) {
|
|
|
|
if (CURSEG_I(sbi, i)->segno == segno)
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
return i;
|
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_do_replace_block(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
|
2015-05-06 13:08:06 +08:00
|
|
|
block_t old_blkaddr, block_t new_blkaddr,
|
2020-08-04 21:14:47 +08:00
|
|
|
bool recover_curseg, bool recover_newaddr,
|
|
|
|
bool from_gc)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct sit_info *sit_i = SIT_I(sbi);
|
|
|
|
struct curseg_info *curseg;
|
|
|
|
unsigned int segno, old_cursegno;
|
|
|
|
struct seg_entry *se;
|
|
|
|
int type;
|
2015-05-06 13:08:06 +08:00
|
|
|
unsigned short old_blkoff;
|
2021-03-25 22:19:20 +08:00
|
|
|
unsigned char old_alloc_type;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
segno = GET_SEGNO(sbi, new_blkaddr);
|
|
|
|
se = get_seg_entry(sbi, segno);
|
|
|
|
type = se->type;
|
|
|
|
|
2022-01-08 04:48:44 +08:00
|
|
|
f2fs_down_write(&SM_I(sbi)->curseg_lock);
|
f2fs: fix summary info corruption
Sometimes, after running generic/270 of fstest, fsck reports summary
info and actual position of block address in direct node becoming
inconsistent.
The root cause is race in between __f2fs_replace_block and change_curseg
as below:
Thread A Thread B
- __clone_blkaddrs
- f2fs_replace_block
- __f2fs_replace_block
- segnoA = GET_SEGNO(sbi, blkaddrA);
- type = se->type:=CURSEG_HOT_DATA
- if (!IS_CURSEG(sbi, segnoA))
type = CURSEG_WARM_DATA
- allocate_data_block
- allocate_segment
- get_ssr_segment
- change_curseg(segnoA, CURSEG_HOT_DATA)
- change_curseg(segnoA, CURSEG_WARM_DATA)
- reset_curseg
- __set_sit_entry_type
- change se->type from CURSEG_HOT_DATA to CURSEG_WARM_DATA
So finally, hot curseg locates in segnoA, but type of segnoA becomes
CURSEG_WARM_DATA.
Then if we invoke __f2fs_replace_block(blkaddrB, blkaddrA, true, false),
as blkaddrA locates in segnoA, so we will move warm type curseg to segnoA,
then change its summary cache and writeback it to summary block.
But segnoA is used by hot type curseg too, once it moves or persist, it
will cover summary block content with inner old summary cache, result in
inconsistent status.
This patch tries to fix this issue by introduce global curseg lock to avoid
race in between __f2fs_replace_block and change_curseg.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-11-02 20:41:03 +08:00
|
|
|
|
2015-05-06 13:08:06 +08:00
|
|
|
if (!recover_curseg) {
|
|
|
|
/* for recovery flow */
|
|
|
|
if (se->valid_blocks == 0 && !IS_CURSEG(sbi, segno)) {
|
|
|
|
if (old_blkaddr == NULL_ADDR)
|
|
|
|
type = CURSEG_COLD_DATA;
|
|
|
|
else
|
|
|
|
type = CURSEG_WARM_DATA;
|
|
|
|
}
|
|
|
|
} else {
|
f2fs: fix summary info corruption
Sometimes, after running generic/270 of fstest, fsck reports summary
info and actual position of block address in direct node becoming
inconsistent.
The root cause is race in between __f2fs_replace_block and change_curseg
as below:
Thread A Thread B
- __clone_blkaddrs
- f2fs_replace_block
- __f2fs_replace_block
- segnoA = GET_SEGNO(sbi, blkaddrA);
- type = se->type:=CURSEG_HOT_DATA
- if (!IS_CURSEG(sbi, segnoA))
type = CURSEG_WARM_DATA
- allocate_data_block
- allocate_segment
- get_ssr_segment
- change_curseg(segnoA, CURSEG_HOT_DATA)
- change_curseg(segnoA, CURSEG_WARM_DATA)
- reset_curseg
- __set_sit_entry_type
- change se->type from CURSEG_HOT_DATA to CURSEG_WARM_DATA
So finally, hot curseg locates in segnoA, but type of segnoA becomes
CURSEG_WARM_DATA.
Then if we invoke __f2fs_replace_block(blkaddrB, blkaddrA, true, false),
as blkaddrA locates in segnoA, so we will move warm type curseg to segnoA,
then change its summary cache and writeback it to summary block.
But segnoA is used by hot type curseg too, once it moves or persist, it
will cover summary block content with inner old summary cache, result in
inconsistent status.
This patch tries to fix this issue by introduce global curseg lock to avoid
race in between __f2fs_replace_block and change_curseg.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-11-02 20:41:03 +08:00
|
|
|
if (IS_CURSEG(sbi, segno)) {
|
|
|
|
/* se->type is volatile as SSR allocation */
|
|
|
|
type = __f2fs_get_curseg(sbi, segno);
|
|
|
|
f2fs_bug_on(sbi, type == NO_CHECK_TYPE);
|
|
|
|
} else {
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
type = CURSEG_WARM_DATA;
|
f2fs: fix summary info corruption
Sometimes, after running generic/270 of fstest, fsck reports summary
info and actual position of block address in direct node becoming
inconsistent.
The root cause is race in between __f2fs_replace_block and change_curseg
as below:
Thread A Thread B
- __clone_blkaddrs
- f2fs_replace_block
- __f2fs_replace_block
- segnoA = GET_SEGNO(sbi, blkaddrA);
- type = se->type:=CURSEG_HOT_DATA
- if (!IS_CURSEG(sbi, segnoA))
type = CURSEG_WARM_DATA
- allocate_data_block
- allocate_segment
- get_ssr_segment
- change_curseg(segnoA, CURSEG_HOT_DATA)
- change_curseg(segnoA, CURSEG_WARM_DATA)
- reset_curseg
- __set_sit_entry_type
- change se->type from CURSEG_HOT_DATA to CURSEG_WARM_DATA
So finally, hot curseg locates in segnoA, but type of segnoA becomes
CURSEG_WARM_DATA.
Then if we invoke __f2fs_replace_block(blkaddrB, blkaddrA, true, false),
as blkaddrA locates in segnoA, so we will move warm type curseg to segnoA,
then change its summary cache and writeback it to summary block.
But segnoA is used by hot type curseg too, once it moves or persist, it
will cover summary block content with inner old summary cache, result in
inconsistent status.
This patch tries to fix this issue by introduce global curseg lock to avoid
race in between __f2fs_replace_block and change_curseg.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-11-02 20:41:03 +08:00
|
|
|
}
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
2015-05-06 13:08:06 +08:00
|
|
|
|
f2fs: check segment type in __f2fs_replace_block
In some case, the node blocks has wrong blkaddr whose segment type is
NODE, e.g., recover inode has missing xattr flag and the blkaddr is in
the xattr range. Since fsck.f2fs does not check the recovery nodes, this
will cause __f2fs_replace_block change the curseg of node and do the
update_sit_entry(sbi, new_blkaddr, 1) with no next_blkoff refresh, as a
result, when recovery process write checkpoint and sync nodes, the
next_blkoff of curseg is used in the segment bit map, then it will
cause f2fs_bug_on. So let's check segment type in __f2fs_replace_block.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-04 15:02:02 +08:00
|
|
|
f2fs_bug_on(sbi, !IS_DATASEG(type));
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
curseg = CURSEG_I(sbi, type);
|
|
|
|
|
|
|
|
mutex_lock(&curseg->curseg_mutex);
|
2017-10-30 17:49:53 +08:00
|
|
|
down_write(&sit_i->sentry_lock);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
old_cursegno = curseg->segno;
|
2015-05-06 13:08:06 +08:00
|
|
|
old_blkoff = curseg->next_blkoff;
|
2021-03-25 22:19:20 +08:00
|
|
|
old_alloc_type = curseg->alloc_type;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
/* change the current segment */
|
|
|
|
if (segno != curseg->segno) {
|
|
|
|
curseg->next_segno = segno;
|
2022-11-28 17:43:46 +08:00
|
|
|
change_curseg(sbi, type);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
2014-02-04 12:01:10 +08:00
|
|
|
curseg->next_blkoff = GET_BLKOFF_FROM_SEG0(sbi, new_blkaddr);
|
2023-01-19 14:36:18 +08:00
|
|
|
curseg->sum_blk->entries[curseg->next_blkoff] = *sum;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
2020-08-04 21:14:47 +08:00
|
|
|
if (!recover_curseg || recover_newaddr) {
|
|
|
|
if (!from_gc)
|
|
|
|
update_segment_mtime(sbi, new_blkaddr, 0);
|
2015-10-08 03:28:41 +08:00
|
|
|
update_sit_entry(sbi, new_blkaddr, 1);
|
2020-08-04 21:14:47 +08:00
|
|
|
}
|
f2fs: readahead encrypted block during GC
During GC, for each encrypted block, we will read block synchronously
into meta page, and then submit it into current cold data log area.
So this block read model with 4k granularity can make poor performance,
like migrating non-encrypted block, let's readahead encrypted block
as well to improve migration performance.
To implement this, we choose meta page that its index is old block
address of the encrypted block, and readahead ciphertext into this
page, later, if readaheaded page is still updated, we will load its
data into target meta page, and submit the write IO.
Note that for OPU, truncation, deletion, we need to invalid meta
page after we invalid old block address, to make sure we won't load
invalid data from target meta page during encrypted block migration.
for ((i = 0; i < 1000; i++))
do {
xfs_io -f /mnt/f2fs/dir/$i -c "pwrite 0 128k" -c "fsync";
} done
for ((i = 0; i < 1000; i+=2))
do {
rm /mnt/f2fs/dir/$i;
} done
ret = ioctl(fd, F2FS_IOC_GARBAGE_COLLECT, 0);
Before:
gc-6549 [001] d..1 214682.212797: block_rq_insert: 8,32 RA 32768 () 786400 + 64 [gc]
gc-6549 [001] d..1 214682.212802: block_unplug: [gc] 1
gc-6549 [001] .... 214682.213892: block_bio_queue: 8,32 R 67494144 + 8 [gc]
gc-6549 [001] .... 214682.213899: block_getrq: 8,32 R 67494144 + 8 [gc]
gc-6549 [001] .... 214682.213902: block_plug: [gc]
gc-6549 [001] d..1 214682.213905: block_rq_insert: 8,32 R 4096 () 67494144 + 8 [gc]
gc-6549 [001] d..1 214682.213908: block_unplug: [gc] 1
gc-6549 [001] .... 214682.226405: block_bio_queue: 8,32 R 67494152 + 8 [gc]
gc-6549 [001] .... 214682.226412: block_getrq: 8,32 R 67494152 + 8 [gc]
gc-6549 [001] .... 214682.226414: block_plug: [gc]
gc-6549 [001] d..1 214682.226417: block_rq_insert: 8,32 R 4096 () 67494152 + 8 [gc]
gc-6549 [001] d..1 214682.226420: block_unplug: [gc] 1
gc-6549 [001] .... 214682.226904: block_bio_queue: 8,32 R 67494160 + 8 [gc]
gc-6549 [001] .... 214682.226910: block_getrq: 8,32 R 67494160 + 8 [gc]
gc-6549 [001] .... 214682.226911: block_plug: [gc]
gc-6549 [001] d..1 214682.226914: block_rq_insert: 8,32 R 4096 () 67494160 + 8 [gc]
gc-6549 [001] d..1 214682.226916: block_unplug: [gc] 1
After:
gc-5678 [003] .... 214327.025906: block_bio_queue: 8,32 R 67493824 + 8 [gc]
gc-5678 [003] .... 214327.025908: block_bio_backmerge: 8,32 R 67493824 + 8 [gc]
gc-5678 [003] .... 214327.025915: block_bio_queue: 8,32 R 67493832 + 8 [gc]
gc-5678 [003] .... 214327.025917: block_bio_backmerge: 8,32 R 67493832 + 8 [gc]
gc-5678 [003] .... 214327.025923: block_bio_queue: 8,32 R 67493840 + 8 [gc]
gc-5678 [003] .... 214327.025925: block_bio_backmerge: 8,32 R 67493840 + 8 [gc]
gc-5678 [003] .... 214327.025932: block_bio_queue: 8,32 R 67493848 + 8 [gc]
gc-5678 [003] .... 214327.025934: block_bio_backmerge: 8,32 R 67493848 + 8 [gc]
gc-5678 [003] .... 214327.025941: block_bio_queue: 8,32 R 67493856 + 8 [gc]
gc-5678 [003] .... 214327.025943: block_bio_backmerge: 8,32 R 67493856 + 8 [gc]
gc-5678 [003] .... 214327.025953: block_bio_queue: 8,32 R 67493864 + 8 [gc]
gc-5678 [003] .... 214327.025955: block_bio_backmerge: 8,32 R 67493864 + 8 [gc]
gc-5678 [003] .... 214327.025962: block_bio_queue: 8,32 R 67493872 + 8 [gc]
gc-5678 [003] .... 214327.025964: block_bio_backmerge: 8,32 R 67493872 + 8 [gc]
gc-5678 [003] .... 214327.025970: block_bio_queue: 8,32 R 67493880 + 8 [gc]
gc-5678 [003] .... 214327.025972: block_bio_backmerge: 8,32 R 67493880 + 8 [gc]
gc-5678 [003] .... 214327.026000: block_bio_queue: 8,32 WS 34123776 + 2048 [gc]
gc-5678 [003] .... 214327.026019: block_getrq: 8,32 WS 34123776 + 2048 [gc]
gc-5678 [003] d..1 214327.026021: block_rq_insert: 8,32 R 131072 () 67493632 + 256 [gc]
gc-5678 [003] d..1 214327.026023: block_unplug: [gc] 1
gc-5678 [003] d..1 214327.026026: block_rq_issue: 8,32 R 131072 () 67493632 + 256 [gc]
gc-5678 [003] .... 214327.026046: block_plug: [gc]
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-14 22:37:25 +08:00
|
|
|
if (GET_SEGNO(sbi, old_blkaddr) != NULL_SEGNO) {
|
|
|
|
invalidate_mapping_pages(META_MAPPING(sbi),
|
|
|
|
old_blkaddr, old_blkaddr);
|
2021-05-20 19:51:50 +08:00
|
|
|
f2fs_invalidate_compress_page(sbi, old_blkaddr);
|
2020-08-04 21:14:47 +08:00
|
|
|
if (!from_gc)
|
|
|
|
update_segment_mtime(sbi, old_blkaddr, 0);
|
2015-10-08 03:28:41 +08:00
|
|
|
update_sit_entry(sbi, old_blkaddr, -1);
|
f2fs: readahead encrypted block during GC
During GC, for each encrypted block, we will read block synchronously
into meta page, and then submit it into current cold data log area.
So this block read model with 4k granularity can make poor performance,
like migrating non-encrypted block, let's readahead encrypted block
as well to improve migration performance.
To implement this, we choose meta page that its index is old block
address of the encrypted block, and readahead ciphertext into this
page, later, if readaheaded page is still updated, we will load its
data into target meta page, and submit the write IO.
Note that for OPU, truncation, deletion, we need to invalid meta
page after we invalid old block address, to make sure we won't load
invalid data from target meta page during encrypted block migration.
for ((i = 0; i < 1000; i++))
do {
xfs_io -f /mnt/f2fs/dir/$i -c "pwrite 0 128k" -c "fsync";
} done
for ((i = 0; i < 1000; i+=2))
do {
rm /mnt/f2fs/dir/$i;
} done
ret = ioctl(fd, F2FS_IOC_GARBAGE_COLLECT, 0);
Before:
gc-6549 [001] d..1 214682.212797: block_rq_insert: 8,32 RA 32768 () 786400 + 64 [gc]
gc-6549 [001] d..1 214682.212802: block_unplug: [gc] 1
gc-6549 [001] .... 214682.213892: block_bio_queue: 8,32 R 67494144 + 8 [gc]
gc-6549 [001] .... 214682.213899: block_getrq: 8,32 R 67494144 + 8 [gc]
gc-6549 [001] .... 214682.213902: block_plug: [gc]
gc-6549 [001] d..1 214682.213905: block_rq_insert: 8,32 R 4096 () 67494144 + 8 [gc]
gc-6549 [001] d..1 214682.213908: block_unplug: [gc] 1
gc-6549 [001] .... 214682.226405: block_bio_queue: 8,32 R 67494152 + 8 [gc]
gc-6549 [001] .... 214682.226412: block_getrq: 8,32 R 67494152 + 8 [gc]
gc-6549 [001] .... 214682.226414: block_plug: [gc]
gc-6549 [001] d..1 214682.226417: block_rq_insert: 8,32 R 4096 () 67494152 + 8 [gc]
gc-6549 [001] d..1 214682.226420: block_unplug: [gc] 1
gc-6549 [001] .... 214682.226904: block_bio_queue: 8,32 R 67494160 + 8 [gc]
gc-6549 [001] .... 214682.226910: block_getrq: 8,32 R 67494160 + 8 [gc]
gc-6549 [001] .... 214682.226911: block_plug: [gc]
gc-6549 [001] d..1 214682.226914: block_rq_insert: 8,32 R 4096 () 67494160 + 8 [gc]
gc-6549 [001] d..1 214682.226916: block_unplug: [gc] 1
After:
gc-5678 [003] .... 214327.025906: block_bio_queue: 8,32 R 67493824 + 8 [gc]
gc-5678 [003] .... 214327.025908: block_bio_backmerge: 8,32 R 67493824 + 8 [gc]
gc-5678 [003] .... 214327.025915: block_bio_queue: 8,32 R 67493832 + 8 [gc]
gc-5678 [003] .... 214327.025917: block_bio_backmerge: 8,32 R 67493832 + 8 [gc]
gc-5678 [003] .... 214327.025923: block_bio_queue: 8,32 R 67493840 + 8 [gc]
gc-5678 [003] .... 214327.025925: block_bio_backmerge: 8,32 R 67493840 + 8 [gc]
gc-5678 [003] .... 214327.025932: block_bio_queue: 8,32 R 67493848 + 8 [gc]
gc-5678 [003] .... 214327.025934: block_bio_backmerge: 8,32 R 67493848 + 8 [gc]
gc-5678 [003] .... 214327.025941: block_bio_queue: 8,32 R 67493856 + 8 [gc]
gc-5678 [003] .... 214327.025943: block_bio_backmerge: 8,32 R 67493856 + 8 [gc]
gc-5678 [003] .... 214327.025953: block_bio_queue: 8,32 R 67493864 + 8 [gc]
gc-5678 [003] .... 214327.025955: block_bio_backmerge: 8,32 R 67493864 + 8 [gc]
gc-5678 [003] .... 214327.025962: block_bio_queue: 8,32 R 67493872 + 8 [gc]
gc-5678 [003] .... 214327.025964: block_bio_backmerge: 8,32 R 67493872 + 8 [gc]
gc-5678 [003] .... 214327.025970: block_bio_queue: 8,32 R 67493880 + 8 [gc]
gc-5678 [003] .... 214327.025972: block_bio_backmerge: 8,32 R 67493880 + 8 [gc]
gc-5678 [003] .... 214327.026000: block_bio_queue: 8,32 WS 34123776 + 2048 [gc]
gc-5678 [003] .... 214327.026019: block_getrq: 8,32 WS 34123776 + 2048 [gc]
gc-5678 [003] d..1 214327.026021: block_rq_insert: 8,32 R 131072 () 67493632 + 256 [gc]
gc-5678 [003] d..1 214327.026023: block_unplug: [gc] 1
gc-5678 [003] d..1 214327.026026: block_rq_issue: 8,32 R 131072 () 67493632 + 256 [gc]
gc-5678 [003] .... 214327.026046: block_plug: [gc]
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-14 22:37:25 +08:00
|
|
|
}
|
2015-10-08 03:28:41 +08:00
|
|
|
|
|
|
|
locate_dirty_segment(sbi, GET_SEGNO(sbi, old_blkaddr));
|
|
|
|
locate_dirty_segment(sbi, GET_SEGNO(sbi, new_blkaddr));
|
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
locate_dirty_segment(sbi, old_cursegno);
|
|
|
|
|
2015-05-06 13:08:06 +08:00
|
|
|
if (recover_curseg) {
|
|
|
|
if (old_cursegno != curseg->segno) {
|
|
|
|
curseg->next_segno = old_cursegno;
|
2022-11-28 17:43:46 +08:00
|
|
|
change_curseg(sbi, type);
|
2015-05-06 13:08:06 +08:00
|
|
|
}
|
|
|
|
curseg->next_blkoff = old_blkoff;
|
2021-03-25 22:19:20 +08:00
|
|
|
curseg->alloc_type = old_alloc_type;
|
2015-05-06 13:08:06 +08:00
|
|
|
}
|
|
|
|
|
2017-10-30 17:49:53 +08:00
|
|
|
up_write(&sit_i->sentry_lock);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
mutex_unlock(&curseg->curseg_mutex);
|
2022-01-08 04:48:44 +08:00
|
|
|
f2fs_up_write(&SM_I(sbi)->curseg_lock);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
2015-05-28 19:15:35 +08:00
|
|
|
void f2fs_replace_block(struct f2fs_sb_info *sbi, struct dnode_of_data *dn,
|
|
|
|
block_t old_addr, block_t new_addr,
|
f2fs: support revoking atomic written pages
f2fs support atomic write with following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file
With this flow we can avoid file becoming corrupted when abnormal power
cut, because we hold data of transaction in referenced pages linked in
inmem_pages list of inode, but without setting them dirty, so these data
won't be persisted unless we commit them in step 4.
But we should still hold journal db file in memory by using volatile
write, because our semantics of 'atomic write support' is incomplete, in
step 4, we could fail to submit all dirty data of transaction, once
partial dirty data was committed in storage, then after a checkpoint &
abnormal power-cut, db file will be corrupted forever.
So this patch tries to improve atomic write flow by adding a revoking flow,
once inner error occurs in committing, this gives another chance to try to
revoke these partial submitted data of current transaction, it makes
committing operation more like aotmical one.
If we're not lucky, once revoking operation was failed, EAGAIN will be
reported to user for suggesting doing the recovery with held journal file,
or retrying current transaction again.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-06 14:40:34 +08:00
|
|
|
unsigned char version, bool recover_curseg,
|
|
|
|
bool recover_newaddr)
|
2015-05-28 19:15:35 +08:00
|
|
|
{
|
|
|
|
struct f2fs_summary sum;
|
|
|
|
|
|
|
|
set_summary(&sum, dn->nid, dn->ofs_in_node, version);
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
f2fs_do_replace_block(sbi, &sum, old_addr, new_addr,
|
2020-08-04 21:14:47 +08:00
|
|
|
recover_curseg, recover_newaddr, false);
|
2015-05-28 19:15:35 +08:00
|
|
|
|
2016-02-24 17:16:47 +08:00
|
|
|
f2fs_update_data_blkaddr(dn, new_addr);
|
2015-05-28 19:15:35 +08:00
|
|
|
}
|
|
|
|
|
2013-11-30 11:51:14 +08:00
|
|
|
void f2fs_wait_on_page_writeback(struct page *page,
|
2018-12-25 17:43:42 +08:00
|
|
|
enum page_type type, bool ordered, bool locked)
|
2013-11-30 11:51:14 +08:00
|
|
|
{
|
|
|
|
if (PageWriteback(page)) {
|
2014-09-03 06:31:18 +08:00
|
|
|
struct f2fs_sb_info *sbi = F2FS_P_SB(page);
|
|
|
|
|
2019-09-30 18:53:25 +08:00
|
|
|
/* submit cached LFS IO */
|
2018-09-27 23:41:16 +08:00
|
|
|
f2fs_submit_merged_write_cond(sbi, NULL, page, 0, type);
|
2023-02-06 19:56:00 +08:00
|
|
|
/* submit cached IPU IO */
|
2019-09-30 18:53:25 +08:00
|
|
|
f2fs_submit_merged_ipu_write(sbi, NULL, page);
|
2018-12-25 17:43:42 +08:00
|
|
|
if (ordered) {
|
2016-01-20 23:43:51 +08:00
|
|
|
wait_on_page_writeback(page);
|
2018-12-25 17:43:42 +08:00
|
|
|
f2fs_bug_on(sbi, locked && PageWriteback(page));
|
|
|
|
} else {
|
2016-01-20 23:43:51 +08:00
|
|
|
wait_for_stable_page(page);
|
2018-12-25 17:43:42 +08:00
|
|
|
}
|
2013-11-30 11:51:14 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-08-23 12:18:00 +08:00
|
|
|
void f2fs_wait_on_block_writeback(struct inode *inode, block_t blkaddr)
|
2015-10-08 13:27:34 +08:00
|
|
|
{
|
2018-08-23 12:18:00 +08:00
|
|
|
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
|
2015-10-08 13:27:34 +08:00
|
|
|
struct page *cpage;
|
|
|
|
|
2018-08-23 12:18:00 +08:00
|
|
|
if (!f2fs_post_read_required(inode))
|
|
|
|
return;
|
|
|
|
|
f2fs: introduce DATA_GENERIC_ENHANCE
Previously, f2fs_is_valid_blkaddr(, blkaddr, DATA_GENERIC) will check
whether @blkaddr locates in main area or not.
That check is weak, since the block address in range of main area can
point to the address which is not valid in segment info table, and we
can not detect such condition, we may suffer worse corruption as system
continues running.
So this patch introduce DATA_GENERIC_ENHANCE to enhance the sanity check
which trigger SIT bitmap check rather than only range check.
This patch did below changes as wel:
- set SBI_NEED_FSCK in f2fs_is_valid_blkaddr().
- get rid of is_valid_data_blkaddr() to avoid panic if blkaddr is invalid.
- introduce verify_fio_blkaddr() to wrap fio {new,old}_blkaddr validation check.
- spread blkaddr check in:
* f2fs_get_node_info()
* __read_out_blkaddrs()
* f2fs_submit_page_read()
* ra_data_block()
* do_recover_data()
This patch can fix bug reported from bugzilla below:
https://bugzilla.kernel.org/show_bug.cgi?id=203215
https://bugzilla.kernel.org/show_bug.cgi?id=203223
https://bugzilla.kernel.org/show_bug.cgi?id=203231
https://bugzilla.kernel.org/show_bug.cgi?id=203235
https://bugzilla.kernel.org/show_bug.cgi?id=203241
= Update by Jaegeuk Kim =
DATA_GENERIC_ENHANCE enhanced to validate block addresses on read/write paths.
But, xfstest/generic/446 compalins some generated kernel messages saying invalid
bitmap was detected when reading a block. The reaons is, when we get the
block addresses from extent_cache, there is no lock to synchronize it from
truncating the blocks in parallel.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-04-15 15:26:32 +08:00
|
|
|
if (!__is_valid_data_blkaddr(blkaddr))
|
2015-10-08 13:27:34 +08:00
|
|
|
return;
|
|
|
|
|
|
|
|
cpage = find_lock_page(META_MAPPING(sbi), blkaddr);
|
|
|
|
if (cpage) {
|
2018-12-25 17:43:42 +08:00
|
|
|
f2fs_wait_on_page_writeback(cpage, DATA, true, true);
|
2015-10-08 13:27:34 +08:00
|
|
|
f2fs_put_page(cpage, 1);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-10-10 13:26:22 +08:00
|
|
|
void f2fs_wait_on_block_writeback_range(struct inode *inode, block_t blkaddr,
|
|
|
|
block_t len)
|
|
|
|
{
|
2022-07-12 23:26:43 +08:00
|
|
|
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
|
2018-10-10 13:26:22 +08:00
|
|
|
block_t i;
|
|
|
|
|
2022-07-12 23:26:43 +08:00
|
|
|
if (!f2fs_post_read_required(inode))
|
|
|
|
return;
|
|
|
|
|
2018-10-10 13:26:22 +08:00
|
|
|
for (i = 0; i < len; i++)
|
|
|
|
f2fs_wait_on_block_writeback(inode, blkaddr + i);
|
2022-07-12 23:26:43 +08:00
|
|
|
|
|
|
|
invalidate_mapping_pages(META_MAPPING(sbi), blkaddr, blkaddr + len - 1);
|
2018-10-10 13:26:22 +08:00
|
|
|
}
|
|
|
|
|
2018-07-17 00:02:17 +08:00
|
|
|
static int read_compacted_summaries(struct f2fs_sb_info *sbi)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
|
|
|
|
struct curseg_info *seg_i;
|
|
|
|
unsigned char *kaddr;
|
|
|
|
struct page *page;
|
|
|
|
block_t start;
|
|
|
|
int i, j, offset;
|
|
|
|
|
|
|
|
start = start_sum_block(sbi);
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
page = f2fs_get_meta_page(sbi, start++);
|
2018-07-17 00:02:17 +08:00
|
|
|
if (IS_ERR(page))
|
|
|
|
return PTR_ERR(page);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
kaddr = (unsigned char *)page_address(page);
|
|
|
|
|
|
|
|
/* Step 1: restore nat cache */
|
|
|
|
seg_i = CURSEG_I(sbi, CURSEG_HOT_DATA);
|
f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- datas of current summay block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.
So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
memcpy(seg_i->journal, kaddr, SUM_JOURNAL_SIZE);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
/* Step 2: restore sit cache */
|
|
|
|
seg_i = CURSEG_I(sbi, CURSEG_COLD_DATA);
|
f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- datas of current summay block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.
So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
memcpy(seg_i->journal, kaddr + SUM_JOURNAL_SIZE, SUM_JOURNAL_SIZE);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
offset = 2 * SUM_JOURNAL_SIZE;
|
|
|
|
|
|
|
|
/* Step 3: restore summary entries */
|
|
|
|
for (i = CURSEG_HOT_DATA; i <= CURSEG_COLD_DATA; i++) {
|
|
|
|
unsigned short blk_off;
|
|
|
|
unsigned int segno;
|
|
|
|
|
|
|
|
seg_i = CURSEG_I(sbi, i);
|
|
|
|
segno = le32_to_cpu(ckpt->cur_data_segno[i]);
|
|
|
|
blk_off = le16_to_cpu(ckpt->cur_data_blkoff[i]);
|
|
|
|
seg_i->next_segno = segno;
|
|
|
|
reset_curseg(sbi, i, 0);
|
|
|
|
seg_i->alloc_type = ckpt->alloc_type[i];
|
|
|
|
seg_i->next_blkoff = blk_off;
|
|
|
|
|
|
|
|
if (seg_i->alloc_type == SSR)
|
|
|
|
blk_off = sbi->blocks_per_seg;
|
|
|
|
|
|
|
|
for (j = 0; j < blk_off; j++) {
|
|
|
|
struct f2fs_summary *s;
|
2021-04-06 09:47:35 +08:00
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
s = (struct f2fs_summary *)(kaddr + offset);
|
|
|
|
seg_i->sum_blk->entries[j] = *s;
|
|
|
|
offset += SUMMARY_SIZE;
|
mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.
There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-01 20:29:47 +08:00
|
|
|
if (offset + SUMMARY_SIZE <= PAGE_SIZE -
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
SUM_FOOTER_SIZE)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
f2fs_put_page(page, 1);
|
|
|
|
page = NULL;
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
page = f2fs_get_meta_page(sbi, start++);
|
2018-07-17 00:02:17 +08:00
|
|
|
if (IS_ERR(page))
|
|
|
|
return PTR_ERR(page);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
kaddr = (unsigned char *)page_address(page);
|
|
|
|
offset = 0;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
f2fs_put_page(page, 1);
|
2018-07-17 00:02:17 +08:00
|
|
|
return 0;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static int read_normal_summaries(struct f2fs_sb_info *sbi, int type)
|
|
|
|
{
|
|
|
|
struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
|
|
|
|
struct f2fs_summary_block *sum;
|
|
|
|
struct curseg_info *curseg;
|
|
|
|
struct page *new;
|
|
|
|
unsigned short blk_off;
|
|
|
|
unsigned int segno = 0;
|
|
|
|
block_t blk_addr = 0;
|
2018-07-17 00:02:17 +08:00
|
|
|
int err = 0;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
/* get segment number and block addr */
|
|
|
|
if (IS_DATASEG(type)) {
|
|
|
|
segno = le32_to_cpu(ckpt->cur_data_segno[type]);
|
|
|
|
blk_off = le16_to_cpu(ckpt->cur_data_blkoff[type -
|
|
|
|
CURSEG_HOT_DATA]);
|
2015-01-30 03:45:33 +08:00
|
|
|
if (__exist_node_summaries(sbi))
|
f2fs: introduce inmem curseg
Previous implementation of aligned pinfile allocation will:
- allocate new segment on cold data log no matter whether last used
segment is partially used or not, it makes IOs more random;
- force concurrent cold data/GCed IO going into warm data area, it
can make a bad effect on hot/cold data separation;
In this patch, we introduce a new type of log named 'inmem curseg',
the differents from normal curseg is:
- it reuses existed segment type (CURSEG_XXX_NODE/DATA);
- it only exists in memory, its segno, blkofs, summary will not b
persisted into checkpoint area;
With this new feature, we can enhance scalability of log, special
allocators can be created for purposes:
- pure lfs allocator for aligned pinfile allocation or file
defragmentation
- pure ssr allocator for later feature
So that, let's update aligned pinfile allocation to use this new
inmem curseg fwk.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:45 +08:00
|
|
|
blk_addr = sum_blk_addr(sbi, NR_CURSEG_PERSIST_TYPE, type);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
else
|
|
|
|
blk_addr = sum_blk_addr(sbi, NR_CURSEG_DATA_TYPE, type);
|
|
|
|
} else {
|
|
|
|
segno = le32_to_cpu(ckpt->cur_node_segno[type -
|
|
|
|
CURSEG_HOT_NODE]);
|
|
|
|
blk_off = le16_to_cpu(ckpt->cur_node_blkoff[type -
|
|
|
|
CURSEG_HOT_NODE]);
|
2015-01-30 03:45:33 +08:00
|
|
|
if (__exist_node_summaries(sbi))
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
blk_addr = sum_blk_addr(sbi, NR_CURSEG_NODE_TYPE,
|
|
|
|
type - CURSEG_HOT_NODE);
|
|
|
|
else
|
|
|
|
blk_addr = GET_SUM_BLOCK(sbi, segno);
|
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
new = f2fs_get_meta_page(sbi, blk_addr);
|
2018-07-17 00:02:17 +08:00
|
|
|
if (IS_ERR(new))
|
|
|
|
return PTR_ERR(new);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
sum = (struct f2fs_summary_block *)page_address(new);
|
|
|
|
|
|
|
|
if (IS_NODESEG(type)) {
|
2015-01-30 03:45:33 +08:00
|
|
|
if (__exist_node_summaries(sbi)) {
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
struct f2fs_summary *ns = &sum->entries[0];
|
|
|
|
int i;
|
2021-04-06 09:47:35 +08:00
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
for (i = 0; i < sbi->blocks_per_seg; i++, ns++) {
|
|
|
|
ns->version = 0;
|
|
|
|
ns->ofs_in_node = 0;
|
|
|
|
}
|
|
|
|
} else {
|
2018-07-17 00:02:17 +08:00
|
|
|
err = f2fs_restore_node_summary(sbi, segno, sum);
|
|
|
|
if (err)
|
|
|
|
goto out;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/* set uncompleted segment to curseg */
|
|
|
|
curseg = CURSEG_I(sbi, type);
|
|
|
|
mutex_lock(&curseg->curseg_mutex);
|
f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- datas of current summay block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.
So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
|
|
|
|
/* update journal info */
|
|
|
|
down_write(&curseg->journal_rwsem);
|
|
|
|
memcpy(curseg->journal, &sum->journal, SUM_JOURNAL_SIZE);
|
|
|
|
up_write(&curseg->journal_rwsem);
|
|
|
|
|
|
|
|
memcpy(curseg->sum_blk->entries, sum->entries, SUM_ENTRY_SIZE);
|
|
|
|
memcpy(&curseg->sum_blk->footer, &sum->footer, SUM_FOOTER_SIZE);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
curseg->next_segno = segno;
|
|
|
|
reset_curseg(sbi, type, 0);
|
|
|
|
curseg->alloc_type = ckpt->alloc_type[type];
|
|
|
|
curseg->next_blkoff = blk_off;
|
|
|
|
mutex_unlock(&curseg->curseg_mutex);
|
2018-07-17 00:02:17 +08:00
|
|
|
out:
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
f2fs_put_page(new, 1);
|
2018-07-17 00:02:17 +08:00
|
|
|
return err;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static int restore_curseg_summaries(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
2017-06-02 02:18:30 +08:00
|
|
|
struct f2fs_journal *sit_j = CURSEG_I(sbi, CURSEG_COLD_DATA)->journal;
|
|
|
|
struct f2fs_journal *nat_j = CURSEG_I(sbi, CURSEG_HOT_DATA)->journal;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
int type = CURSEG_HOT_DATA;
|
2014-03-17 16:36:24 +08:00
|
|
|
int err;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
2016-09-20 11:04:18 +08:00
|
|
|
if (is_set_ckpt_flags(sbi, CP_COMPACT_SUM_FLAG)) {
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
int npages = f2fs_npages_for_summary_flush(sbi, true);
|
2014-12-09 14:21:46 +08:00
|
|
|
|
|
|
|
if (npages >= 2)
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
f2fs_ra_meta_pages(sbi, start_sum_block(sbi), npages,
|
2015-10-12 17:05:59 +08:00
|
|
|
META_CP, true);
|
2014-12-09 14:21:46 +08:00
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
/* restore for compacted data summary */
|
2018-07-17 00:02:17 +08:00
|
|
|
err = read_compacted_summaries(sbi);
|
|
|
|
if (err)
|
|
|
|
return err;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
type = CURSEG_HOT_NODE;
|
|
|
|
}
|
|
|
|
|
2015-01-30 03:45:33 +08:00
|
|
|
if (__exist_node_summaries(sbi))
|
f2fs: introduce inmem curseg
Previous implementation of aligned pinfile allocation will:
- allocate new segment on cold data log no matter whether last used
segment is partially used or not, it makes IOs more random;
- force concurrent cold data/GCed IO going into warm data area, it
can make a bad effect on hot/cold data separation;
In this patch, we introduce a new type of log named 'inmem curseg',
the differents from normal curseg is:
- it reuses existed segment type (CURSEG_XXX_NODE/DATA);
- it only exists in memory, its segno, blkofs, summary will not b
persisted into checkpoint area;
With this new feature, we can enhance scalability of log, special
allocators can be created for purposes:
- pure lfs allocator for aligned pinfile allocation or file
defragmentation
- pure ssr allocator for later feature
So that, let's update aligned pinfile allocation to use this new
inmem curseg fwk.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:45 +08:00
|
|
|
f2fs_ra_meta_pages(sbi,
|
|
|
|
sum_blk_addr(sbi, NR_CURSEG_PERSIST_TYPE, type),
|
|
|
|
NR_CURSEG_PERSIST_TYPE - type, META_CP, true);
|
2014-12-09 14:21:46 +08:00
|
|
|
|
2014-03-17 16:36:24 +08:00
|
|
|
for (; type <= CURSEG_COLD_NODE; type++) {
|
|
|
|
err = read_normal_summaries(sbi, type);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2017-06-02 02:18:30 +08:00
|
|
|
/* sanity check for summary blocks */
|
|
|
|
if (nats_in_cursum(nat_j) > NAT_JOURNAL_ENTRIES ||
|
2019-05-23 12:19:17 +08:00
|
|
|
sits_in_cursum(sit_j) > SIT_JOURNAL_ENTRIES) {
|
2021-05-27 04:05:36 +08:00
|
|
|
f2fs_err(sbi, "invalid journal entries nats %u sits %u",
|
2019-06-18 17:48:42 +08:00
|
|
|
nats_in_cursum(nat_j), sits_in_cursum(sit_j));
|
2017-06-02 02:18:30 +08:00
|
|
|
return -EINVAL;
|
2019-05-23 12:19:17 +08:00
|
|
|
}
|
2017-06-02 02:18:30 +08:00
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void write_compacted_summaries(struct f2fs_sb_info *sbi, block_t blkaddr)
|
|
|
|
{
|
|
|
|
struct page *page;
|
|
|
|
unsigned char *kaddr;
|
|
|
|
struct f2fs_summary *summary;
|
|
|
|
struct curseg_info *seg_i;
|
|
|
|
int written_size = 0;
|
|
|
|
int i, j;
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
page = f2fs_grab_meta_page(sbi, blkaddr++);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
kaddr = (unsigned char *)page_address(page);
|
2018-04-09 20:25:06 +08:00
|
|
|
memset(kaddr, 0, PAGE_SIZE);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
/* Step 1: write nat cache */
|
|
|
|
seg_i = CURSEG_I(sbi, CURSEG_HOT_DATA);
|
f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- datas of current summay block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.
So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
memcpy(kaddr, seg_i->journal, SUM_JOURNAL_SIZE);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
written_size += SUM_JOURNAL_SIZE;
|
|
|
|
|
|
|
|
/* Step 2: write sit cache */
|
|
|
|
seg_i = CURSEG_I(sbi, CURSEG_COLD_DATA);
|
f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- datas of current summay block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.
So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
memcpy(kaddr + written_size, seg_i->journal, SUM_JOURNAL_SIZE);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
written_size += SUM_JOURNAL_SIZE;
|
|
|
|
|
|
|
|
/* Step 3: write summary entries */
|
|
|
|
for (i = CURSEG_HOT_DATA; i <= CURSEG_COLD_DATA; i++) {
|
|
|
|
seg_i = CURSEG_I(sbi, i);
|
2023-01-19 14:36:20 +08:00
|
|
|
for (j = 0; j < f2fs_curseg_valid_blocks(sbi, i); j++) {
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
if (!page) {
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
page = f2fs_grab_meta_page(sbi, blkaddr++);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
kaddr = (unsigned char *)page_address(page);
|
2018-04-09 20:25:06 +08:00
|
|
|
memset(kaddr, 0, PAGE_SIZE);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
written_size = 0;
|
|
|
|
}
|
|
|
|
summary = (struct f2fs_summary *)(kaddr + written_size);
|
|
|
|
*summary = seg_i->sum_blk->entries[j];
|
|
|
|
written_size += SUMMARY_SIZE;
|
|
|
|
|
mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.
There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-01 20:29:47 +08:00
|
|
|
if (written_size + SUMMARY_SIZE <= PAGE_SIZE -
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
SUM_FOOTER_SIZE)
|
|
|
|
continue;
|
|
|
|
|
2013-10-24 15:08:28 +08:00
|
|
|
set_page_dirty(page);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
f2fs_put_page(page, 1);
|
|
|
|
page = NULL;
|
|
|
|
}
|
|
|
|
}
|
2013-10-24 15:08:28 +08:00
|
|
|
if (page) {
|
|
|
|
set_page_dirty(page);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
f2fs_put_page(page, 1);
|
2013-10-24 15:08:28 +08:00
|
|
|
}
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void write_normal_summaries(struct f2fs_sb_info *sbi,
|
|
|
|
block_t blkaddr, int type)
|
|
|
|
{
|
|
|
|
int i, end;
|
2021-04-06 09:47:35 +08:00
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
if (IS_DATASEG(type))
|
|
|
|
end = type + NR_CURSEG_DATA_TYPE;
|
|
|
|
else
|
|
|
|
end = type + NR_CURSEG_NODE_TYPE;
|
|
|
|
|
f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- datas of current summay block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.
So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
for (i = type; i < end; i++)
|
|
|
|
write_current_sum_page(sbi, i, blkaddr + (i - type));
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_write_data_summaries(struct f2fs_sb_info *sbi, block_t start_blk)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
2016-09-20 11:04:18 +08:00
|
|
|
if (is_set_ckpt_flags(sbi, CP_COMPACT_SUM_FLAG))
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
write_compacted_summaries(sbi, start_blk);
|
|
|
|
else
|
|
|
|
write_normal_summaries(sbi, start_blk, CURSEG_HOT_DATA);
|
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_write_node_summaries(struct f2fs_sb_info *sbi, block_t start_blk)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
2015-01-30 03:45:33 +08:00
|
|
|
write_normal_summaries(sbi, start_blk, CURSEG_HOT_NODE);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type,
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
unsigned int val, int alloc)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
|
|
|
|
if (type == NAT_JOURNAL) {
|
2016-02-14 18:50:40 +08:00
|
|
|
for (i = 0; i < nats_in_cursum(journal); i++) {
|
|
|
|
if (le32_to_cpu(nid_in_journal(journal, i)) == val)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
return i;
|
|
|
|
}
|
2016-02-14 18:50:40 +08:00
|
|
|
if (alloc && __has_cursum_space(journal, 1, NAT_JOURNAL))
|
|
|
|
return update_nats_in_cursum(journal, 1);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
} else if (type == SIT_JOURNAL) {
|
2016-02-14 18:50:40 +08:00
|
|
|
for (i = 0; i < sits_in_cursum(journal); i++)
|
|
|
|
if (le32_to_cpu(segno_in_journal(journal, i)) == val)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
return i;
|
2016-02-14 18:50:40 +08:00
|
|
|
if (alloc && __has_cursum_space(journal, 1, SIT_JOURNAL))
|
|
|
|
return update_sits_in_cursum(journal, 1);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
|
|
|
|
static struct page *get_current_sit_page(struct f2fs_sb_info *sbi,
|
|
|
|
unsigned int segno)
|
|
|
|
{
|
2020-10-03 05:17:35 +08:00
|
|
|
return f2fs_get_meta_page(sbi, current_sit_addr(sbi, segno));
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static struct page *get_next_sit_page(struct f2fs_sb_info *sbi,
|
|
|
|
unsigned int start)
|
|
|
|
{
|
|
|
|
struct sit_info *sit_i = SIT_I(sbi);
|
f2fs: rebuild sit page from sit info in mem
This patch rebuild sit page from sit info in mem instead
of issue a read io.
I test this method and the result is as below:
Pre:
mmc_perf_test-12061 [001] ...1 976.819992: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [001] ...1 976.856446: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 998.976946: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 999.023269: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 1022.060772: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 1022.111034: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [002] ...1 1070.127643: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 1070.187352: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 1095.942124: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 1095.995975: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 1122.535091: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 1122.586521: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [001] ...1 1147.897487: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [001] ...1 1147.959438: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 1177.926951: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [002] ...1 1177.976823: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [002] ...1 1204.176087: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [002] ...1 1204.239046: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
Some sit flush consume more than 50ms.
Now:
mmc_perf_test-2187 [007] ...1 196.840684: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [007] ...1 196.841258: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [007] ...1 219.430582: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [007] ...1 219.431144: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [002] ...1 243.638678: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [000] ...1 243.638980: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [002] ...1 265.392180: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [002] ...1 265.392245: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [000] ...1 290.309051: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [000] ...1 290.309116: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [003] ...1 317.144209: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [003] ...1 317.145913: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [005] ...1 343.224954: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [005] ...1 343.225574: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [000] ...1 370.239846: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [000] ...1 370.241138: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [001] ...1 397.029043: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [001] ...1 397.030750: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [003] ...1 425.386377: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [003] ...1 425.387735: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
Most sit flush consume no more than 1ms.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-25 17:27:11 +08:00
|
|
|
struct page *page;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
pgoff_t src_off, dst_off;
|
|
|
|
|
|
|
|
src_off = current_sit_addr(sbi, start);
|
|
|
|
dst_off = next_sit_addr(sbi, src_off);
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
page = f2fs_grab_meta_page(sbi, dst_off);
|
f2fs: rebuild sit page from sit info in mem
This patch rebuild sit page from sit info in mem instead
of issue a read io.
I test this method and the result is as below:
Pre:
mmc_perf_test-12061 [001] ...1 976.819992: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [001] ...1 976.856446: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 998.976946: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 999.023269: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 1022.060772: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 1022.111034: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [002] ...1 1070.127643: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 1070.187352: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 1095.942124: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 1095.995975: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 1122.535091: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 1122.586521: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [001] ...1 1147.897487: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [001] ...1 1147.959438: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 1177.926951: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [002] ...1 1177.976823: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [002] ...1 1204.176087: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [002] ...1 1204.239046: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
Some sit flush consume more than 50ms.
Now:
mmc_perf_test-2187 [007] ...1 196.840684: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [007] ...1 196.841258: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [007] ...1 219.430582: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [007] ...1 219.431144: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [002] ...1 243.638678: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [000] ...1 243.638980: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [002] ...1 265.392180: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [002] ...1 265.392245: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [000] ...1 290.309051: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [000] ...1 290.309116: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [003] ...1 317.144209: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [003] ...1 317.145913: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [005] ...1 343.224954: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [005] ...1 343.225574: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [000] ...1 370.239846: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [000] ...1 370.241138: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [001] ...1 397.029043: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [001] ...1 397.030750: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [003] ...1 425.386377: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [003] ...1 425.387735: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
Most sit flush consume no more than 1ms.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-25 17:27:11 +08:00
|
|
|
seg_info_to_sit_page(sbi, page, start);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
f2fs: rebuild sit page from sit info in mem
This patch rebuild sit page from sit info in mem instead
of issue a read io.
I test this method and the result is as below:
Pre:
mmc_perf_test-12061 [001] ...1 976.819992: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [001] ...1 976.856446: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 998.976946: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 999.023269: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 1022.060772: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 1022.111034: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [002] ...1 1070.127643: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 1070.187352: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 1095.942124: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 1095.995975: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 1122.535091: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 1122.586521: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [001] ...1 1147.897487: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [001] ...1 1147.959438: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 1177.926951: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [002] ...1 1177.976823: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [002] ...1 1204.176087: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [002] ...1 1204.239046: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
Some sit flush consume more than 50ms.
Now:
mmc_perf_test-2187 [007] ...1 196.840684: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [007] ...1 196.841258: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [007] ...1 219.430582: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [007] ...1 219.431144: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [002] ...1 243.638678: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [000] ...1 243.638980: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [002] ...1 265.392180: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [002] ...1 265.392245: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [000] ...1 290.309051: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [000] ...1 290.309116: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [003] ...1 317.144209: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [003] ...1 317.145913: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [005] ...1 343.224954: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [005] ...1 343.225574: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [000] ...1 370.239846: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [000] ...1 370.241138: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [001] ...1 397.029043: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [001] ...1 397.030750: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [003] ...1 425.386377: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [003] ...1 425.387735: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
Most sit flush consume no more than 1ms.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-25 17:27:11 +08:00
|
|
|
set_page_dirty(page);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
set_to_next_sit(sit_i, start);
|
|
|
|
|
f2fs: rebuild sit page from sit info in mem
This patch rebuild sit page from sit info in mem instead
of issue a read io.
I test this method and the result is as below:
Pre:
mmc_perf_test-12061 [001] ...1 976.819992: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [001] ...1 976.856446: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 998.976946: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 999.023269: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 1022.060772: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 1022.111034: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [002] ...1 1070.127643: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 1070.187352: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 1095.942124: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 1095.995975: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 1122.535091: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 1122.586521: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [001] ...1 1147.897487: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [001] ...1 1147.959438: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 1177.926951: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [002] ...1 1177.976823: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [002] ...1 1204.176087: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [002] ...1 1204.239046: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
Some sit flush consume more than 50ms.
Now:
mmc_perf_test-2187 [007] ...1 196.840684: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [007] ...1 196.841258: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [007] ...1 219.430582: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [007] ...1 219.431144: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [002] ...1 243.638678: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [000] ...1 243.638980: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [002] ...1 265.392180: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [002] ...1 265.392245: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [000] ...1 290.309051: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [000] ...1 290.309116: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [003] ...1 317.144209: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [003] ...1 317.145913: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [005] ...1 343.224954: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [005] ...1 343.225574: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [000] ...1 370.239846: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [000] ...1 370.241138: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [001] ...1 397.029043: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [001] ...1 397.030750: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [003] ...1 425.386377: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [003] ...1 425.387735: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
Most sit flush consume no more than 1ms.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-25 17:27:11 +08:00
|
|
|
return page;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
static struct sit_entry_set *grab_sit_entry_set(void)
|
|
|
|
{
|
|
|
|
struct sit_entry_set *ses =
|
2021-08-09 08:24:48 +08:00
|
|
|
f2fs_kmem_cache_alloc(sit_entry_set_slab,
|
|
|
|
GFP_NOFS, true, NULL);
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
|
|
|
|
ses->entry_cnt = 0;
|
|
|
|
INIT_LIST_HEAD(&ses->set_list);
|
|
|
|
return ses;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void release_sit_entry_set(struct sit_entry_set *ses)
|
|
|
|
{
|
|
|
|
list_del(&ses->set_list);
|
|
|
|
kmem_cache_free(sit_entry_set_slab, ses);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void adjust_sit_entry_set(struct sit_entry_set *ses,
|
|
|
|
struct list_head *head)
|
|
|
|
{
|
|
|
|
struct sit_entry_set *next = ses;
|
|
|
|
|
|
|
|
if (list_is_last(&ses->set_list, head))
|
|
|
|
return;
|
|
|
|
|
|
|
|
list_for_each_entry_continue(next, head, set_list)
|
2022-04-12 20:20:39 +08:00
|
|
|
if (ses->entry_cnt <= next->entry_cnt) {
|
|
|
|
list_move_tail(&ses->set_list, &next->set_list);
|
|
|
|
return;
|
|
|
|
}
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
|
2022-04-12 20:20:39 +08:00
|
|
|
list_move_tail(&ses->set_list, head);
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void add_sit_entry(unsigned int segno, struct list_head *head)
|
|
|
|
{
|
|
|
|
struct sit_entry_set *ses;
|
|
|
|
unsigned int start_segno = START_SEGNO(segno);
|
|
|
|
|
|
|
|
list_for_each_entry(ses, head, set_list) {
|
|
|
|
if (ses->start_segno == start_segno) {
|
|
|
|
ses->entry_cnt++;
|
|
|
|
adjust_sit_entry_set(ses, head);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
ses = grab_sit_entry_set();
|
|
|
|
|
|
|
|
ses->start_segno = start_segno;
|
|
|
|
ses->entry_cnt++;
|
|
|
|
list_add(&ses->set_list, head);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void add_sits_in_set(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct f2fs_sm_info *sm_info = SM_I(sbi);
|
|
|
|
struct list_head *set_list = &sm_info->sit_entry_set;
|
|
|
|
unsigned long *bitmap = SIT_I(sbi)->dirty_sentries_bitmap;
|
|
|
|
unsigned int segno;
|
|
|
|
|
2014-09-24 02:23:01 +08:00
|
|
|
for_each_set_bit(segno, bitmap, MAIN_SEGS(sbi))
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
add_sit_entry(segno, set_list);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void remove_sits_in_journal(struct f2fs_sb_info *sbi)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_COLD_DATA);
|
f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- datas of current summay block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.
So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
struct f2fs_journal *journal = curseg->journal;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
int i;
|
|
|
|
|
f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- datas of current summay block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.
So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
down_write(&curseg->journal_rwsem);
|
2016-02-14 18:50:40 +08:00
|
|
|
for (i = 0; i < sits_in_cursum(journal); i++) {
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
unsigned int segno;
|
|
|
|
bool dirtied;
|
|
|
|
|
2016-02-14 18:50:40 +08:00
|
|
|
segno = le32_to_cpu(segno_in_journal(journal, i));
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
dirtied = __mark_sit_entry_dirty(sbi, segno);
|
|
|
|
|
|
|
|
if (!dirtied)
|
|
|
|
add_sit_entry(segno, &SM_I(sbi)->sit_entry_set);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
2016-02-14 18:50:40 +08:00
|
|
|
update_sits_in_cursum(journal, -i);
|
f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- datas of current summay block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.
So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
up_write(&curseg->journal_rwsem);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
2012-11-29 12:28:09 +08:00
|
|
|
/*
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
* CP calls this function, which flushes SIT entries including sit_journal,
|
|
|
|
* and moves prefree segs to free segs.
|
|
|
|
*/
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct sit_info *sit_i = SIT_I(sbi);
|
|
|
|
unsigned long *bitmap = sit_i->dirty_sentries_bitmap;
|
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_COLD_DATA);
|
f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- datas of current summay block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.
So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
struct f2fs_journal *journal = curseg->journal;
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
struct sit_entry_set *ses, *tmp;
|
|
|
|
struct list_head *head = &SM_I(sbi)->sit_entry_set;
|
2019-06-05 11:33:25 +08:00
|
|
|
bool to_journal = !is_sbi_flag_set(sbi, SBI_IS_RESIZEFS);
|
2014-09-21 13:06:39 +08:00
|
|
|
struct seg_entry *se;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
2017-10-30 17:49:53 +08:00
|
|
|
down_write(&sit_i->sentry_lock);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
2015-02-27 16:52:50 +08:00
|
|
|
if (!sit_i->dirty_sentries)
|
|
|
|
goto out;
|
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
/*
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
* add and account sit entries of dirty bitmap in sit entry
|
|
|
|
* set temporarily
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
*/
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
add_sits_in_set(sbi);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
/*
|
|
|
|
* if there are no enough space in journal to store dirty sit
|
|
|
|
* entries, remove all entries from journal and add and account
|
|
|
|
* them in sit entry set.
|
|
|
|
*/
|
2019-06-05 11:33:25 +08:00
|
|
|
if (!__has_cursum_space(journal, sit_i->dirty_sentries, SIT_JOURNAL) ||
|
|
|
|
!to_journal)
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
remove_sits_in_journal(sbi);
|
2013-11-12 13:49:56 +08:00
|
|
|
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
/*
|
|
|
|
* there are two steps to flush sit entries:
|
|
|
|
* #1, flush sit entries to journal in current cold data summary block.
|
|
|
|
* #2, flush sit entries to sit page.
|
|
|
|
*/
|
|
|
|
list_for_each_entry_safe(ses, tmp, head, set_list) {
|
2014-10-17 02:43:30 +08:00
|
|
|
struct page *page = NULL;
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
struct f2fs_sit_block *raw_sit = NULL;
|
|
|
|
unsigned int start_segno = ses->start_segno;
|
|
|
|
unsigned int end = min(start_segno + SIT_ENTRY_PER_BLOCK,
|
2014-09-24 02:23:01 +08:00
|
|
|
(unsigned long)MAIN_SEGS(sbi));
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
unsigned int segno = start_segno;
|
|
|
|
|
|
|
|
if (to_journal &&
|
2016-02-14 18:50:40 +08:00
|
|
|
!__has_cursum_space(journal, ses->entry_cnt, SIT_JOURNAL))
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
to_journal = false;
|
|
|
|
|
f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- datas of current summay block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.
So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
if (to_journal) {
|
|
|
|
down_write(&curseg->journal_rwsem);
|
|
|
|
} else {
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
page = get_next_sit_page(sbi, start_segno);
|
|
|
|
raw_sit = page_address(page);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
/* flush dirty sit entries in region of current sit set */
|
|
|
|
for_each_set_bit_from(segno, bitmap, end) {
|
|
|
|
int offset, sit_offset;
|
2014-09-21 13:06:39 +08:00
|
|
|
|
|
|
|
se = get_seg_entry(sbi, segno);
|
2018-04-09 04:28:41 +08:00
|
|
|
#ifdef CONFIG_F2FS_CHECK_FS
|
|
|
|
if (memcmp(se->cur_valid_map, se->cur_valid_map_mir,
|
|
|
|
SIT_VBLOCK_MAP_SIZE))
|
|
|
|
f2fs_bug_on(sbi, 1);
|
|
|
|
#endif
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
|
|
|
|
/* add discard candidates */
|
2017-04-27 20:40:39 +08:00
|
|
|
if (!(cpc->reason & CP_DISCARD)) {
|
2014-09-21 13:06:39 +08:00
|
|
|
cpc->trim_start = segno;
|
2016-12-30 14:06:15 +08:00
|
|
|
add_discard_addrs(sbi, cpc, false);
|
2014-09-21 13:06:39 +08:00
|
|
|
}
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
|
|
|
|
if (to_journal) {
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
offset = f2fs_lookup_journal_in_cursum(journal,
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
SIT_JOURNAL, segno, 1);
|
|
|
|
f2fs_bug_on(sbi, offset < 0);
|
2016-02-14 18:50:40 +08:00
|
|
|
segno_in_journal(journal, offset) =
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
cpu_to_le32(segno);
|
|
|
|
seg_info_to_raw_sit(se,
|
2016-02-14 18:50:40 +08:00
|
|
|
&sit_in_journal(journal, offset));
|
2018-04-09 04:28:41 +08:00
|
|
|
check_block_count(sbi, segno,
|
|
|
|
&sit_in_journal(journal, offset));
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
} else {
|
|
|
|
sit_offset = SIT_ENTRY_OFFSET(sit_i, segno);
|
|
|
|
seg_info_to_raw_sit(se,
|
|
|
|
&raw_sit->entries[sit_offset]);
|
2018-04-09 04:28:41 +08:00
|
|
|
check_block_count(sbi, segno,
|
|
|
|
&raw_sit->entries[sit_offset]);
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
}
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
__clear_bit(segno, bitmap);
|
|
|
|
sit_i->dirty_sentries--;
|
|
|
|
ses->entry_cnt--;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- datas of current summay block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.
So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
if (to_journal)
|
|
|
|
up_write(&curseg->journal_rwsem);
|
|
|
|
else
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
f2fs_put_page(page, 1);
|
|
|
|
|
|
|
|
f2fs_bug_on(sbi, ses->entry_cnt);
|
|
|
|
release_sit_entry_set(ses);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
|
|
|
|
f2fs_bug_on(sbi, !list_empty(head));
|
|
|
|
f2fs_bug_on(sbi, sit_i->dirty_sentries);
|
|
|
|
out:
|
2017-04-27 20:40:39 +08:00
|
|
|
if (cpc->reason & CP_DISCARD) {
|
2016-12-22 11:46:24 +08:00
|
|
|
__u64 trim_start = cpc->trim_start;
|
|
|
|
|
2014-09-21 13:06:39 +08:00
|
|
|
for (; cpc->trim_start <= cpc->trim_end; cpc->trim_start++)
|
2016-12-30 14:06:15 +08:00
|
|
|
add_discard_addrs(sbi, cpc, false);
|
2016-12-22 11:46:24 +08:00
|
|
|
|
|
|
|
cpc->trim_start = trim_start;
|
2014-09-21 13:06:39 +08:00
|
|
|
}
|
2017-10-30 17:49:53 +08:00
|
|
|
up_write(&sit_i->sentry_lock);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
set_prefree_as_free_segments(sbi);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int build_sit_info(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct f2fs_super_block *raw_super = F2FS_RAW_SUPER(sbi);
|
|
|
|
struct sit_info *sit_i;
|
|
|
|
unsigned int sit_segs, start;
|
2019-07-26 15:41:20 +08:00
|
|
|
char *src_bitmap, *bitmap;
|
2019-08-07 21:40:32 +08:00
|
|
|
unsigned int bitmap_size, main_bitmap_size, sit_bitmap_size;
|
f2fs: introduce discard_unit mount option
As James Z reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=213877
[1.] One-line summary of the problem:
Mount multiple SMR block devices exceed certain number cause system non-response
[2.] Full description of the problem/report:
Created some F2FS on SMR devices (mkfs.f2fs -m), then mounted in sequence. Each device is the same Model: HGST HSH721414AL (Size 14TB).
Empirically, found that when the amount of SMR device * 1.5Gb > System RAM, the system ran out of memory and hung. No dmesg output. For example, 24 SMR Disk need 24*1.5GB = 36GB. A system with 32G RAM can only mount 21 devices, the 22nd device will be a reproducible cause of system hang.
The number of SMR devices with other FS mounted on this system does not interfere with the result above.
[3.] Keywords (i.e., modules, networking, kernel):
F2FS, SMR, Memory
[4.] Kernel information
[4.1.] Kernel version (uname -a):
Linux 5.13.4-200.fc34.x86_64 #1 SMP Tue Jul 20 20:27:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
[4.2.] Kernel .config file:
Default Fedora 34 with f2fs-tools-1.14.0-2.fc34.x86_64
[5.] Most recent kernel version which did not have the bug:
None
[6.] Output of Oops.. message (if applicable) with symbolic information
resolved (see Documentation/admin-guide/oops-tracing.rst)
None
[7.] A small shell script or example program which triggers the
problem (if possible)
mount /dev/sdX /mnt/0X
[8.] Memory consumption
With 24 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 46 36 0 0 10 10
Swap: 0 0 0
With 3 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 7 5 0 0 1 1
Swap: 7 0 7
The root cause is, there are three bitmaps:
- cur_valid_map
- ckpt_valid_map
- discard_map
and each of them will cost ~500MB memory, {cur, ckpt}_valid_map are
necessary, but discard_map is optional, since this bitmap will only be
useful in mountpoint that small discard is enabled.
For a blkzoned device such as SMR or ZNS devices, f2fs will only issue
discard for a section(zone) when all blocks of that section are invalid,
so, for such device, we don't need small discard functionality at all.
This patch introduces a new mountoption "discard_unit=block|segment|
section" to support issuing discard with different basic unit which is
aligned to block, segment or section, so that user can specify
"discard_unit=segment" or "discard_unit=section" to disable small
discard functionality.
Note that this mount option can not be changed by remount() due to
related metadata need to be initialized during mount().
In order to save memory, let's use "discard_unit=section" for blkzoned
device by default.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-08-03 08:15:43 +08:00
|
|
|
unsigned int discard_map = f2fs_block_unit_discard(sbi) ? 1 : 0;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
/* allocate memory for SIT information */
|
2017-11-30 19:28:17 +08:00
|
|
|
sit_i = f2fs_kzalloc(sbi, sizeof(struct sit_info), GFP_KERNEL);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
if (!sit_i)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
SM_I(sbi)->sit_info = sit_i;
|
|
|
|
|
treewide: Use array_size in f2fs_kvzalloc()
The f2fs_kvzalloc() function has no 2-factor argument form, so
multiplication factors need to be wrapped in array_size(). This patch
replaces cases of:
f2fs_kvzalloc(handle, a * b, gfp)
with:
f2fs_kvzalloc(handle, array_size(a, b), gfp)
as well as handling cases of:
f2fs_kvzalloc(handle, a * b * c, gfp)
with:
f2fs_kvzalloc(handle, array3_size(a, b, c), gfp)
This does, however, attempt to ignore constant size factors like:
f2fs_kvzalloc(handle, 4 * 1024, gfp)
though any constants defined via macros get caught up in the conversion.
Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.
The Coccinelle script used for this was:
// Fix redundant parens around sizeof().
@@
expression HANDLE;
type TYPE;
expression THING, E;
@@
(
f2fs_kvzalloc(HANDLE,
- (sizeof(TYPE)) * E
+ sizeof(TYPE) * E
, ...)
|
f2fs_kvzalloc(HANDLE,
- (sizeof(THING)) * E
+ sizeof(THING) * E
, ...)
)
// Drop single-byte sizes and redundant parens.
@@
expression HANDLE;
expression COUNT;
typedef u8;
typedef __u8;
@@
(
f2fs_kvzalloc(HANDLE,
- sizeof(u8) * (COUNT)
+ COUNT
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(__u8) * (COUNT)
+ COUNT
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(char) * (COUNT)
+ COUNT
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(unsigned char) * (COUNT)
+ COUNT
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(u8) * COUNT
+ COUNT
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(__u8) * COUNT
+ COUNT
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(char) * COUNT
+ COUNT
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(unsigned char) * COUNT
+ COUNT
, ...)
)
// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
expression HANDLE;
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@
(
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE) * (COUNT_ID)
+ array_size(COUNT_ID, sizeof(TYPE))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE) * COUNT_ID
+ array_size(COUNT_ID, sizeof(TYPE))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE) * (COUNT_CONST)
+ array_size(COUNT_CONST, sizeof(TYPE))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE) * COUNT_CONST
+ array_size(COUNT_CONST, sizeof(TYPE))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(THING) * (COUNT_ID)
+ array_size(COUNT_ID, sizeof(THING))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(THING) * COUNT_ID
+ array_size(COUNT_ID, sizeof(THING))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(THING) * (COUNT_CONST)
+ array_size(COUNT_CONST, sizeof(THING))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(THING) * COUNT_CONST
+ array_size(COUNT_CONST, sizeof(THING))
, ...)
)
// 2-factor product, only identifiers.
@@
expression HANDLE;
identifier SIZE, COUNT;
@@
f2fs_kvzalloc(HANDLE,
- SIZE * COUNT
+ array_size(COUNT, SIZE)
, ...)
// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression HANDLE;
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@
(
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(THING) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(THING) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(THING) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(THING) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
)
// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression HANDLE;
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@
(
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(THING1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(THING1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
)
// 3-factor product, only identifiers, with redundant parens removed.
@@
expression HANDLE;
identifier STRIDE, SIZE, COUNT;
@@
(
f2fs_kvzalloc(HANDLE,
- (COUNT) * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kvzalloc(HANDLE,
- COUNT * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kvzalloc(HANDLE,
- COUNT * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kvzalloc(HANDLE,
- (COUNT) * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kvzalloc(HANDLE,
- COUNT * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kvzalloc(HANDLE,
- (COUNT) * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kvzalloc(HANDLE,
- (COUNT) * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kvzalloc(HANDLE,
- COUNT * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
)
// Any remaining multi-factor products, first at least 3-factor products
// when they're not all constants...
@@
expression HANDLE;
expression E1, E2, E3;
constant C1, C2, C3;
@@
(
f2fs_kvzalloc(HANDLE, C1 * C2 * C3, ...)
|
f2fs_kvzalloc(HANDLE,
- E1 * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
)
// And then all remaining 2 factors products when they're not all constants.
@@
expression HANDLE;
expression E1, E2;
constant C1, C2;
@@
(
f2fs_kvzalloc(HANDLE, C1 * C2, ...)
|
f2fs_kvzalloc(HANDLE,
- E1 * E2
+ array_size(E1, E2)
, ...)
)
Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-13 05:28:35 +08:00
|
|
|
sit_i->sentries =
|
|
|
|
f2fs_kvzalloc(sbi, array_size(sizeof(struct seg_entry),
|
|
|
|
MAIN_SEGS(sbi)),
|
|
|
|
GFP_KERNEL);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
if (!sit_i->sentries)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
2019-08-07 21:40:32 +08:00
|
|
|
main_bitmap_size = f2fs_bitmap_size(MAIN_SEGS(sbi));
|
|
|
|
sit_i->dirty_sentries_bitmap = f2fs_kvzalloc(sbi, main_bitmap_size,
|
2017-11-30 19:28:18 +08:00
|
|
|
GFP_KERNEL);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
if (!sit_i->dirty_sentries_bitmap)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
2019-07-26 15:41:20 +08:00
|
|
|
#ifdef CONFIG_F2FS_CHECK_FS
|
f2fs: introduce discard_unit mount option
As James Z reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=213877
[1.] One-line summary of the problem:
Mount multiple SMR block devices exceed certain number cause system non-response
[2.] Full description of the problem/report:
Created some F2FS on SMR devices (mkfs.f2fs -m), then mounted in sequence. Each device is the same Model: HGST HSH721414AL (Size 14TB).
Empirically, found that when the amount of SMR device * 1.5Gb > System RAM, the system ran out of memory and hung. No dmesg output. For example, 24 SMR Disk need 24*1.5GB = 36GB. A system with 32G RAM can only mount 21 devices, the 22nd device will be a reproducible cause of system hang.
The number of SMR devices with other FS mounted on this system does not interfere with the result above.
[3.] Keywords (i.e., modules, networking, kernel):
F2FS, SMR, Memory
[4.] Kernel information
[4.1.] Kernel version (uname -a):
Linux 5.13.4-200.fc34.x86_64 #1 SMP Tue Jul 20 20:27:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
[4.2.] Kernel .config file:
Default Fedora 34 with f2fs-tools-1.14.0-2.fc34.x86_64
[5.] Most recent kernel version which did not have the bug:
None
[6.] Output of Oops.. message (if applicable) with symbolic information
resolved (see Documentation/admin-guide/oops-tracing.rst)
None
[7.] A small shell script or example program which triggers the
problem (if possible)
mount /dev/sdX /mnt/0X
[8.] Memory consumption
With 24 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 46 36 0 0 10 10
Swap: 0 0 0
With 3 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 7 5 0 0 1 1
Swap: 7 0 7
The root cause is, there are three bitmaps:
- cur_valid_map
- ckpt_valid_map
- discard_map
and each of them will cost ~500MB memory, {cur, ckpt}_valid_map are
necessary, but discard_map is optional, since this bitmap will only be
useful in mountpoint that small discard is enabled.
For a blkzoned device such as SMR or ZNS devices, f2fs will only issue
discard for a section(zone) when all blocks of that section are invalid,
so, for such device, we don't need small discard functionality at all.
This patch introduces a new mountoption "discard_unit=block|segment|
section" to support issuing discard with different basic unit which is
aligned to block, segment or section, so that user can specify
"discard_unit=segment" or "discard_unit=section" to disable small
discard functionality.
Note that this mount option can not be changed by remount() due to
related metadata need to be initialized during mount().
In order to save memory, let's use "discard_unit=section" for blkzoned
device by default.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-08-03 08:15:43 +08:00
|
|
|
bitmap_size = MAIN_SEGS(sbi) * SIT_VBLOCK_MAP_SIZE * (3 + discard_map);
|
2019-07-26 15:41:20 +08:00
|
|
|
#else
|
f2fs: introduce discard_unit mount option
As James Z reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=213877
[1.] One-line summary of the problem:
Mount multiple SMR block devices exceed certain number cause system non-response
[2.] Full description of the problem/report:
Created some F2FS on SMR devices (mkfs.f2fs -m), then mounted in sequence. Each device is the same Model: HGST HSH721414AL (Size 14TB).
Empirically, found that when the amount of SMR device * 1.5Gb > System RAM, the system ran out of memory and hung. No dmesg output. For example, 24 SMR Disk need 24*1.5GB = 36GB. A system with 32G RAM can only mount 21 devices, the 22nd device will be a reproducible cause of system hang.
The number of SMR devices with other FS mounted on this system does not interfere with the result above.
[3.] Keywords (i.e., modules, networking, kernel):
F2FS, SMR, Memory
[4.] Kernel information
[4.1.] Kernel version (uname -a):
Linux 5.13.4-200.fc34.x86_64 #1 SMP Tue Jul 20 20:27:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
[4.2.] Kernel .config file:
Default Fedora 34 with f2fs-tools-1.14.0-2.fc34.x86_64
[5.] Most recent kernel version which did not have the bug:
None
[6.] Output of Oops.. message (if applicable) with symbolic information
resolved (see Documentation/admin-guide/oops-tracing.rst)
None
[7.] A small shell script or example program which triggers the
problem (if possible)
mount /dev/sdX /mnt/0X
[8.] Memory consumption
With 24 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 46 36 0 0 10 10
Swap: 0 0 0
With 3 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 7 5 0 0 1 1
Swap: 7 0 7
The root cause is, there are three bitmaps:
- cur_valid_map
- ckpt_valid_map
- discard_map
and each of them will cost ~500MB memory, {cur, ckpt}_valid_map are
necessary, but discard_map is optional, since this bitmap will only be
useful in mountpoint that small discard is enabled.
For a blkzoned device such as SMR or ZNS devices, f2fs will only issue
discard for a section(zone) when all blocks of that section are invalid,
so, for such device, we don't need small discard functionality at all.
This patch introduces a new mountoption "discard_unit=block|segment|
section" to support issuing discard with different basic unit which is
aligned to block, segment or section, so that user can specify
"discard_unit=segment" or "discard_unit=section" to disable small
discard functionality.
Note that this mount option can not be changed by remount() due to
related metadata need to be initialized during mount().
In order to save memory, let's use "discard_unit=section" for blkzoned
device by default.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-08-03 08:15:43 +08:00
|
|
|
bitmap_size = MAIN_SEGS(sbi) * SIT_VBLOCK_MAP_SIZE * (2 + discard_map);
|
2019-07-26 15:41:20 +08:00
|
|
|
#endif
|
|
|
|
sit_i->bitmap = f2fs_kvzalloc(sbi, bitmap_size, GFP_KERNEL);
|
|
|
|
if (!sit_i->bitmap)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
bitmap = sit_i->bitmap;
|
|
|
|
|
2014-09-24 02:23:01 +08:00
|
|
|
for (start = 0; start < MAIN_SEGS(sbi); start++) {
|
2019-07-26 15:41:20 +08:00
|
|
|
sit_i->sentries[start].cur_valid_map = bitmap;
|
|
|
|
bitmap += SIT_VBLOCK_MAP_SIZE;
|
|
|
|
|
|
|
|
sit_i->sentries[start].ckpt_valid_map = bitmap;
|
|
|
|
bitmap += SIT_VBLOCK_MAP_SIZE;
|
2016-08-03 01:56:40 +08:00
|
|
|
|
2017-01-07 18:51:01 +08:00
|
|
|
#ifdef CONFIG_F2FS_CHECK_FS
|
2019-07-26 15:41:20 +08:00
|
|
|
sit_i->sentries[start].cur_valid_map_mir = bitmap;
|
|
|
|
bitmap += SIT_VBLOCK_MAP_SIZE;
|
2017-01-07 18:51:01 +08:00
|
|
|
#endif
|
|
|
|
|
f2fs: introduce discard_unit mount option
As James Z reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=213877
[1.] One-line summary of the problem:
Mount multiple SMR block devices exceed certain number cause system non-response
[2.] Full description of the problem/report:
Created some F2FS on SMR devices (mkfs.f2fs -m), then mounted in sequence. Each device is the same Model: HGST HSH721414AL (Size 14TB).
Empirically, found that when the amount of SMR device * 1.5Gb > System RAM, the system ran out of memory and hung. No dmesg output. For example, 24 SMR Disk need 24*1.5GB = 36GB. A system with 32G RAM can only mount 21 devices, the 22nd device will be a reproducible cause of system hang.
The number of SMR devices with other FS mounted on this system does not interfere with the result above.
[3.] Keywords (i.e., modules, networking, kernel):
F2FS, SMR, Memory
[4.] Kernel information
[4.1.] Kernel version (uname -a):
Linux 5.13.4-200.fc34.x86_64 #1 SMP Tue Jul 20 20:27:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
[4.2.] Kernel .config file:
Default Fedora 34 with f2fs-tools-1.14.0-2.fc34.x86_64
[5.] Most recent kernel version which did not have the bug:
None
[6.] Output of Oops.. message (if applicable) with symbolic information
resolved (see Documentation/admin-guide/oops-tracing.rst)
None
[7.] A small shell script or example program which triggers the
problem (if possible)
mount /dev/sdX /mnt/0X
[8.] Memory consumption
With 24 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 46 36 0 0 10 10
Swap: 0 0 0
With 3 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 7 5 0 0 1 1
Swap: 7 0 7
The root cause is, there are three bitmaps:
- cur_valid_map
- ckpt_valid_map
- discard_map
and each of them will cost ~500MB memory, {cur, ckpt}_valid_map are
necessary, but discard_map is optional, since this bitmap will only be
useful in mountpoint that small discard is enabled.
For a blkzoned device such as SMR or ZNS devices, f2fs will only issue
discard for a section(zone) when all blocks of that section are invalid,
so, for such device, we don't need small discard functionality at all.
This patch introduces a new mountoption "discard_unit=block|segment|
section" to support issuing discard with different basic unit which is
aligned to block, segment or section, so that user can specify
"discard_unit=segment" or "discard_unit=section" to disable small
discard functionality.
Note that this mount option can not be changed by remount() due to
related metadata need to be initialized during mount().
In order to save memory, let's use "discard_unit=section" for blkzoned
device by default.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-08-03 08:15:43 +08:00
|
|
|
if (discard_map) {
|
|
|
|
sit_i->sentries[start].discard_map = bitmap;
|
|
|
|
bitmap += SIT_VBLOCK_MAP_SIZE;
|
|
|
|
}
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
2017-11-30 19:28:17 +08:00
|
|
|
sit_i->tmp_map = f2fs_kzalloc(sbi, SIT_VBLOCK_MAP_SIZE, GFP_KERNEL);
|
2015-02-11 08:44:29 +08:00
|
|
|
if (!sit_i->tmp_map)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
2018-10-24 18:37:26 +08:00
|
|
|
if (__is_large_section(sbi)) {
|
treewide: Use array_size in f2fs_kvzalloc()
The f2fs_kvzalloc() function has no 2-factor argument form, so
multiplication factors need to be wrapped in array_size(). This patch
replaces cases of:
f2fs_kvzalloc(handle, a * b, gfp)
with:
f2fs_kvzalloc(handle, array_size(a, b), gfp)
as well as handling cases of:
f2fs_kvzalloc(handle, a * b * c, gfp)
with:
f2fs_kvzalloc(handle, array3_size(a, b, c), gfp)
This does, however, attempt to ignore constant size factors like:
f2fs_kvzalloc(handle, 4 * 1024, gfp)
though any constants defined via macros get caught up in the conversion.
Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.
The Coccinelle script used for this was:
// Fix redundant parens around sizeof().
@@
expression HANDLE;
type TYPE;
expression THING, E;
@@
(
f2fs_kvzalloc(HANDLE,
- (sizeof(TYPE)) * E
+ sizeof(TYPE) * E
, ...)
|
f2fs_kvzalloc(HANDLE,
- (sizeof(THING)) * E
+ sizeof(THING) * E
, ...)
)
// Drop single-byte sizes and redundant parens.
@@
expression HANDLE;
expression COUNT;
typedef u8;
typedef __u8;
@@
(
f2fs_kvzalloc(HANDLE,
- sizeof(u8) * (COUNT)
+ COUNT
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(__u8) * (COUNT)
+ COUNT
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(char) * (COUNT)
+ COUNT
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(unsigned char) * (COUNT)
+ COUNT
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(u8) * COUNT
+ COUNT
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(__u8) * COUNT
+ COUNT
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(char) * COUNT
+ COUNT
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(unsigned char) * COUNT
+ COUNT
, ...)
)
// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
expression HANDLE;
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@
(
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE) * (COUNT_ID)
+ array_size(COUNT_ID, sizeof(TYPE))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE) * COUNT_ID
+ array_size(COUNT_ID, sizeof(TYPE))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE) * (COUNT_CONST)
+ array_size(COUNT_CONST, sizeof(TYPE))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE) * COUNT_CONST
+ array_size(COUNT_CONST, sizeof(TYPE))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(THING) * (COUNT_ID)
+ array_size(COUNT_ID, sizeof(THING))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(THING) * COUNT_ID
+ array_size(COUNT_ID, sizeof(THING))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(THING) * (COUNT_CONST)
+ array_size(COUNT_CONST, sizeof(THING))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(THING) * COUNT_CONST
+ array_size(COUNT_CONST, sizeof(THING))
, ...)
)
// 2-factor product, only identifiers.
@@
expression HANDLE;
identifier SIZE, COUNT;
@@
f2fs_kvzalloc(HANDLE,
- SIZE * COUNT
+ array_size(COUNT, SIZE)
, ...)
// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression HANDLE;
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@
(
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(THING) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(THING) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(THING) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(THING) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
)
// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression HANDLE;
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@
(
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(THING1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(THING1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
)
// 3-factor product, only identifiers, with redundant parens removed.
@@
expression HANDLE;
identifier STRIDE, SIZE, COUNT;
@@
(
f2fs_kvzalloc(HANDLE,
- (COUNT) * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kvzalloc(HANDLE,
- COUNT * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kvzalloc(HANDLE,
- COUNT * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kvzalloc(HANDLE,
- (COUNT) * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kvzalloc(HANDLE,
- COUNT * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kvzalloc(HANDLE,
- (COUNT) * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kvzalloc(HANDLE,
- (COUNT) * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kvzalloc(HANDLE,
- COUNT * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
)
// Any remaining multi-factor products, first at least 3-factor products
// when they're not all constants...
@@
expression HANDLE;
expression E1, E2, E3;
constant C1, C2, C3;
@@
(
f2fs_kvzalloc(HANDLE, C1 * C2 * C3, ...)
|
f2fs_kvzalloc(HANDLE,
- E1 * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
)
// And then all remaining 2 factors products when they're not all constants.
@@
expression HANDLE;
expression E1, E2;
constant C1, C2;
@@
(
f2fs_kvzalloc(HANDLE, C1 * C2, ...)
|
f2fs_kvzalloc(HANDLE,
- E1 * E2
+ array_size(E1, E2)
, ...)
)
Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-13 05:28:35 +08:00
|
|
|
sit_i->sec_entries =
|
|
|
|
f2fs_kvzalloc(sbi, array_size(sizeof(struct sec_entry),
|
|
|
|
MAIN_SECS(sbi)),
|
|
|
|
GFP_KERNEL);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
if (!sit_i->sec_entries)
|
|
|
|
return -ENOMEM;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* get information related with SIT */
|
|
|
|
sit_segs = le32_to_cpu(raw_super->segment_count_sit) >> 1;
|
|
|
|
|
|
|
|
/* setup SIT bitmap from ckeckpoint pack */
|
2019-08-07 21:40:32 +08:00
|
|
|
sit_bitmap_size = __bitmap_size(sbi, SIT_BITMAP);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
src_bitmap = __bitmap_ptr(sbi, SIT_BITMAP);
|
|
|
|
|
2019-08-07 21:40:32 +08:00
|
|
|
sit_i->sit_bitmap = kmemdup(src_bitmap, sit_bitmap_size, GFP_KERNEL);
|
2017-01-07 18:52:34 +08:00
|
|
|
if (!sit_i->sit_bitmap)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
return -ENOMEM;
|
|
|
|
|
2017-01-07 18:52:34 +08:00
|
|
|
#ifdef CONFIG_F2FS_CHECK_FS
|
2019-08-07 21:40:32 +08:00
|
|
|
sit_i->sit_bitmap_mir = kmemdup(src_bitmap,
|
|
|
|
sit_bitmap_size, GFP_KERNEL);
|
2017-01-07 18:52:34 +08:00
|
|
|
if (!sit_i->sit_bitmap_mir)
|
|
|
|
return -ENOMEM;
|
2019-08-07 21:40:32 +08:00
|
|
|
|
|
|
|
sit_i->invalid_segmap = f2fs_kvzalloc(sbi,
|
|
|
|
main_bitmap_size, GFP_KERNEL);
|
|
|
|
if (!sit_i->invalid_segmap)
|
|
|
|
return -ENOMEM;
|
2017-01-07 18:52:34 +08:00
|
|
|
#endif
|
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
sit_i->sit_base_addr = le32_to_cpu(raw_super->sit_blkaddr);
|
|
|
|
sit_i->sit_blocks = sit_segs << sbi->log_blocks_per_seg;
|
2016-11-15 10:20:10 +08:00
|
|
|
sit_i->written_valid_blocks = 0;
|
2019-08-07 21:40:32 +08:00
|
|
|
sit_i->bitmap_size = sit_bitmap_size;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
sit_i->dirty_sentries = 0;
|
|
|
|
sit_i->sents_per_block = SIT_ENTRY_PER_BLOCK;
|
|
|
|
sit_i->elapsed_time = le64_to_cpu(sbi->ckpt->elapsed_time);
|
2020-02-26 11:08:16 +08:00
|
|
|
sit_i->mounted_time = ktime_get_boottime_seconds();
|
2017-10-30 17:49:53 +08:00
|
|
|
init_rwsem(&sit_i->sentry_lock);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int build_free_segmap(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct free_segmap_info *free_i;
|
|
|
|
unsigned int bitmap_size, sec_bitmap_size;
|
|
|
|
|
|
|
|
/* allocate memory for free segmap information */
|
2017-11-30 19:28:17 +08:00
|
|
|
free_i = f2fs_kzalloc(sbi, sizeof(struct free_segmap_info), GFP_KERNEL);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
if (!free_i)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
SM_I(sbi)->free_info = free_i;
|
|
|
|
|
2014-09-24 02:23:01 +08:00
|
|
|
bitmap_size = f2fs_bitmap_size(MAIN_SEGS(sbi));
|
2017-11-30 19:28:18 +08:00
|
|
|
free_i->free_segmap = f2fs_kvmalloc(sbi, bitmap_size, GFP_KERNEL);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
if (!free_i->free_segmap)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
2014-09-24 02:23:01 +08:00
|
|
|
sec_bitmap_size = f2fs_bitmap_size(MAIN_SECS(sbi));
|
2017-11-30 19:28:18 +08:00
|
|
|
free_i->free_secmap = f2fs_kvmalloc(sbi, sec_bitmap_size, GFP_KERNEL);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
if (!free_i->free_secmap)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
/* set all segments as dirty temporarily */
|
|
|
|
memset(free_i->free_segmap, 0xff, bitmap_size);
|
|
|
|
memset(free_i->free_secmap, 0xff, sec_bitmap_size);
|
|
|
|
|
|
|
|
/* init free segmap information */
|
2014-09-24 02:23:01 +08:00
|
|
|
free_i->start_segno = GET_SEGNO_FROM_SEG0(sbi, MAIN_BLKADDR(sbi));
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
free_i->free_segments = 0;
|
|
|
|
free_i->free_sections = 0;
|
2015-02-11 18:20:38 +08:00
|
|
|
spin_lock_init(&free_i->segmap_lock);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int build_curseg(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
2012-12-01 09:56:13 +08:00
|
|
|
struct curseg_info *array;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
int i;
|
|
|
|
|
f2fs: introduce inmem curseg
Previous implementation of aligned pinfile allocation will:
- allocate new segment on cold data log no matter whether last used
segment is partially used or not, it makes IOs more random;
- force concurrent cold data/GCed IO going into warm data area, it
can make a bad effect on hot/cold data separation;
In this patch, we introduce a new type of log named 'inmem curseg',
the differents from normal curseg is:
- it reuses existed segment type (CURSEG_XXX_NODE/DATA);
- it only exists in memory, its segno, blkofs, summary will not b
persisted into checkpoint area;
With this new feature, we can enhance scalability of log, special
allocators can be created for purposes:
- pure lfs allocator for aligned pinfile allocation or file
defragmentation
- pure ssr allocator for later feature
So that, let's update aligned pinfile allocation to use this new
inmem curseg fwk.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:45 +08:00
|
|
|
array = f2fs_kzalloc(sbi, array_size(NR_CURSEG_TYPE,
|
|
|
|
sizeof(*array)), GFP_KERNEL);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
if (!array)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
SM_I(sbi)->curseg_array = array;
|
|
|
|
|
f2fs: introduce inmem curseg
Previous implementation of aligned pinfile allocation will:
- allocate new segment on cold data log no matter whether last used
segment is partially used or not, it makes IOs more random;
- force concurrent cold data/GCed IO going into warm data area, it
can make a bad effect on hot/cold data separation;
In this patch, we introduce a new type of log named 'inmem curseg',
the differents from normal curseg is:
- it reuses existed segment type (CURSEG_XXX_NODE/DATA);
- it only exists in memory, its segno, blkofs, summary will not b
persisted into checkpoint area;
With this new feature, we can enhance scalability of log, special
allocators can be created for purposes:
- pure lfs allocator for aligned pinfile allocation or file
defragmentation
- pure ssr allocator for later feature
So that, let's update aligned pinfile allocation to use this new
inmem curseg fwk.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:45 +08:00
|
|
|
for (i = 0; i < NO_CHECK_TYPE; i++) {
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
mutex_init(&array[i].curseg_mutex);
|
2017-11-30 19:28:17 +08:00
|
|
|
array[i].sum_blk = f2fs_kzalloc(sbi, PAGE_SIZE, GFP_KERNEL);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
if (!array[i].sum_blk)
|
|
|
|
return -ENOMEM;
|
f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- datas of current summay block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.
So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
init_rwsem(&array[i].journal_rwsem);
|
2017-11-30 19:28:17 +08:00
|
|
|
array[i].journal = f2fs_kzalloc(sbi,
|
|
|
|
sizeof(struct f2fs_journal), GFP_KERNEL);
|
f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- datas of current summay block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.
So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
if (!array[i].journal)
|
|
|
|
return -ENOMEM;
|
f2fs: introduce inmem curseg
Previous implementation of aligned pinfile allocation will:
- allocate new segment on cold data log no matter whether last used
segment is partially used or not, it makes IOs more random;
- force concurrent cold data/GCed IO going into warm data area, it
can make a bad effect on hot/cold data separation;
In this patch, we introduce a new type of log named 'inmem curseg',
the differents from normal curseg is:
- it reuses existed segment type (CURSEG_XXX_NODE/DATA);
- it only exists in memory, its segno, blkofs, summary will not b
persisted into checkpoint area;
With this new feature, we can enhance scalability of log, special
allocators can be created for purposes:
- pure lfs allocator for aligned pinfile allocation or file
defragmentation
- pure ssr allocator for later feature
So that, let's update aligned pinfile allocation to use this new
inmem curseg fwk.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:45 +08:00
|
|
|
if (i < NR_PERSISTENT_LOG)
|
|
|
|
array[i].seg_type = CURSEG_HOT_DATA + i;
|
|
|
|
else if (i == CURSEG_COLD_DATA_PINNED)
|
|
|
|
array[i].seg_type = CURSEG_COLD_DATA;
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
else if (i == CURSEG_ALL_DATA_ATGC)
|
|
|
|
array[i].seg_type = CURSEG_COLD_DATA;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
array[i].segno = NULL_SEGNO;
|
|
|
|
array[i].next_blkoff = 0;
|
f2fs: introduce inmem curseg
Previous implementation of aligned pinfile allocation will:
- allocate new segment on cold data log no matter whether last used
segment is partially used or not, it makes IOs more random;
- force concurrent cold data/GCed IO going into warm data area, it
can make a bad effect on hot/cold data separation;
In this patch, we introduce a new type of log named 'inmem curseg',
the differents from normal curseg is:
- it reuses existed segment type (CURSEG_XXX_NODE/DATA);
- it only exists in memory, its segno, blkofs, summary will not b
persisted into checkpoint area;
With this new feature, we can enhance scalability of log, special
allocators can be created for purposes:
- pure lfs allocator for aligned pinfile allocation or file
defragmentation
- pure ssr allocator for later feature
So that, let's update aligned pinfile allocation to use this new
inmem curseg fwk.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:45 +08:00
|
|
|
array[i].inited = false;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
return restore_curseg_summaries(sbi);
|
|
|
|
}
|
|
|
|
|
2017-12-20 11:16:34 +08:00
|
|
|
static int build_sit_entries(struct f2fs_sb_info *sbi)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct sit_info *sit_i = SIT_I(sbi);
|
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_COLD_DATA);
|
f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- datas of current summay block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.
So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
struct f2fs_journal *journal = curseg->journal;
|
2016-09-24 12:29:18 +08:00
|
|
|
struct seg_entry *se;
|
|
|
|
struct f2fs_sit_entry sit;
|
2013-11-22 09:09:59 +08:00
|
|
|
int sit_blk_cnt = SIT_BLK_CNT(sbi);
|
|
|
|
unsigned int i, start, end;
|
|
|
|
unsigned int readed, start_blk = 0;
|
2017-12-20 11:16:34 +08:00
|
|
|
int err = 0;
|
2022-05-06 09:33:06 +08:00
|
|
|
block_t sit_valid_blocks[2] = {0, 0};
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
2013-11-22 09:09:59 +08:00
|
|
|
do {
|
2021-03-11 19:01:37 +08:00
|
|
|
readed = f2fs_ra_meta_pages(sbi, start_blk, BIO_MAX_VECS,
|
2016-10-19 02:07:45 +08:00
|
|
|
META_SIT, true);
|
2013-11-22 09:09:59 +08:00
|
|
|
|
|
|
|
start = start_blk * sit_i->sents_per_block;
|
|
|
|
end = (start_blk + readed) * sit_i->sents_per_block;
|
|
|
|
|
2014-09-24 02:23:01 +08:00
|
|
|
for (; start < end && start < MAIN_SEGS(sbi); start++) {
|
2013-11-22 09:09:59 +08:00
|
|
|
struct f2fs_sit_block *sit_blk;
|
|
|
|
struct page *page;
|
|
|
|
|
2016-09-24 12:29:18 +08:00
|
|
|
se = &sit_i->sentries[start];
|
2013-11-22 09:09:59 +08:00
|
|
|
page = get_current_sit_page(sbi, start);
|
2018-09-18 08:36:06 +08:00
|
|
|
if (IS_ERR(page))
|
|
|
|
return PTR_ERR(page);
|
2013-11-22 09:09:59 +08:00
|
|
|
sit_blk = (struct f2fs_sit_block *)page_address(page);
|
|
|
|
sit = sit_blk->entries[SIT_ENTRY_OFFSET(sit_i, start)];
|
|
|
|
f2fs_put_page(page, 1);
|
2016-08-19 23:13:47 +08:00
|
|
|
|
2017-12-20 11:16:34 +08:00
|
|
|
err = check_block_count(sbi, start, &sit);
|
|
|
|
if (err)
|
|
|
|
return err;
|
2013-11-22 09:09:59 +08:00
|
|
|
seg_info_from_raw_sit(se, &sit);
|
2022-05-06 09:33:06 +08:00
|
|
|
|
2022-07-27 22:51:05 +08:00
|
|
|
if (se->type >= NR_PERSISTENT_LOG) {
|
|
|
|
f2fs_err(sbi, "Invalid segment type: %u, segno: %u",
|
|
|
|
se->type, start);
|
2022-09-28 23:38:54 +08:00
|
|
|
f2fs_handle_error(sbi,
|
|
|
|
ERROR_INCONSISTENT_SUM_TYPE);
|
2022-07-27 22:51:05 +08:00
|
|
|
return -EFSCORRUPTED;
|
|
|
|
}
|
|
|
|
|
2022-05-06 09:33:06 +08:00
|
|
|
sit_valid_blocks[SE_PAGETYPE(se)] += se->valid_blocks;
|
2015-05-01 13:37:50 +08:00
|
|
|
|
f2fs: introduce discard_unit mount option
As James Z reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=213877
[1.] One-line summary of the problem:
Mount multiple SMR block devices exceed certain number cause system non-response
[2.] Full description of the problem/report:
Created some F2FS on SMR devices (mkfs.f2fs -m), then mounted in sequence. Each device is the same Model: HGST HSH721414AL (Size 14TB).
Empirically, found that when the amount of SMR device * 1.5Gb > System RAM, the system ran out of memory and hung. No dmesg output. For example, 24 SMR Disk need 24*1.5GB = 36GB. A system with 32G RAM can only mount 21 devices, the 22nd device will be a reproducible cause of system hang.
The number of SMR devices with other FS mounted on this system does not interfere with the result above.
[3.] Keywords (i.e., modules, networking, kernel):
F2FS, SMR, Memory
[4.] Kernel information
[4.1.] Kernel version (uname -a):
Linux 5.13.4-200.fc34.x86_64 #1 SMP Tue Jul 20 20:27:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
[4.2.] Kernel .config file:
Default Fedora 34 with f2fs-tools-1.14.0-2.fc34.x86_64
[5.] Most recent kernel version which did not have the bug:
None
[6.] Output of Oops.. message (if applicable) with symbolic information
resolved (see Documentation/admin-guide/oops-tracing.rst)
None
[7.] A small shell script or example program which triggers the
problem (if possible)
mount /dev/sdX /mnt/0X
[8.] Memory consumption
With 24 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 46 36 0 0 10 10
Swap: 0 0 0
With 3 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 7 5 0 0 1 1
Swap: 7 0 7
The root cause is, there are three bitmaps:
- cur_valid_map
- ckpt_valid_map
- discard_map
and each of them will cost ~500MB memory, {cur, ckpt}_valid_map are
necessary, but discard_map is optional, since this bitmap will only be
useful in mountpoint that small discard is enabled.
For a blkzoned device such as SMR or ZNS devices, f2fs will only issue
discard for a section(zone) when all blocks of that section are invalid,
so, for such device, we don't need small discard functionality at all.
This patch introduces a new mountoption "discard_unit=block|segment|
section" to support issuing discard with different basic unit which is
aligned to block, segment or section, so that user can specify
"discard_unit=segment" or "discard_unit=section" to disable small
discard functionality.
Note that this mount option can not be changed by remount() due to
related metadata need to be initialized during mount().
In order to save memory, let's use "discard_unit=section" for blkzoned
device by default.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-08-03 08:15:43 +08:00
|
|
|
if (f2fs_block_unit_discard(sbi)) {
|
|
|
|
/* build discard map only one time */
|
|
|
|
if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG)) {
|
|
|
|
memset(se->discard_map, 0xff,
|
|
|
|
SIT_VBLOCK_MAP_SIZE);
|
|
|
|
} else {
|
|
|
|
memcpy(se->discard_map,
|
|
|
|
se->cur_valid_map,
|
|
|
|
SIT_VBLOCK_MAP_SIZE);
|
|
|
|
sbi->discard_blks +=
|
|
|
|
sbi->blocks_per_seg -
|
|
|
|
se->valid_blocks;
|
|
|
|
}
|
2016-08-03 01:56:40 +08:00
|
|
|
}
|
2015-05-01 13:37:50 +08:00
|
|
|
|
2018-10-24 18:37:26 +08:00
|
|
|
if (__is_large_section(sbi))
|
2016-08-19 23:13:47 +08:00
|
|
|
get_sec_entry(sbi, start)->valid_blocks +=
|
|
|
|
se->valid_blocks;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
2013-11-22 09:09:59 +08:00
|
|
|
start_blk += readed;
|
|
|
|
} while (start_blk < sit_blk_cnt);
|
2016-08-19 23:13:47 +08:00
|
|
|
|
|
|
|
down_read(&curseg->journal_rwsem);
|
|
|
|
for (i = 0; i < sits_in_cursum(journal); i++) {
|
|
|
|
unsigned int old_valid_blocks;
|
|
|
|
|
|
|
|
start = le32_to_cpu(segno_in_journal(journal, i));
|
2018-04-25 05:44:16 +08:00
|
|
|
if (start >= MAIN_SEGS(sbi)) {
|
2019-06-18 17:48:42 +08:00
|
|
|
f2fs_err(sbi, "Wrong journal entry on segno %u",
|
|
|
|
start);
|
2019-06-20 11:36:14 +08:00
|
|
|
err = -EFSCORRUPTED;
|
2022-09-28 23:38:54 +08:00
|
|
|
f2fs_handle_error(sbi, ERROR_CORRUPTED_JOURNAL);
|
2018-04-25 05:44:16 +08:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
2016-08-19 23:13:47 +08:00
|
|
|
se = &sit_i->sentries[start];
|
|
|
|
sit = sit_in_journal(journal, i);
|
|
|
|
|
|
|
|
old_valid_blocks = se->valid_blocks;
|
2022-05-06 09:33:06 +08:00
|
|
|
|
|
|
|
sit_valid_blocks[SE_PAGETYPE(se)] -= old_valid_blocks;
|
2016-08-19 23:13:47 +08:00
|
|
|
|
2017-12-20 11:16:34 +08:00
|
|
|
err = check_block_count(sbi, start, &sit);
|
|
|
|
if (err)
|
|
|
|
break;
|
2016-08-19 23:13:47 +08:00
|
|
|
seg_info_from_raw_sit(se, &sit);
|
2022-05-06 09:33:06 +08:00
|
|
|
|
2022-07-27 22:51:05 +08:00
|
|
|
if (se->type >= NR_PERSISTENT_LOG) {
|
|
|
|
f2fs_err(sbi, "Invalid segment type: %u, segno: %u",
|
|
|
|
se->type, start);
|
|
|
|
err = -EFSCORRUPTED;
|
2022-09-28 23:38:54 +08:00
|
|
|
f2fs_handle_error(sbi, ERROR_INCONSISTENT_SUM_TYPE);
|
2022-07-27 22:51:05 +08:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
2022-05-06 09:33:06 +08:00
|
|
|
sit_valid_blocks[SE_PAGETYPE(se)] += se->valid_blocks;
|
2016-08-19 23:13:47 +08:00
|
|
|
|
f2fs: introduce discard_unit mount option
As James Z reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=213877
[1.] One-line summary of the problem:
Mount multiple SMR block devices exceed certain number cause system non-response
[2.] Full description of the problem/report:
Created some F2FS on SMR devices (mkfs.f2fs -m), then mounted in sequence. Each device is the same Model: HGST HSH721414AL (Size 14TB).
Empirically, found that when the amount of SMR device * 1.5Gb > System RAM, the system ran out of memory and hung. No dmesg output. For example, 24 SMR Disk need 24*1.5GB = 36GB. A system with 32G RAM can only mount 21 devices, the 22nd device will be a reproducible cause of system hang.
The number of SMR devices with other FS mounted on this system does not interfere with the result above.
[3.] Keywords (i.e., modules, networking, kernel):
F2FS, SMR, Memory
[4.] Kernel information
[4.1.] Kernel version (uname -a):
Linux 5.13.4-200.fc34.x86_64 #1 SMP Tue Jul 20 20:27:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
[4.2.] Kernel .config file:
Default Fedora 34 with f2fs-tools-1.14.0-2.fc34.x86_64
[5.] Most recent kernel version which did not have the bug:
None
[6.] Output of Oops.. message (if applicable) with symbolic information
resolved (see Documentation/admin-guide/oops-tracing.rst)
None
[7.] A small shell script or example program which triggers the
problem (if possible)
mount /dev/sdX /mnt/0X
[8.] Memory consumption
With 24 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 46 36 0 0 10 10
Swap: 0 0 0
With 3 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 7 5 0 0 1 1
Swap: 7 0 7
The root cause is, there are three bitmaps:
- cur_valid_map
- ckpt_valid_map
- discard_map
and each of them will cost ~500MB memory, {cur, ckpt}_valid_map are
necessary, but discard_map is optional, since this bitmap will only be
useful in mountpoint that small discard is enabled.
For a blkzoned device such as SMR or ZNS devices, f2fs will only issue
discard for a section(zone) when all blocks of that section are invalid,
so, for such device, we don't need small discard functionality at all.
This patch introduces a new mountoption "discard_unit=block|segment|
section" to support issuing discard with different basic unit which is
aligned to block, segment or section, so that user can specify
"discard_unit=segment" or "discard_unit=section" to disable small
discard functionality.
Note that this mount option can not be changed by remount() due to
related metadata need to be initialized during mount().
In order to save memory, let's use "discard_unit=section" for blkzoned
device by default.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-08-03 08:15:43 +08:00
|
|
|
if (f2fs_block_unit_discard(sbi)) {
|
|
|
|
if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG)) {
|
|
|
|
memset(se->discard_map, 0xff, SIT_VBLOCK_MAP_SIZE);
|
|
|
|
} else {
|
|
|
|
memcpy(se->discard_map, se->cur_valid_map,
|
|
|
|
SIT_VBLOCK_MAP_SIZE);
|
|
|
|
sbi->discard_blks += old_valid_blocks;
|
|
|
|
sbi->discard_blks -= se->valid_blocks;
|
|
|
|
}
|
2016-08-19 23:13:47 +08:00
|
|
|
}
|
|
|
|
|
2018-10-24 18:37:26 +08:00
|
|
|
if (__is_large_section(sbi)) {
|
2016-08-19 23:13:47 +08:00
|
|
|
get_sec_entry(sbi, start)->valid_blocks +=
|
2018-04-25 19:38:17 +08:00
|
|
|
se->valid_blocks;
|
|
|
|
get_sec_entry(sbi, start)->valid_blocks -=
|
|
|
|
old_valid_blocks;
|
|
|
|
}
|
2016-08-19 23:13:47 +08:00
|
|
|
}
|
|
|
|
up_read(&curseg->journal_rwsem);
|
2018-04-25 11:34:05 +08:00
|
|
|
|
2022-05-06 09:33:06 +08:00
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
|
|
|
|
if (sit_valid_blocks[NODE] != valid_node_count(sbi)) {
|
2019-06-18 17:48:42 +08:00
|
|
|
f2fs_err(sbi, "SIT is corrupted node# %u vs %u",
|
2022-05-06 09:33:06 +08:00
|
|
|
sit_valid_blocks[NODE], valid_node_count(sbi));
|
2022-09-28 23:38:54 +08:00
|
|
|
f2fs_handle_error(sbi, ERROR_INCONSISTENT_NODE_COUNT);
|
2022-05-06 09:33:06 +08:00
|
|
|
return -EFSCORRUPTED;
|
2018-04-25 11:34:05 +08:00
|
|
|
}
|
|
|
|
|
2022-05-06 09:33:06 +08:00
|
|
|
if (sit_valid_blocks[DATA] + sit_valid_blocks[NODE] >
|
|
|
|
valid_user_blocks(sbi)) {
|
|
|
|
f2fs_err(sbi, "SIT is corrupted data# %u %u vs %u",
|
|
|
|
sit_valid_blocks[DATA], sit_valid_blocks[NODE],
|
|
|
|
valid_user_blocks(sbi));
|
2022-09-28 23:38:54 +08:00
|
|
|
f2fs_handle_error(sbi, ERROR_INCONSISTENT_BLOCK_COUNT);
|
2022-05-06 09:33:06 +08:00
|
|
|
return -EFSCORRUPTED;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void init_free_segmap(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
unsigned int start;
|
|
|
|
int type;
|
f2fs: support zone capacity less than zone size
NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
Zone-capacity indicates the maximum number of sectors that are usable in
a zone beginning from the first sector of the zone. This makes the sectors
sectors after the zone-capacity till zone-size to be unusable.
This patch set tracks zone-size and zone-capacity in zoned devices and
calculate the usable blocks per segment and usable segments per section.
If zone-capacity is less than zone-size mark only those segments which
start before zone-capacity as free segments. All segments at and beyond
zone-capacity are treated as permanently used segments. In cases where
zone-capacity does not align with segment size the last segment will start
before zone-capacity and end beyond the zone-capacity of the zone. For
such spanning segments only sectors within the zone-capacity are used.
During writes and GC manage the usable segments in a section and usable
blocks per segment. Segments which are beyond zone-capacity are never
allocated, and do not need to be garbage collected, only the segments
which are before zone-capacity needs to garbage collected.
For spanning segments based on the number of usable blocks in that
segment, write to blocks only up to zone-capacity.
Zone-capacity is device specific and cannot be configured by the user.
Since NVMe ZNS device zones are sequentially write only, a block device
with conventional zones or any normal block device is needed along with
the ZNS device for the metadata operations of F2fs.
A typical nvme-cli output of a zoned device shows zone start and capacity
and write pointer as below:
SLBA: 0x0 WP: 0x0 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
Here zone size is 64MB, capacity is 49MB, WP is at zone start as the zones
are in EMPTY state. For each zone, only zone start + 49MB is usable area,
any lba/sector after 49MB cannot be read or written to, the drive will fail
any attempts to read/write. So, the second zone starts at 64MB and is
usable till 113MB (64 + 49) and the range between 113 and 128MB is
again unusable. The next zone starts at 128MB, and so on.
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-16 20:56:56 +08:00
|
|
|
struct seg_entry *sentry;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
2014-09-24 02:23:01 +08:00
|
|
|
for (start = 0; start < MAIN_SEGS(sbi); start++) {
|
f2fs: support zone capacity less than zone size
NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
Zone-capacity indicates the maximum number of sectors that are usable in
a zone beginning from the first sector of the zone. This makes the sectors
sectors after the zone-capacity till zone-size to be unusable.
This patch set tracks zone-size and zone-capacity in zoned devices and
calculate the usable blocks per segment and usable segments per section.
If zone-capacity is less than zone-size mark only those segments which
start before zone-capacity as free segments. All segments at and beyond
zone-capacity are treated as permanently used segments. In cases where
zone-capacity does not align with segment size the last segment will start
before zone-capacity and end beyond the zone-capacity of the zone. For
such spanning segments only sectors within the zone-capacity are used.
During writes and GC manage the usable segments in a section and usable
blocks per segment. Segments which are beyond zone-capacity are never
allocated, and do not need to be garbage collected, only the segments
which are before zone-capacity needs to garbage collected.
For spanning segments based on the number of usable blocks in that
segment, write to blocks only up to zone-capacity.
Zone-capacity is device specific and cannot be configured by the user.
Since NVMe ZNS device zones are sequentially write only, a block device
with conventional zones or any normal block device is needed along with
the ZNS device for the metadata operations of F2fs.
A typical nvme-cli output of a zoned device shows zone start and capacity
and write pointer as below:
SLBA: 0x0 WP: 0x0 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
Here zone size is 64MB, capacity is 49MB, WP is at zone start as the zones
are in EMPTY state. For each zone, only zone start + 49MB is usable area,
any lba/sector after 49MB cannot be read or written to, the drive will fail
any attempts to read/write. So, the second zone starts at 64MB and is
usable till 113MB (64 + 49) and the range between 113 and 128MB is
again unusable. The next zone starts at 128MB, and so on.
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-16 20:56:56 +08:00
|
|
|
if (f2fs_usable_blks_in_seg(sbi, start) == 0)
|
|
|
|
continue;
|
|
|
|
sentry = get_seg_entry(sbi, start);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
if (!sentry->valid_blocks)
|
|
|
|
__set_free(sbi, start);
|
2016-11-15 10:20:10 +08:00
|
|
|
else
|
|
|
|
SIT_I(sbi)->written_valid_blocks +=
|
|
|
|
sentry->valid_blocks;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/* set use the current segments */
|
|
|
|
for (type = CURSEG_HOT_DATA; type <= CURSEG_COLD_NODE; type++) {
|
|
|
|
struct curseg_info *curseg_t = CURSEG_I(sbi, type);
|
2021-04-06 09:47:35 +08:00
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
__set_test_and_inuse(sbi, curseg_t->segno);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
static void init_dirty_segmap(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
|
|
|
|
struct free_segmap_info *free_i = FREE_I(sbi);
|
2020-06-18 12:37:10 +08:00
|
|
|
unsigned int segno = 0, offset = 0, secno;
|
f2fs: support zone capacity less than zone size
NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
Zone-capacity indicates the maximum number of sectors that are usable in
a zone beginning from the first sector of the zone. This makes the sectors
sectors after the zone-capacity till zone-size to be unusable.
This patch set tracks zone-size and zone-capacity in zoned devices and
calculate the usable blocks per segment and usable segments per section.
If zone-capacity is less than zone-size mark only those segments which
start before zone-capacity as free segments. All segments at and beyond
zone-capacity are treated as permanently used segments. In cases where
zone-capacity does not align with segment size the last segment will start
before zone-capacity and end beyond the zone-capacity of the zone. For
such spanning segments only sectors within the zone-capacity are used.
During writes and GC manage the usable segments in a section and usable
blocks per segment. Segments which are beyond zone-capacity are never
allocated, and do not need to be garbage collected, only the segments
which are before zone-capacity needs to garbage collected.
For spanning segments based on the number of usable blocks in that
segment, write to blocks only up to zone-capacity.
Zone-capacity is device specific and cannot be configured by the user.
Since NVMe ZNS device zones are sequentially write only, a block device
with conventional zones or any normal block device is needed along with
the ZNS device for the metadata operations of F2fs.
A typical nvme-cli output of a zoned device shows zone start and capacity
and write pointer as below:
SLBA: 0x0 WP: 0x0 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
Here zone size is 64MB, capacity is 49MB, WP is at zone start as the zones
are in EMPTY state. For each zone, only zone start + 49MB is usable area,
any lba/sector after 49MB cannot be read or written to, the drive will fail
any attempts to read/write. So, the second zone starts at 64MB and is
usable till 113MB (64 + 49) and the range between 113 and 128MB is
again unusable. The next zone starts at 128MB, and so on.
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-16 20:56:56 +08:00
|
|
|
block_t valid_blocks, usable_blks_in_seg;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
2013-06-16 08:49:11 +08:00
|
|
|
while (1) {
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
/* find dirty segment based on free segmap */
|
2014-09-24 02:23:01 +08:00
|
|
|
segno = find_next_inuse(free_i, MAIN_SEGS(sbi), offset);
|
|
|
|
if (segno >= MAIN_SEGS(sbi))
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
break;
|
|
|
|
offset = segno + 1;
|
2017-04-08 05:33:22 +08:00
|
|
|
valid_blocks = get_valid_blocks(sbi, segno, false);
|
f2fs: support zone capacity less than zone size
NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
Zone-capacity indicates the maximum number of sectors that are usable in
a zone beginning from the first sector of the zone. This makes the sectors
sectors after the zone-capacity till zone-size to be unusable.
This patch set tracks zone-size and zone-capacity in zoned devices and
calculate the usable blocks per segment and usable segments per section.
If zone-capacity is less than zone-size mark only those segments which
start before zone-capacity as free segments. All segments at and beyond
zone-capacity are treated as permanently used segments. In cases where
zone-capacity does not align with segment size the last segment will start
before zone-capacity and end beyond the zone-capacity of the zone. For
such spanning segments only sectors within the zone-capacity are used.
During writes and GC manage the usable segments in a section and usable
blocks per segment. Segments which are beyond zone-capacity are never
allocated, and do not need to be garbage collected, only the segments
which are before zone-capacity needs to garbage collected.
For spanning segments based on the number of usable blocks in that
segment, write to blocks only up to zone-capacity.
Zone-capacity is device specific and cannot be configured by the user.
Since NVMe ZNS device zones are sequentially write only, a block device
with conventional zones or any normal block device is needed along with
the ZNS device for the metadata operations of F2fs.
A typical nvme-cli output of a zoned device shows zone start and capacity
and write pointer as below:
SLBA: 0x0 WP: 0x0 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
Here zone size is 64MB, capacity is 49MB, WP is at zone start as the zones
are in EMPTY state. For each zone, only zone start + 49MB is usable area,
any lba/sector after 49MB cannot be read or written to, the drive will fail
any attempts to read/write. So, the second zone starts at 64MB and is
usable till 113MB (64 + 49) and the range between 113 and 128MB is
again unusable. The next zone starts at 128MB, and so on.
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-16 20:56:56 +08:00
|
|
|
usable_blks_in_seg = f2fs_usable_blks_in_seg(sbi, segno);
|
|
|
|
if (valid_blocks == usable_blks_in_seg || !valid_blocks)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
continue;
|
f2fs: support zone capacity less than zone size
NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
Zone-capacity indicates the maximum number of sectors that are usable in
a zone beginning from the first sector of the zone. This makes the sectors
sectors after the zone-capacity till zone-size to be unusable.
This patch set tracks zone-size and zone-capacity in zoned devices and
calculate the usable blocks per segment and usable segments per section.
If zone-capacity is less than zone-size mark only those segments which
start before zone-capacity as free segments. All segments at and beyond
zone-capacity are treated as permanently used segments. In cases where
zone-capacity does not align with segment size the last segment will start
before zone-capacity and end beyond the zone-capacity of the zone. For
such spanning segments only sectors within the zone-capacity are used.
During writes and GC manage the usable segments in a section and usable
blocks per segment. Segments which are beyond zone-capacity are never
allocated, and do not need to be garbage collected, only the segments
which are before zone-capacity needs to garbage collected.
For spanning segments based on the number of usable blocks in that
segment, write to blocks only up to zone-capacity.
Zone-capacity is device specific and cannot be configured by the user.
Since NVMe ZNS device zones are sequentially write only, a block device
with conventional zones or any normal block device is needed along with
the ZNS device for the metadata operations of F2fs.
A typical nvme-cli output of a zoned device shows zone start and capacity
and write pointer as below:
SLBA: 0x0 WP: 0x0 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
Here zone size is 64MB, capacity is 49MB, WP is at zone start as the zones
are in EMPTY state. For each zone, only zone start + 49MB is usable area,
any lba/sector after 49MB cannot be read or written to, the drive will fail
any attempts to read/write. So, the second zone starts at 64MB and is
usable till 113MB (64 + 49) and the range between 113 and 128MB is
again unusable. The next zone starts at 128MB, and so on.
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-16 20:56:56 +08:00
|
|
|
if (valid_blocks > usable_blks_in_seg) {
|
2014-09-03 07:24:11 +08:00
|
|
|
f2fs_bug_on(sbi, 1);
|
|
|
|
continue;
|
|
|
|
}
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
mutex_lock(&dirty_i->seglist_lock);
|
|
|
|
__locate_dirty_segment(sbi, segno, DIRTY);
|
|
|
|
mutex_unlock(&dirty_i->seglist_lock);
|
|
|
|
}
|
2020-06-18 12:37:10 +08:00
|
|
|
|
|
|
|
if (!__is_large_section(sbi))
|
|
|
|
return;
|
|
|
|
|
|
|
|
mutex_lock(&dirty_i->seglist_lock);
|
2020-12-01 15:45:47 +08:00
|
|
|
for (segno = 0; segno < MAIN_SEGS(sbi); segno += sbi->segs_per_sec) {
|
2020-06-18 12:37:10 +08:00
|
|
|
valid_blocks = get_valid_blocks(sbi, segno, true);
|
|
|
|
secno = GET_SEC_FROM_SEG(sbi, segno);
|
|
|
|
|
2022-06-29 02:03:57 +08:00
|
|
|
if (!valid_blocks || valid_blocks == CAP_BLKS_PER_SEC(sbi))
|
2020-06-18 12:37:10 +08:00
|
|
|
continue;
|
|
|
|
if (IS_CURSEC(sbi, secno))
|
|
|
|
continue;
|
|
|
|
set_bit(secno, dirty_i->dirty_secmap);
|
|
|
|
}
|
|
|
|
mutex_unlock(&dirty_i->seglist_lock);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
2013-03-31 12:26:03 +08:00
|
|
|
static int init_victim_secmap(struct f2fs_sb_info *sbi)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
|
2014-09-24 02:23:01 +08:00
|
|
|
unsigned int bitmap_size = f2fs_bitmap_size(MAIN_SECS(sbi));
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
2017-11-30 19:28:18 +08:00
|
|
|
dirty_i->victim_secmap = f2fs_kvzalloc(sbi, bitmap_size, GFP_KERNEL);
|
2013-03-31 12:26:03 +08:00
|
|
|
if (!dirty_i->victim_secmap)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
return -ENOMEM;
|
2022-05-06 18:30:31 +08:00
|
|
|
|
|
|
|
dirty_i->pinned_secmap = f2fs_kvzalloc(sbi, bitmap_size, GFP_KERNEL);
|
|
|
|
if (!dirty_i->pinned_secmap)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
dirty_i->pinned_secmap_cnt = 0;
|
|
|
|
dirty_i->enable_pin_section = true;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int build_dirty_segmap(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct dirty_seglist_info *dirty_i;
|
|
|
|
unsigned int bitmap_size, i;
|
|
|
|
|
|
|
|
/* allocate memory for dirty segments list information */
|
2017-11-30 19:28:17 +08:00
|
|
|
dirty_i = f2fs_kzalloc(sbi, sizeof(struct dirty_seglist_info),
|
|
|
|
GFP_KERNEL);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
if (!dirty_i)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
SM_I(sbi)->dirty_info = dirty_i;
|
|
|
|
mutex_init(&dirty_i->seglist_lock);
|
|
|
|
|
2014-09-24 02:23:01 +08:00
|
|
|
bitmap_size = f2fs_bitmap_size(MAIN_SEGS(sbi));
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
for (i = 0; i < NR_DIRTY_TYPE; i++) {
|
2017-11-30 19:28:18 +08:00
|
|
|
dirty_i->dirty_segmap[i] = f2fs_kvzalloc(sbi, bitmap_size,
|
|
|
|
GFP_KERNEL);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
if (!dirty_i->dirty_segmap[i])
|
|
|
|
return -ENOMEM;
|
|
|
|
}
|
|
|
|
|
2020-06-18 12:37:10 +08:00
|
|
|
if (__is_large_section(sbi)) {
|
|
|
|
bitmap_size = f2fs_bitmap_size(MAIN_SECS(sbi));
|
|
|
|
dirty_i->dirty_secmap = f2fs_kvzalloc(sbi,
|
|
|
|
bitmap_size, GFP_KERNEL);
|
|
|
|
if (!dirty_i->dirty_secmap)
|
|
|
|
return -ENOMEM;
|
|
|
|
}
|
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
init_dirty_segmap(sbi);
|
2013-03-31 12:26:03 +08:00
|
|
|
return init_victim_secmap(sbi);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
2019-05-25 23:07:25 +08:00
|
|
|
static int sanity_check_curseg(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* In LFS/SSR curseg, .next_blkoff should point to an unused blkaddr;
|
|
|
|
* In LFS curseg, all blkaddr after .next_blkoff should be unused.
|
|
|
|
*/
|
f2fs: introduce inmem curseg
Previous implementation of aligned pinfile allocation will:
- allocate new segment on cold data log no matter whether last used
segment is partially used or not, it makes IOs more random;
- force concurrent cold data/GCed IO going into warm data area, it
can make a bad effect on hot/cold data separation;
In this patch, we introduce a new type of log named 'inmem curseg',
the differents from normal curseg is:
- it reuses existed segment type (CURSEG_XXX_NODE/DATA);
- it only exists in memory, its segno, blkofs, summary will not b
persisted into checkpoint area;
With this new feature, we can enhance scalability of log, special
allocators can be created for purposes:
- pure lfs allocator for aligned pinfile allocation or file
defragmentation
- pure ssr allocator for later feature
So that, let's update aligned pinfile allocation to use this new
inmem curseg fwk.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:45 +08:00
|
|
|
for (i = 0; i < NR_PERSISTENT_LOG; i++) {
|
2019-05-25 23:07:25 +08:00
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, i);
|
|
|
|
struct seg_entry *se = get_seg_entry(sbi, curseg->segno);
|
|
|
|
unsigned int blkofs = curseg->next_blkoff;
|
|
|
|
|
2021-05-21 16:32:53 +08:00
|
|
|
if (f2fs_sb_has_readonly(sbi) &&
|
|
|
|
i != CURSEG_HOT_DATA && i != CURSEG_HOT_NODE)
|
|
|
|
continue;
|
|
|
|
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
sanity_check_seg_type(sbi, curseg->seg_type);
|
2022-03-04 09:49:13 +08:00
|
|
|
|
|
|
|
if (curseg->alloc_type != LFS && curseg->alloc_type != SSR) {
|
|
|
|
f2fs_err(sbi,
|
|
|
|
"Current segment has invalid alloc_type:%d",
|
|
|
|
curseg->alloc_type);
|
2022-09-28 23:38:54 +08:00
|
|
|
f2fs_handle_error(sbi, ERROR_INVALID_CURSEG);
|
2022-03-04 09:49:13 +08:00
|
|
|
return -EFSCORRUPTED;
|
|
|
|
}
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
|
2019-05-25 23:07:25 +08:00
|
|
|
if (f2fs_test_bit(blkofs, se->cur_valid_map))
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
if (curseg->alloc_type == SSR)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
for (blkofs += 1; blkofs < sbi->blocks_per_seg; blkofs++) {
|
|
|
|
if (!f2fs_test_bit(blkofs, se->cur_valid_map))
|
|
|
|
continue;
|
|
|
|
out:
|
2019-06-18 17:48:42 +08:00
|
|
|
f2fs_err(sbi,
|
|
|
|
"Current segment's next free block offset is inconsistent with bitmap, logtype:%u, segno:%u, type:%u, next_blkoff:%u, blkofs:%u",
|
|
|
|
i, curseg->segno, curseg->alloc_type,
|
|
|
|
curseg->next_blkoff, blkofs);
|
2022-09-28 23:38:54 +08:00
|
|
|
f2fs_handle_error(sbi, ERROR_INVALID_CURSEG);
|
2019-06-20 11:36:14 +08:00
|
|
|
return -EFSCORRUPTED;
|
2019-05-25 23:07:25 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2019-12-09 18:44:44 +08:00
|
|
|
#ifdef CONFIG_BLK_DEV_ZONED
|
|
|
|
|
2019-12-09 18:44:45 +08:00
|
|
|
static int check_zone_write_pointer(struct f2fs_sb_info *sbi,
|
|
|
|
struct f2fs_dev_info *fdev,
|
|
|
|
struct blk_zone *zone)
|
|
|
|
{
|
|
|
|
unsigned int wp_segno, wp_blkoff, zone_secno, zone_segno, segno;
|
|
|
|
block_t zone_block, wp_block, last_valid_block;
|
|
|
|
int i, s, b, ret;
|
|
|
|
struct seg_entry *se;
|
|
|
|
|
|
|
|
if (zone->type != BLK_ZONE_TYPE_SEQWRITE_REQ)
|
|
|
|
return 0;
|
|
|
|
|
2023-05-23 20:35:21 +08:00
|
|
|
wp_block = fdev->start_blk + (zone->wp >> sbi->log_sectors_per_block);
|
2019-12-09 18:44:45 +08:00
|
|
|
wp_segno = GET_SEGNO(sbi, wp_block);
|
|
|
|
wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno);
|
2023-05-23 20:35:21 +08:00
|
|
|
zone_block = fdev->start_blk + (zone->start >>
|
|
|
|
sbi->log_sectors_per_block);
|
2019-12-09 18:44:45 +08:00
|
|
|
zone_segno = GET_SEGNO(sbi, zone_block);
|
|
|
|
zone_secno = GET_SEC_FROM_SEG(sbi, zone_segno);
|
|
|
|
|
|
|
|
if (zone_segno >= MAIN_SEGS(sbi))
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Skip check of zones cursegs point to, since
|
|
|
|
* fix_curseg_write_pointer() checks them.
|
|
|
|
*/
|
|
|
|
for (i = 0; i < NO_CHECK_TYPE; i++)
|
|
|
|
if (zone_secno == GET_SEC_FROM_SEG(sbi,
|
|
|
|
CURSEG_I(sbi, i)->segno))
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Get last valid block of the zone.
|
|
|
|
*/
|
|
|
|
last_valid_block = zone_block - 1;
|
|
|
|
for (s = sbi->segs_per_sec - 1; s >= 0; s--) {
|
|
|
|
segno = zone_segno + s;
|
|
|
|
se = get_seg_entry(sbi, segno);
|
|
|
|
for (b = sbi->blocks_per_seg - 1; b >= 0; b--)
|
|
|
|
if (f2fs_test_bit(b, se->cur_valid_map)) {
|
|
|
|
last_valid_block = START_BLOCK(sbi, segno) + b;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
if (last_valid_block >= zone_block)
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
2023-06-13 07:32:03 +08:00
|
|
|
/*
|
|
|
|
* The write pointer matches with the valid blocks or
|
|
|
|
* already points to the end of the zone.
|
|
|
|
*/
|
|
|
|
if ((last_valid_block + 1 == wp_block) ||
|
|
|
|
(zone->wp == zone->start + zone->len))
|
2019-12-09 18:44:45 +08:00
|
|
|
return 0;
|
|
|
|
|
2023-05-06 04:40:00 +08:00
|
|
|
if (last_valid_block + 1 == zone_block) {
|
|
|
|
/*
|
|
|
|
* If there is no valid block in the zone and if write pointer
|
|
|
|
* is not at zone start, reset the write pointer.
|
|
|
|
*/
|
2019-12-09 18:44:45 +08:00
|
|
|
f2fs_notice(sbi,
|
|
|
|
"Zone without valid block has non-zero write "
|
|
|
|
"pointer. Reset the write pointer: wp[0x%x,0x%x]",
|
|
|
|
wp_segno, wp_blkoff);
|
|
|
|
ret = __f2fs_issue_discard_zone(sbi, fdev->bdev, zone_block,
|
2023-05-23 20:35:21 +08:00
|
|
|
zone->len >> sbi->log_sectors_per_block);
|
2023-05-06 04:40:00 +08:00
|
|
|
if (ret)
|
2019-12-09 18:44:45 +08:00
|
|
|
f2fs_err(sbi, "Discard zone failed: %s (errno=%d)",
|
|
|
|
fdev->path, ret);
|
2023-05-06 04:40:00 +08:00
|
|
|
|
|
|
|
return ret;
|
2019-12-09 18:44:45 +08:00
|
|
|
}
|
|
|
|
|
2023-05-06 04:40:00 +08:00
|
|
|
/*
|
|
|
|
* If there are valid blocks and the write pointer doesn't
|
|
|
|
* match with them, we need to report the inconsistency and
|
|
|
|
* fill the zone till the end to close the zone. This inconsistency
|
|
|
|
* does not cause write error because the zone will not be selected
|
|
|
|
* for write operation until it get discarded.
|
|
|
|
*/
|
|
|
|
f2fs_notice(sbi, "Valid blocks are not aligned with write pointer: "
|
|
|
|
"valid block[0x%x,0x%x] wp[0x%x,0x%x]",
|
|
|
|
GET_SEGNO(sbi, last_valid_block),
|
|
|
|
GET_BLKOFF_FROM_SEG0(sbi, last_valid_block),
|
|
|
|
wp_segno, wp_blkoff);
|
|
|
|
|
|
|
|
ret = blkdev_issue_zeroout(fdev->bdev, zone->wp,
|
|
|
|
zone->len - (zone->wp - zone->start),
|
|
|
|
GFP_NOFS, 0);
|
|
|
|
if (ret)
|
|
|
|
f2fs_err(sbi, "Fill up zone failed: %s (errno=%d)",
|
|
|
|
fdev->path, ret);
|
|
|
|
|
|
|
|
return ret;
|
2019-12-09 18:44:45 +08:00
|
|
|
}
|
|
|
|
|
2019-12-09 18:44:44 +08:00
|
|
|
static struct f2fs_dev_info *get_target_zoned_dev(struct f2fs_sb_info *sbi,
|
|
|
|
block_t zone_blkaddr)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
|
|
|
|
for (i = 0; i < sbi->s_ndevs; i++) {
|
|
|
|
if (!bdev_is_zoned(FDEV(i).bdev))
|
|
|
|
continue;
|
|
|
|
if (sbi->s_ndevs == 1 || (FDEV(i).start_blk <= zone_blkaddr &&
|
|
|
|
zone_blkaddr <= FDEV(i).end_blk))
|
|
|
|
return &FDEV(i);
|
|
|
|
}
|
|
|
|
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int report_one_zone_cb(struct blk_zone *zone, unsigned int idx,
|
2021-04-06 09:47:35 +08:00
|
|
|
void *data)
|
|
|
|
{
|
2019-12-09 18:44:44 +08:00
|
|
|
memcpy(data, zone, sizeof(struct blk_zone));
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int fix_curseg_write_pointer(struct f2fs_sb_info *sbi, int type)
|
|
|
|
{
|
|
|
|
struct curseg_info *cs = CURSEG_I(sbi, type);
|
|
|
|
struct f2fs_dev_info *zbd;
|
|
|
|
struct blk_zone zone;
|
|
|
|
unsigned int cs_section, wp_segno, wp_blkoff, wp_sector_off;
|
|
|
|
block_t cs_zone_block, wp_block;
|
|
|
|
sector_t zone_sector;
|
|
|
|
int err;
|
|
|
|
|
|
|
|
cs_section = GET_SEC_FROM_SEG(sbi, cs->segno);
|
|
|
|
cs_zone_block = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, cs_section));
|
|
|
|
|
|
|
|
zbd = get_target_zoned_dev(sbi, cs_zone_block);
|
|
|
|
if (!zbd)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
/* report zone for the sector the curseg points to */
|
2023-05-23 20:35:21 +08:00
|
|
|
zone_sector = (sector_t)(cs_zone_block - zbd->start_blk) <<
|
|
|
|
sbi->log_sectors_per_block;
|
2019-12-09 18:44:44 +08:00
|
|
|
err = blkdev_report_zones(zbd->bdev, zone_sector, 1,
|
|
|
|
report_one_zone_cb, &zone);
|
|
|
|
if (err != 1) {
|
|
|
|
f2fs_err(sbi, "Report zone failed: %s errno=(%d)",
|
|
|
|
zbd->path, err);
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (zone.type != BLK_ZONE_TYPE_SEQWRITE_REQ)
|
|
|
|
return 0;
|
|
|
|
|
2023-05-23 20:35:21 +08:00
|
|
|
wp_block = zbd->start_blk + (zone.wp >> sbi->log_sectors_per_block);
|
2019-12-09 18:44:44 +08:00
|
|
|
wp_segno = GET_SEGNO(sbi, wp_block);
|
|
|
|
wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno);
|
2023-05-23 20:35:21 +08:00
|
|
|
wp_sector_off = zone.wp & GENMASK(sbi->log_sectors_per_block - 1, 0);
|
2019-12-09 18:44:44 +08:00
|
|
|
|
|
|
|
if (cs->segno == wp_segno && cs->next_blkoff == wp_blkoff &&
|
|
|
|
wp_sector_off == 0)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
f2fs_notice(sbi, "Unaligned curseg[%d] with write pointer: "
|
|
|
|
"curseg[0x%x,0x%x] wp[0x%x,0x%x]",
|
|
|
|
type, cs->segno, cs->next_blkoff, wp_segno, wp_blkoff);
|
|
|
|
|
|
|
|
f2fs_notice(sbi, "Assign new section to curseg[%d]: "
|
|
|
|
"curseg[0x%x,0x%x]", type, cs->segno, cs->next_blkoff);
|
2021-04-21 09:54:55 +08:00
|
|
|
|
|
|
|
f2fs_allocate_new_section(sbi, type, true);
|
2019-12-09 18:44:44 +08:00
|
|
|
|
2019-12-09 18:44:45 +08:00
|
|
|
/* check consistency of the zone curseg pointed to */
|
|
|
|
if (check_zone_write_pointer(sbi, zbd, &zone))
|
|
|
|
return -EIO;
|
|
|
|
|
2019-12-09 18:44:44 +08:00
|
|
|
/* check newly assigned zone */
|
|
|
|
cs_section = GET_SEC_FROM_SEG(sbi, cs->segno);
|
|
|
|
cs_zone_block = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, cs_section));
|
|
|
|
|
|
|
|
zbd = get_target_zoned_dev(sbi, cs_zone_block);
|
|
|
|
if (!zbd)
|
|
|
|
return 0;
|
|
|
|
|
2023-05-23 20:35:21 +08:00
|
|
|
zone_sector = (sector_t)(cs_zone_block - zbd->start_blk) <<
|
|
|
|
sbi->log_sectors_per_block;
|
2019-12-09 18:44:44 +08:00
|
|
|
err = blkdev_report_zones(zbd->bdev, zone_sector, 1,
|
|
|
|
report_one_zone_cb, &zone);
|
|
|
|
if (err != 1) {
|
|
|
|
f2fs_err(sbi, "Report zone failed: %s errno=(%d)",
|
|
|
|
zbd->path, err);
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (zone.type != BLK_ZONE_TYPE_SEQWRITE_REQ)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
if (zone.wp != zone.start) {
|
|
|
|
f2fs_notice(sbi,
|
|
|
|
"New zone for curseg[%d] is not yet discarded. "
|
|
|
|
"Reset the zone: curseg[0x%x,0x%x]",
|
|
|
|
type, cs->segno, cs->next_blkoff);
|
2023-04-07 06:11:04 +08:00
|
|
|
err = __f2fs_issue_discard_zone(sbi, zbd->bdev, cs_zone_block,
|
2023-05-23 20:35:21 +08:00
|
|
|
zone.len >> sbi->log_sectors_per_block);
|
2019-12-09 18:44:44 +08:00
|
|
|
if (err) {
|
|
|
|
f2fs_err(sbi, "Discard zone failed: %s (errno=%d)",
|
|
|
|
zbd->path, err);
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
int i, ret;
|
|
|
|
|
f2fs: introduce inmem curseg
Previous implementation of aligned pinfile allocation will:
- allocate new segment on cold data log no matter whether last used
segment is partially used or not, it makes IOs more random;
- force concurrent cold data/GCed IO going into warm data area, it
can make a bad effect on hot/cold data separation;
In this patch, we introduce a new type of log named 'inmem curseg',
the differents from normal curseg is:
- it reuses existed segment type (CURSEG_XXX_NODE/DATA);
- it only exists in memory, its segno, blkofs, summary will not b
persisted into checkpoint area;
With this new feature, we can enhance scalability of log, special
allocators can be created for purposes:
- pure lfs allocator for aligned pinfile allocation or file
defragmentation
- pure ssr allocator for later feature
So that, let's update aligned pinfile allocation to use this new
inmem curseg fwk.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:45 +08:00
|
|
|
for (i = 0; i < NR_PERSISTENT_LOG; i++) {
|
2019-12-09 18:44:44 +08:00
|
|
|
ret = fix_curseg_write_pointer(sbi, i);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
2019-12-09 18:44:45 +08:00
|
|
|
|
|
|
|
struct check_zone_write_pointer_args {
|
|
|
|
struct f2fs_sb_info *sbi;
|
|
|
|
struct f2fs_dev_info *fdev;
|
|
|
|
};
|
|
|
|
|
|
|
|
static int check_zone_write_pointer_cb(struct blk_zone *zone, unsigned int idx,
|
2021-04-06 09:47:35 +08:00
|
|
|
void *data)
|
|
|
|
{
|
2019-12-09 18:44:45 +08:00
|
|
|
struct check_zone_write_pointer_args *args;
|
2021-04-06 09:47:35 +08:00
|
|
|
|
2019-12-09 18:44:45 +08:00
|
|
|
args = (struct check_zone_write_pointer_args *)data;
|
|
|
|
|
|
|
|
return check_zone_write_pointer(args->sbi, args->fdev, zone);
|
|
|
|
}
|
|
|
|
|
|
|
|
int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
int i, ret;
|
|
|
|
struct check_zone_write_pointer_args args;
|
|
|
|
|
|
|
|
for (i = 0; i < sbi->s_ndevs; i++) {
|
|
|
|
if (!bdev_is_zoned(FDEV(i).bdev))
|
|
|
|
continue;
|
|
|
|
|
|
|
|
args.sbi = sbi;
|
|
|
|
args.fdev = &FDEV(i);
|
|
|
|
ret = blkdev_report_zones(FDEV(i).bdev, 0, BLK_ALL_ZONES,
|
|
|
|
check_zone_write_pointer_cb, &args);
|
|
|
|
if (ret < 0)
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
f2fs: support zone capacity less than zone size
NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
Zone-capacity indicates the maximum number of sectors that are usable in
a zone beginning from the first sector of the zone. This makes the sectors
sectors after the zone-capacity till zone-size to be unusable.
This patch set tracks zone-size and zone-capacity in zoned devices and
calculate the usable blocks per segment and usable segments per section.
If zone-capacity is less than zone-size mark only those segments which
start before zone-capacity as free segments. All segments at and beyond
zone-capacity are treated as permanently used segments. In cases where
zone-capacity does not align with segment size the last segment will start
before zone-capacity and end beyond the zone-capacity of the zone. For
such spanning segments only sectors within the zone-capacity are used.
During writes and GC manage the usable segments in a section and usable
blocks per segment. Segments which are beyond zone-capacity are never
allocated, and do not need to be garbage collected, only the segments
which are before zone-capacity needs to garbage collected.
For spanning segments based on the number of usable blocks in that
segment, write to blocks only up to zone-capacity.
Zone-capacity is device specific and cannot be configured by the user.
Since NVMe ZNS device zones are sequentially write only, a block device
with conventional zones or any normal block device is needed along with
the ZNS device for the metadata operations of F2fs.
A typical nvme-cli output of a zoned device shows zone start and capacity
and write pointer as below:
SLBA: 0x0 WP: 0x0 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
Here zone size is 64MB, capacity is 49MB, WP is at zone start as the zones
are in EMPTY state. For each zone, only zone start + 49MB is usable area,
any lba/sector after 49MB cannot be read or written to, the drive will fail
any attempts to read/write. So, the second zone starts at 64MB and is
usable till 113MB (64 + 49) and the range between 113 and 128MB is
again unusable. The next zone starts at 128MB, and so on.
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-16 20:56:56 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Return the number of usable blocks in a segment. The number of blocks
|
|
|
|
* returned is always equal to the number of blocks in a segment for
|
|
|
|
* segments fully contained within a sequential zone capacity or a
|
|
|
|
* conventional zone. For segments partially contained in a sequential
|
|
|
|
* zone capacity, the number of usable blocks up to the zone capacity
|
|
|
|
* is returned. 0 is returned in all other cases.
|
|
|
|
*/
|
|
|
|
static inline unsigned int f2fs_usable_zone_blks_in_seg(
|
|
|
|
struct f2fs_sb_info *sbi, unsigned int segno)
|
|
|
|
{
|
|
|
|
block_t seg_start, sec_start_blkaddr, sec_cap_blkaddr;
|
2023-03-22 06:58:04 +08:00
|
|
|
unsigned int secno;
|
f2fs: support zone capacity less than zone size
NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
Zone-capacity indicates the maximum number of sectors that are usable in
a zone beginning from the first sector of the zone. This makes the sectors
sectors after the zone-capacity till zone-size to be unusable.
This patch set tracks zone-size and zone-capacity in zoned devices and
calculate the usable blocks per segment and usable segments per section.
If zone-capacity is less than zone-size mark only those segments which
start before zone-capacity as free segments. All segments at and beyond
zone-capacity are treated as permanently used segments. In cases where
zone-capacity does not align with segment size the last segment will start
before zone-capacity and end beyond the zone-capacity of the zone. For
such spanning segments only sectors within the zone-capacity are used.
During writes and GC manage the usable segments in a section and usable
blocks per segment. Segments which are beyond zone-capacity are never
allocated, and do not need to be garbage collected, only the segments
which are before zone-capacity needs to garbage collected.
For spanning segments based on the number of usable blocks in that
segment, write to blocks only up to zone-capacity.
Zone-capacity is device specific and cannot be configured by the user.
Since NVMe ZNS device zones are sequentially write only, a block device
with conventional zones or any normal block device is needed along with
the ZNS device for the metadata operations of F2fs.
A typical nvme-cli output of a zoned device shows zone start and capacity
and write pointer as below:
SLBA: 0x0 WP: 0x0 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
Here zone size is 64MB, capacity is 49MB, WP is at zone start as the zones
are in EMPTY state. For each zone, only zone start + 49MB is usable area,
any lba/sector after 49MB cannot be read or written to, the drive will fail
any attempts to read/write. So, the second zone starts at 64MB and is
usable till 113MB (64 + 49) and the range between 113 and 128MB is
again unusable. The next zone starts at 128MB, and so on.
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-16 20:56:56 +08:00
|
|
|
|
2022-06-29 01:57:24 +08:00
|
|
|
if (!sbi->unusable_blocks_per_sec)
|
f2fs: support zone capacity less than zone size
NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
Zone-capacity indicates the maximum number of sectors that are usable in
a zone beginning from the first sector of the zone. This makes the sectors
sectors after the zone-capacity till zone-size to be unusable.
This patch set tracks zone-size and zone-capacity in zoned devices and
calculate the usable blocks per segment and usable segments per section.
If zone-capacity is less than zone-size mark only those segments which
start before zone-capacity as free segments. All segments at and beyond
zone-capacity are treated as permanently used segments. In cases where
zone-capacity does not align with segment size the last segment will start
before zone-capacity and end beyond the zone-capacity of the zone. For
such spanning segments only sectors within the zone-capacity are used.
During writes and GC manage the usable segments in a section and usable
blocks per segment. Segments which are beyond zone-capacity are never
allocated, and do not need to be garbage collected, only the segments
which are before zone-capacity needs to garbage collected.
For spanning segments based on the number of usable blocks in that
segment, write to blocks only up to zone-capacity.
Zone-capacity is device specific and cannot be configured by the user.
Since NVMe ZNS device zones are sequentially write only, a block device
with conventional zones or any normal block device is needed along with
the ZNS device for the metadata operations of F2fs.
A typical nvme-cli output of a zoned device shows zone start and capacity
and write pointer as below:
SLBA: 0x0 WP: 0x0 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
Here zone size is 64MB, capacity is 49MB, WP is at zone start as the zones
are in EMPTY state. For each zone, only zone start + 49MB is usable area,
any lba/sector after 49MB cannot be read or written to, the drive will fail
any attempts to read/write. So, the second zone starts at 64MB and is
usable till 113MB (64 + 49) and the range between 113 and 128MB is
again unusable. The next zone starts at 128MB, and so on.
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-16 20:56:56 +08:00
|
|
|
return sbi->blocks_per_seg;
|
|
|
|
|
2023-03-22 06:58:04 +08:00
|
|
|
secno = GET_SEC_FROM_SEG(sbi, segno);
|
|
|
|
seg_start = START_BLOCK(sbi, segno);
|
f2fs: support zone capacity less than zone size
NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
Zone-capacity indicates the maximum number of sectors that are usable in
a zone beginning from the first sector of the zone. This makes the sectors
sectors after the zone-capacity till zone-size to be unusable.
This patch set tracks zone-size and zone-capacity in zoned devices and
calculate the usable blocks per segment and usable segments per section.
If zone-capacity is less than zone-size mark only those segments which
start before zone-capacity as free segments. All segments at and beyond
zone-capacity are treated as permanently used segments. In cases where
zone-capacity does not align with segment size the last segment will start
before zone-capacity and end beyond the zone-capacity of the zone. For
such spanning segments only sectors within the zone-capacity are used.
During writes and GC manage the usable segments in a section and usable
blocks per segment. Segments which are beyond zone-capacity are never
allocated, and do not need to be garbage collected, only the segments
which are before zone-capacity needs to garbage collected.
For spanning segments based on the number of usable blocks in that
segment, write to blocks only up to zone-capacity.
Zone-capacity is device specific and cannot be configured by the user.
Since NVMe ZNS device zones are sequentially write only, a block device
with conventional zones or any normal block device is needed along with
the ZNS device for the metadata operations of F2fs.
A typical nvme-cli output of a zoned device shows zone start and capacity
and write pointer as below:
SLBA: 0x0 WP: 0x0 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
Here zone size is 64MB, capacity is 49MB, WP is at zone start as the zones
are in EMPTY state. For each zone, only zone start + 49MB is usable area,
any lba/sector after 49MB cannot be read or written to, the drive will fail
any attempts to read/write. So, the second zone starts at 64MB and is
usable till 113MB (64 + 49) and the range between 113 and 128MB is
again unusable. The next zone starts at 128MB, and so on.
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-16 20:56:56 +08:00
|
|
|
sec_start_blkaddr = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, secno));
|
2022-06-29 01:57:24 +08:00
|
|
|
sec_cap_blkaddr = sec_start_blkaddr + CAP_BLKS_PER_SEC(sbi);
|
f2fs: support zone capacity less than zone size
NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
Zone-capacity indicates the maximum number of sectors that are usable in
a zone beginning from the first sector of the zone. This makes the sectors
sectors after the zone-capacity till zone-size to be unusable.
This patch set tracks zone-size and zone-capacity in zoned devices and
calculate the usable blocks per segment and usable segments per section.
If zone-capacity is less than zone-size mark only those segments which
start before zone-capacity as free segments. All segments at and beyond
zone-capacity are treated as permanently used segments. In cases where
zone-capacity does not align with segment size the last segment will start
before zone-capacity and end beyond the zone-capacity of the zone. For
such spanning segments only sectors within the zone-capacity are used.
During writes and GC manage the usable segments in a section and usable
blocks per segment. Segments which are beyond zone-capacity are never
allocated, and do not need to be garbage collected, only the segments
which are before zone-capacity needs to garbage collected.
For spanning segments based on the number of usable blocks in that
segment, write to blocks only up to zone-capacity.
Zone-capacity is device specific and cannot be configured by the user.
Since NVMe ZNS device zones are sequentially write only, a block device
with conventional zones or any normal block device is needed along with
the ZNS device for the metadata operations of F2fs.
A typical nvme-cli output of a zoned device shows zone start and capacity
and write pointer as below:
SLBA: 0x0 WP: 0x0 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
Here zone size is 64MB, capacity is 49MB, WP is at zone start as the zones
are in EMPTY state. For each zone, only zone start + 49MB is usable area,
any lba/sector after 49MB cannot be read or written to, the drive will fail
any attempts to read/write. So, the second zone starts at 64MB and is
usable till 113MB (64 + 49) and the range between 113 and 128MB is
again unusable. The next zone starts at 128MB, and so on.
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-16 20:56:56 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* If segment starts before zone capacity and spans beyond
|
|
|
|
* zone capacity, then usable blocks are from seg start to
|
|
|
|
* zone capacity. If the segment starts after the zone capacity,
|
|
|
|
* then there are no usable blocks.
|
|
|
|
*/
|
|
|
|
if (seg_start >= sec_cap_blkaddr)
|
|
|
|
return 0;
|
|
|
|
if (seg_start + sbi->blocks_per_seg > sec_cap_blkaddr)
|
|
|
|
return sec_cap_blkaddr - seg_start;
|
|
|
|
|
|
|
|
return sbi->blocks_per_seg;
|
|
|
|
}
|
2019-12-09 18:44:44 +08:00
|
|
|
#else
|
|
|
|
int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
return 0;
|
|
|
|
}
|
2019-12-09 18:44:45 +08:00
|
|
|
|
|
|
|
int f2fs_check_write_pointer(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
return 0;
|
|
|
|
}
|
f2fs: support zone capacity less than zone size
NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
Zone-capacity indicates the maximum number of sectors that are usable in
a zone beginning from the first sector of the zone. This makes the sectors
sectors after the zone-capacity till zone-size to be unusable.
This patch set tracks zone-size and zone-capacity in zoned devices and
calculate the usable blocks per segment and usable segments per section.
If zone-capacity is less than zone-size mark only those segments which
start before zone-capacity as free segments. All segments at and beyond
zone-capacity are treated as permanently used segments. In cases where
zone-capacity does not align with segment size the last segment will start
before zone-capacity and end beyond the zone-capacity of the zone. For
such spanning segments only sectors within the zone-capacity are used.
During writes and GC manage the usable segments in a section and usable
blocks per segment. Segments which are beyond zone-capacity are never
allocated, and do not need to be garbage collected, only the segments
which are before zone-capacity needs to garbage collected.
For spanning segments based on the number of usable blocks in that
segment, write to blocks only up to zone-capacity.
Zone-capacity is device specific and cannot be configured by the user.
Since NVMe ZNS device zones are sequentially write only, a block device
with conventional zones or any normal block device is needed along with
the ZNS device for the metadata operations of F2fs.
A typical nvme-cli output of a zoned device shows zone start and capacity
and write pointer as below:
SLBA: 0x0 WP: 0x0 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
Here zone size is 64MB, capacity is 49MB, WP is at zone start as the zones
are in EMPTY state. For each zone, only zone start + 49MB is usable area,
any lba/sector after 49MB cannot be read or written to, the drive will fail
any attempts to read/write. So, the second zone starts at 64MB and is
usable till 113MB (64 + 49) and the range between 113 and 128MB is
again unusable. The next zone starts at 128MB, and so on.
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-16 20:56:56 +08:00
|
|
|
|
|
|
|
static inline unsigned int f2fs_usable_zone_blks_in_seg(struct f2fs_sb_info *sbi,
|
|
|
|
unsigned int segno)
|
|
|
|
{
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2019-12-09 18:44:44 +08:00
|
|
|
#endif
|
f2fs: support zone capacity less than zone size
NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
Zone-capacity indicates the maximum number of sectors that are usable in
a zone beginning from the first sector of the zone. This makes the sectors
sectors after the zone-capacity till zone-size to be unusable.
This patch set tracks zone-size and zone-capacity in zoned devices and
calculate the usable blocks per segment and usable segments per section.
If zone-capacity is less than zone-size mark only those segments which
start before zone-capacity as free segments. All segments at and beyond
zone-capacity are treated as permanently used segments. In cases where
zone-capacity does not align with segment size the last segment will start
before zone-capacity and end beyond the zone-capacity of the zone. For
such spanning segments only sectors within the zone-capacity are used.
During writes and GC manage the usable segments in a section and usable
blocks per segment. Segments which are beyond zone-capacity are never
allocated, and do not need to be garbage collected, only the segments
which are before zone-capacity needs to garbage collected.
For spanning segments based on the number of usable blocks in that
segment, write to blocks only up to zone-capacity.
Zone-capacity is device specific and cannot be configured by the user.
Since NVMe ZNS device zones are sequentially write only, a block device
with conventional zones or any normal block device is needed along with
the ZNS device for the metadata operations of F2fs.
A typical nvme-cli output of a zoned device shows zone start and capacity
and write pointer as below:
SLBA: 0x0 WP: 0x0 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
Here zone size is 64MB, capacity is 49MB, WP is at zone start as the zones
are in EMPTY state. For each zone, only zone start + 49MB is usable area,
any lba/sector after 49MB cannot be read or written to, the drive will fail
any attempts to read/write. So, the second zone starts at 64MB and is
usable till 113MB (64 + 49) and the range between 113 and 128MB is
again unusable. The next zone starts at 128MB, and so on.
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-16 20:56:56 +08:00
|
|
|
unsigned int f2fs_usable_blks_in_seg(struct f2fs_sb_info *sbi,
|
|
|
|
unsigned int segno)
|
|
|
|
{
|
|
|
|
if (f2fs_sb_has_blkzoned(sbi))
|
|
|
|
return f2fs_usable_zone_blks_in_seg(sbi, segno);
|
|
|
|
|
|
|
|
return sbi->blocks_per_seg;
|
|
|
|
}
|
|
|
|
|
|
|
|
unsigned int f2fs_usable_segs_in_sec(struct f2fs_sb_info *sbi,
|
|
|
|
unsigned int segno)
|
|
|
|
{
|
|
|
|
if (f2fs_sb_has_blkzoned(sbi))
|
2023-03-22 06:58:04 +08:00
|
|
|
return CAP_SEGS_PER_SEC(sbi);
|
f2fs: support zone capacity less than zone size
NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
Zone-capacity indicates the maximum number of sectors that are usable in
a zone beginning from the first sector of the zone. This makes the sectors
sectors after the zone-capacity till zone-size to be unusable.
This patch set tracks zone-size and zone-capacity in zoned devices and
calculate the usable blocks per segment and usable segments per section.
If zone-capacity is less than zone-size mark only those segments which
start before zone-capacity as free segments. All segments at and beyond
zone-capacity are treated as permanently used segments. In cases where
zone-capacity does not align with segment size the last segment will start
before zone-capacity and end beyond the zone-capacity of the zone. For
such spanning segments only sectors within the zone-capacity are used.
During writes and GC manage the usable segments in a section and usable
blocks per segment. Segments which are beyond zone-capacity are never
allocated, and do not need to be garbage collected, only the segments
which are before zone-capacity needs to garbage collected.
For spanning segments based on the number of usable blocks in that
segment, write to blocks only up to zone-capacity.
Zone-capacity is device specific and cannot be configured by the user.
Since NVMe ZNS device zones are sequentially write only, a block device
with conventional zones or any normal block device is needed along with
the ZNS device for the metadata operations of F2fs.
A typical nvme-cli output of a zoned device shows zone start and capacity
and write pointer as below:
SLBA: 0x0 WP: 0x0 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
Here zone size is 64MB, capacity is 49MB, WP is at zone start as the zones
are in EMPTY state. For each zone, only zone start + 49MB is usable area,
any lba/sector after 49MB cannot be read or written to, the drive will fail
any attempts to read/write. So, the second zone starts at 64MB and is
usable till 113MB (64 + 49) and the range between 113 and 128MB is
again unusable. The next zone starts at 128MB, and so on.
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-16 20:56:56 +08:00
|
|
|
|
|
|
|
return sbi->segs_per_sec;
|
|
|
|
}
|
2019-12-09 18:44:44 +08:00
|
|
|
|
2012-11-29 12:28:09 +08:00
|
|
|
/*
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
* Update min, max modified time for cost-benefit GC algorithm
|
|
|
|
*/
|
|
|
|
static void init_min_max_mtime(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct sit_info *sit_i = SIT_I(sbi);
|
|
|
|
unsigned int segno;
|
|
|
|
|
2017-10-30 17:49:53 +08:00
|
|
|
down_write(&sit_i->sentry_lock);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
2018-05-15 18:59:55 +08:00
|
|
|
sit_i->min_mtime = ULLONG_MAX;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
2014-09-24 02:23:01 +08:00
|
|
|
for (segno = 0; segno < MAIN_SEGS(sbi); segno += sbi->segs_per_sec) {
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
unsigned int i;
|
|
|
|
unsigned long long mtime = 0;
|
|
|
|
|
|
|
|
for (i = 0; i < sbi->segs_per_sec; i++)
|
|
|
|
mtime += get_seg_entry(sbi, segno + i)->mtime;
|
|
|
|
|
|
|
|
mtime = div_u64(mtime, sbi->segs_per_sec);
|
|
|
|
|
|
|
|
if (sit_i->min_mtime > mtime)
|
|
|
|
sit_i->min_mtime = mtime;
|
|
|
|
}
|
2018-06-04 23:20:17 +08:00
|
|
|
sit_i->max_mtime = get_mtime(sbi, false);
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
sit_i->dirty_max_mtime = 0;
|
2017-10-30 17:49:53 +08:00
|
|
|
up_write(&sit_i->sentry_lock);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
int f2fs_build_segment_manager(struct f2fs_sb_info *sbi)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct f2fs_super_block *raw_super = F2FS_RAW_SUPER(sbi);
|
|
|
|
struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
|
2012-12-01 09:56:13 +08:00
|
|
|
struct f2fs_sm_info *sm_info;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
int err;
|
|
|
|
|
2017-11-30 19:28:17 +08:00
|
|
|
sm_info = f2fs_kzalloc(sbi, sizeof(struct f2fs_sm_info), GFP_KERNEL);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
if (!sm_info)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
/* init sm info */
|
|
|
|
sbi->sm_info = sm_info;
|
|
|
|
sm_info->seg0_blkaddr = le32_to_cpu(raw_super->segment0_blkaddr);
|
|
|
|
sm_info->main_blkaddr = le32_to_cpu(raw_super->main_blkaddr);
|
|
|
|
sm_info->segment_count = le32_to_cpu(raw_super->segment_count);
|
|
|
|
sm_info->reserved_segments = le32_to_cpu(ckpt->rsvd_segment_count);
|
|
|
|
sm_info->ovp_segments = le32_to_cpu(ckpt->overprov_segment_count);
|
|
|
|
sm_info->main_segments = le32_to_cpu(raw_super->segment_count_main);
|
|
|
|
sm_info->ssa_blkaddr = le32_to_cpu(raw_super->ssa_blkaddr);
|
2014-03-19 13:17:21 +08:00
|
|
|
sm_info->rec_prefree_segments = sm_info->main_segments *
|
|
|
|
DEF_RECLAIM_PREFREE_SEGMENTS / 100;
|
2016-07-14 09:23:35 +08:00
|
|
|
if (sm_info->rec_prefree_segments > DEF_MAX_RECLAIM_PREFREE_SEGMENTS)
|
|
|
|
sm_info->rec_prefree_segments = DEF_MAX_RECLAIM_PREFREE_SEGMENTS;
|
|
|
|
|
2020-02-14 17:44:12 +08:00
|
|
|
if (!f2fs_lfs_mode(sbi))
|
2022-11-19 03:18:39 +08:00
|
|
|
sm_info->ipu_policy = BIT(F2FS_IPU_FSYNC);
|
2013-11-07 12:13:42 +08:00
|
|
|
sm_info->min_ipu_util = DEF_MIN_IPU_UTIL;
|
2014-09-11 07:53:02 +08:00
|
|
|
sm_info->min_fsync_blocks = DEF_MIN_FSYNC_BLOCKS;
|
2021-07-31 11:26:46 +08:00
|
|
|
sm_info->min_seq_blocks = sbi->blocks_per_seg;
|
2017-03-25 08:05:13 +08:00
|
|
|
sm_info->min_hot_blocks = DEF_MIN_HOT_BLOCKS;
|
2017-10-28 16:52:33 +08:00
|
|
|
sm_info->min_ssr_sections = reserved_sections(sbi);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
INIT_LIST_HEAD(&sm_info->sit_entry_set);
|
|
|
|
|
2022-01-08 04:48:44 +08:00
|
|
|
init_f2fs_rwsem(&sm_info->curseg_lock);
|
f2fs: fix summary info corruption
Sometimes, after running generic/270 of fstest, fsck reports summary
info and actual position of block address in direct node becoming
inconsistent.
The root cause is race in between __f2fs_replace_block and change_curseg
as below:
Thread A Thread B
- __clone_blkaddrs
- f2fs_replace_block
- __f2fs_replace_block
- segnoA = GET_SEGNO(sbi, blkaddrA);
- type = se->type:=CURSEG_HOT_DATA
- if (!IS_CURSEG(sbi, segnoA))
type = CURSEG_WARM_DATA
- allocate_data_block
- allocate_segment
- get_ssr_segment
- change_curseg(segnoA, CURSEG_HOT_DATA)
- change_curseg(segnoA, CURSEG_WARM_DATA)
- reset_curseg
- __set_sit_entry_type
- change se->type from CURSEG_HOT_DATA to CURSEG_WARM_DATA
So finally, hot curseg locates in segnoA, but type of segnoA becomes
CURSEG_WARM_DATA.
Then if we invoke __f2fs_replace_block(blkaddrB, blkaddrA, true, false),
as blkaddrA locates in segnoA, so we will move warm type curseg to segnoA,
then change its summary cache and writeback it to summary block.
But segnoA is used by hot type curseg too, once it moves or persist, it
will cover summary block content with inner old summary cache, result in
inconsistent status.
This patch tries to fix this issue by introduce global curseg lock to avoid
race in between __f2fs_replace_block and change_curseg.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-11-02 20:41:03 +08:00
|
|
|
|
2022-12-30 23:43:32 +08:00
|
|
|
err = f2fs_create_flush_cmd_control(sbi);
|
|
|
|
if (err)
|
|
|
|
return err;
|
2014-04-02 14:34:36 +08:00
|
|
|
|
2017-01-12 06:40:24 +08:00
|
|
|
err = create_discard_cmd_control(sbi);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
err = build_sit_info(sbi);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
err = build_free_segmap(sbi);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
err = build_curseg(sbi);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
|
|
|
|
/* reinit free segmap based on SIT */
|
2017-12-20 11:16:34 +08:00
|
|
|
err = build_sit_entries(sbi);
|
|
|
|
if (err)
|
|
|
|
return err;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
init_free_segmap(sbi);
|
|
|
|
err = build_dirty_segmap(sbi);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
|
2019-05-25 23:07:25 +08:00
|
|
|
err = sanity_check_curseg(sbi);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
init_min_max_mtime(sbi);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void discard_dirty_segmap(struct f2fs_sb_info *sbi,
|
|
|
|
enum dirty_type dirty_type)
|
|
|
|
{
|
|
|
|
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
|
|
|
|
|
|
|
|
mutex_lock(&dirty_i->seglist_lock);
|
2015-09-23 04:50:47 +08:00
|
|
|
kvfree(dirty_i->dirty_segmap[dirty_type]);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
dirty_i->nr_dirty[dirty_type] = 0;
|
|
|
|
mutex_unlock(&dirty_i->seglist_lock);
|
|
|
|
}
|
|
|
|
|
2013-03-31 12:26:03 +08:00
|
|
|
static void destroy_victim_secmap(struct f2fs_sb_info *sbi)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
|
2021-04-06 09:47:35 +08:00
|
|
|
|
2022-05-06 18:30:31 +08:00
|
|
|
kvfree(dirty_i->pinned_secmap);
|
2015-09-23 04:50:47 +08:00
|
|
|
kvfree(dirty_i->victim_secmap);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void destroy_dirty_segmap(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
|
|
|
|
int i;
|
|
|
|
|
|
|
|
if (!dirty_i)
|
|
|
|
return;
|
|
|
|
|
|
|
|
/* discard pre-free/dirty segments list */
|
|
|
|
for (i = 0; i < NR_DIRTY_TYPE; i++)
|
|
|
|
discard_dirty_segmap(sbi, i);
|
|
|
|
|
2020-06-18 12:37:10 +08:00
|
|
|
if (__is_large_section(sbi)) {
|
|
|
|
mutex_lock(&dirty_i->seglist_lock);
|
|
|
|
kvfree(dirty_i->dirty_secmap);
|
|
|
|
mutex_unlock(&dirty_i->seglist_lock);
|
|
|
|
}
|
|
|
|
|
2013-03-31 12:26:03 +08:00
|
|
|
destroy_victim_secmap(sbi);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
SM_I(sbi)->dirty_info = NULL;
|
2020-09-14 16:47:00 +08:00
|
|
|
kfree(dirty_i);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void destroy_curseg(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct curseg_info *array = SM_I(sbi)->curseg_array;
|
|
|
|
int i;
|
|
|
|
|
|
|
|
if (!array)
|
|
|
|
return;
|
|
|
|
SM_I(sbi)->curseg_array = NULL;
|
f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- datas of current summay block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.
So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
for (i = 0; i < NR_CURSEG_TYPE; i++) {
|
2020-09-14 16:47:00 +08:00
|
|
|
kfree(array[i].sum_blk);
|
|
|
|
kfree(array[i].journal);
|
f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- datas of current summay block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.
So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
}
|
2020-09-14 16:47:00 +08:00
|
|
|
kfree(array);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void destroy_free_segmap(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct free_segmap_info *free_i = SM_I(sbi)->free_info;
|
2021-04-06 09:47:35 +08:00
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
if (!free_i)
|
|
|
|
return;
|
|
|
|
SM_I(sbi)->free_info = NULL;
|
2015-09-23 04:50:47 +08:00
|
|
|
kvfree(free_i->free_segmap);
|
|
|
|
kvfree(free_i->free_secmap);
|
2020-09-14 16:47:00 +08:00
|
|
|
kfree(free_i);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void destroy_sit_info(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct sit_info *sit_i = SIT_I(sbi);
|
|
|
|
|
|
|
|
if (!sit_i)
|
|
|
|
return;
|
|
|
|
|
2019-07-26 15:41:20 +08:00
|
|
|
if (sit_i->sentries)
|
|
|
|
kvfree(sit_i->bitmap);
|
2020-09-14 16:47:00 +08:00
|
|
|
kfree(sit_i->tmp_map);
|
2015-02-11 08:44:29 +08:00
|
|
|
|
2015-09-23 04:50:47 +08:00
|
|
|
kvfree(sit_i->sentries);
|
|
|
|
kvfree(sit_i->sec_entries);
|
|
|
|
kvfree(sit_i->dirty_sentries_bitmap);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
SM_I(sbi)->sit_info = NULL;
|
2018-12-14 10:38:33 +08:00
|
|
|
kvfree(sit_i->sit_bitmap);
|
2017-01-07 18:52:34 +08:00
|
|
|
#ifdef CONFIG_F2FS_CHECK_FS
|
2018-12-14 10:38:33 +08:00
|
|
|
kvfree(sit_i->sit_bitmap_mir);
|
2019-08-07 21:40:32 +08:00
|
|
|
kvfree(sit_i->invalid_segmap);
|
2017-01-07 18:52:34 +08:00
|
|
|
#endif
|
2020-09-14 16:47:00 +08:00
|
|
|
kfree(sit_i);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct f2fs_sm_info *sm_info = SM_I(sbi);
|
2014-04-27 14:21:21 +08:00
|
|
|
|
2013-11-06 09:12:04 +08:00
|
|
|
if (!sm_info)
|
|
|
|
return;
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
f2fs_destroy_flush_cmd_control(sbi, true);
|
2017-03-27 18:14:04 +08:00
|
|
|
destroy_discard_cmd_control(sbi);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
destroy_dirty_segmap(sbi);
|
|
|
|
destroy_curseg(sbi);
|
|
|
|
destroy_free_segmap(sbi);
|
|
|
|
destroy_sit_info(sbi);
|
|
|
|
sbi->sm_info = NULL;
|
2020-09-14 16:47:00 +08:00
|
|
|
kfree(sm_info);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
2013-11-15 12:55:58 +08:00
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
int __init f2fs_create_segment_manager_caches(void)
|
2013-11-15 12:55:58 +08:00
|
|
|
{
|
2020-02-17 17:46:20 +08:00
|
|
|
discard_entry_slab = f2fs_kmem_cache_create("f2fs_discard_entry",
|
2014-03-07 18:43:28 +08:00
|
|
|
sizeof(struct discard_entry));
|
2013-11-15 12:55:58 +08:00
|
|
|
if (!discard_entry_slab)
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
goto fail;
|
|
|
|
|
2020-02-17 17:46:20 +08:00
|
|
|
discard_cmd_slab = f2fs_kmem_cache_create("f2fs_discard_cmd",
|
2017-01-10 06:13:03 +08:00
|
|
|
sizeof(struct discard_cmd));
|
|
|
|
if (!discard_cmd_slab)
|
2016-09-05 12:28:26 +08:00
|
|
|
goto destroy_discard_entry;
|
2016-08-29 23:58:34 +08:00
|
|
|
|
2020-02-17 17:46:20 +08:00
|
|
|
sit_entry_set_slab = f2fs_kmem_cache_create("f2fs_sit_entry_set",
|
2014-11-21 14:42:07 +08:00
|
|
|
sizeof(struct sit_entry_set));
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
if (!sit_entry_set_slab)
|
2017-01-10 06:13:03 +08:00
|
|
|
goto destroy_discard_cmd;
|
2014-10-07 08:39:50 +08:00
|
|
|
|
2022-04-29 02:18:09 +08:00
|
|
|
revoke_entry_slab = f2fs_kmem_cache_create("f2fs_revoke_entry",
|
|
|
|
sizeof(struct revoke_entry));
|
|
|
|
if (!revoke_entry_slab)
|
2014-10-07 08:39:50 +08:00
|
|
|
goto destroy_sit_entry_set;
|
2013-11-15 12:55:58 +08:00
|
|
|
return 0;
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
|
2014-10-07 08:39:50 +08:00
|
|
|
destroy_sit_entry_set:
|
|
|
|
kmem_cache_destroy(sit_entry_set_slab);
|
2017-01-10 06:13:03 +08:00
|
|
|
destroy_discard_cmd:
|
|
|
|
kmem_cache_destroy(discard_cmd_slab);
|
2016-09-05 12:28:26 +08:00
|
|
|
destroy_discard_entry:
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
kmem_cache_destroy(discard_entry_slab);
|
|
|
|
fail:
|
|
|
|
return -ENOMEM;
|
2013-11-15 12:55:58 +08:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_destroy_segment_manager_caches(void)
|
2013-11-15 12:55:58 +08:00
|
|
|
{
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
kmem_cache_destroy(sit_entry_set_slab);
|
2017-01-10 06:13:03 +08:00
|
|
|
kmem_cache_destroy(discard_cmd_slab);
|
2013-11-15 12:55:58 +08:00
|
|
|
kmem_cache_destroy(discard_entry_slab);
|
2022-04-29 02:18:09 +08:00
|
|
|
kmem_cache_destroy(revoke_entry_slab);
|
2013-11-15 12:55:58 +08:00
|
|
|
}
|