2012-11-29 12:28:09 +08:00
|
|
|
/*
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, and a bitmap identifying which blocks in the
segment are valid or invalid.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
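The SIT entry cache described in the commit message above (a simple array indexed by segment number, each entry holding a valid-block count and a validity bitmap) can be pictured roughly as follows. This is a user-space sketch, not the kernel's data structure: the names `sit_entry_sketch`, `sit_cache_sketch`, `sit_cache_init` and `sit_mark_valid` are hypothetical, and 512 blocks per segment is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define BLOCKS_PER_SEG	512			/* assumed segment size in blocks */
#define VALID_MAP_BYTES	(BLOCKS_PER_SEG / 8)

/* Hypothetical in-memory SIT entry: valid-block count plus validity bitmap. */
struct sit_entry_sketch {
	uint16_t valid_blocks;
	uint8_t valid_map[VALID_MAP_BYTES];
};

/* The cache is a plain array; the segment number is the index. */
struct sit_cache_sketch {
	unsigned int nr_segs;
	struct sit_entry_sketch *entries;
};

/* Allocate a zeroed cache for nr_segs segments; returns 0 on success. */
static int sit_cache_init(struct sit_cache_sketch *c, unsigned int nr_segs)
{
	c->entries = calloc(nr_segs, sizeof(*c->entries));
	c->nr_segs = c->entries ? nr_segs : 0;
	return c->entries ? 0 : -1;
}

/* Mark one block valid (MSB-first within each byte) and bump the counter. */
static void sit_mark_valid(struct sit_cache_sketch *c, unsigned int segno,
			   unsigned int blkoff)
{
	struct sit_entry_sketch *se;
	uint8_t mask = (uint8_t)(0x80 >> (blkoff & 7));

	assert(segno < c->nr_segs && blkoff < BLOCKS_PER_SEG);
	se = &c->entries[segno];
	if (!(se->valid_map[blkoff >> 3] & mask)) {
		se->valid_map[blkoff >> 3] |= mask;
		se->valid_blocks++;
	}
}
```

Lookup by segment number is a single array access, which is presumably why a flat array is used for the cache rather than a tree or hash.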
2012-11-02 16:09:16 +08:00
|
|
|
 * fs/f2fs/segment.c
 *
 * Copyright (c) 2012 Samsung Electronics Co., Ltd.
 * http://www.samsung.com/
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 */
#include <linux/fs.h>
#include <linux/f2fs_fs.h>
#include <linux/bio.h>
#include <linux/blkdev.h>
|
2012-12-20 05:19:30 +08:00
|
|
|
#include <linux/prefetch.h>
|
2014-04-02 14:34:36 +08:00
|
|
|
#include <linux/kthread.h>
|
2013-11-22 09:09:59 +08:00
|
|
|
#include <linux/swap.h>
|
2015-10-06 05:49:57 +08:00
|
|
|
#include <linux/timer.h>
|
2017-05-18 01:36:58 +08:00
|
|
|
#include <linux/freezer.h>
|
2017-09-10 03:03:23 +08:00
|
|
|
#include <linux/sched/signal.h>
|
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
#include "f2fs.h"
|
|
|
|
#include "segment.h"
|
|
|
|
#include "node.h"
|
2017-08-16 12:27:19 +08:00
|
|
|
#include "gc.h"
|
2014-12-18 11:58:58 +08:00
|
|
|
#include "trace.h"
|
2013-04-23 16:51:43 +08:00
|
|
|
#include <trace/events/f2fs.h>
|
2012-11-02 16:09:16 +08:00
|
|
|
|
2013-11-15 09:42:51 +08:00
|
|
|
#define __reverse_ffz(x) __reverse_ffs(~(x))
|
|
|
|
|
2013-11-15 12:55:58 +08:00
|
|
|
static struct kmem_cache *discard_entry_slab;
|
2017-01-10 06:13:03 +08:00
|
|
|
static struct kmem_cache *discard_cmd_slab;
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we described the issue as follows:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal until
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, we first update the SIT journal with as many dirty entries as
possible. If the journal runs out of space, we remove all entries from it and
walk the whole dirty-entry bitmap of the SIT, grouping dirty SIT entries that
live in the same SIT block into a SIT entry set. All entry sets are linked
into the list sit_entry_set in sm_info, sorted in ascending order of entry
count. We then flush the sets with the fewest entries into the journal, as
many as fit, and write the remaining dense sets to disk with their entries
merged.
This uses the SIT journal area more effectively and reduces SIT updates,
which improves performance and extends the lifetime of the flash device.
In my testing environment, this patch noticeably reduces SIT block updates.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
         sit page num   cp count   sit pages/cp
based    2006.50        1349.75    1.486
patched  1566.25        1463.25    1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
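The flush policy in the commit message above (sort the per-SIT-block entry sets by ascending entry count, put the sparsest sets into the journal while space remains, write the denser ones to their SIT blocks) can be sketched in user space as below. This is a simplified model under assumptions, not the kernel code: `set_sketch`, `cmp_sets` and `plan_sit_flush` are hypothetical names, and journal space is modelled as a plain count of free entry slots.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for one SIT entry set (one SIT block's dirty entries). */
struct set_sketch {
	unsigned int start_segno;	/* first segment covered by the SIT block */
	unsigned int entry_cnt;		/* dirty entries that fall into this block */
	int to_journal;			/* 1: kept in journal, 0: SIT block written */
};

/* Order sets by ascending number of dirty entries. */
static int cmp_sets(const void *a, const void *b)
{
	const struct set_sketch *x = a, *y = b;

	return (x->entry_cnt > y->entry_cnt) - (x->entry_cnt < y->entry_cnt);
}

/*
 * Decide where each set goes and return how many SIT blocks must be written.
 * journal_space is the number of free journal slots (an assumed model).
 */
static unsigned int plan_sit_flush(struct set_sketch *sets, size_t nr,
				   unsigned int journal_space)
{
	unsigned int block_writes = 0;
	size_t i;

	assert(sets != NULL || nr == 0);
	qsort(sets, nr, sizeof(*sets), cmp_sets);
	for (i = 0; i < nr; i++) {
		if (sets[i].entry_cnt <= journal_space) {
			sets[i].to_journal = 1;
			journal_space -= sets[i].entry_cnt;
		} else {
			sets[i].to_journal = 0;
			block_writes++;
		}
	}
	return block_writes;
}
```

Because the sets are visited in ascending order, once one set no longer fits in the journal no later set can fit either, so each SIT block that does get written carries as many merged entries as possible.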
2014-09-04 18:13:01 +08:00
|
|
|
static struct kmem_cache *sit_entry_set_slab;
|
2014-10-07 08:39:50 +08:00
|
|
|
static struct kmem_cache *inmem_entry_slab;
|
2013-11-15 12:55:58 +08:00
|
|
|
|
2015-10-21 06:17:19 +08:00
|
|
|
static unsigned long __reverse_ulong(unsigned char *str)
{
	unsigned long tmp = 0;
	int shift = 24, idx = 0;

#if BITS_PER_LONG == 64
	shift = 56;
#endif
	while (shift >= 0) {
		tmp |= (unsigned long)str[idx++] << shift;
		shift -= BITS_PER_BYTE;
	}
	return tmp;
}
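For readers who want to try the byte-order trick outside the kernel, here is a user-space sketch of the same idea: bitmap byte 0 becomes the most significant byte of the word, regardless of host endianness. The name `reverse_ulong_sketch` is hypothetical, and the loop uses `sizeof(unsigned long)` instead of the `BITS_PER_LONG` preprocessor test.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assemble a word so that byte 0 of the bitmap ends up most significant. */
static unsigned long reverse_ulong_sketch(const unsigned char *str)
{
	unsigned long tmp = 0;
	size_t i;

	assert(str != NULL);
	for (i = 0; i < sizeof(unsigned long); i++)
		tmp = (tmp << CHAR_BIT) | str[i];
	return tmp;
}
```

With this layout, bit 0 of an MSB-first bitmap lands in the top bit of the word, so later steps can scan from the most significant end.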
|
|
|
|
|
2013-11-15 09:42:51 +08:00
|
|
|
/*
|
|
|
|
* __reverse_ffs is copied from include/asm-generic/bitops/__ffs.h since
|
|
|
|
* MSB and LSB are reversed in a byte by f2fs_set_bit.
|
|
|
|
*/
|
|
|
|
static inline unsigned long __reverse_ffs(unsigned long word)
|
|
|
|
{
|
|
|
|
int num = 0;
|
|
|
|
|
|
|
|
#if BITS_PER_LONG == 64
|
2015-10-21 06:17:19 +08:00
|
|
|
if ((word & 0xffffffff00000000UL) == 0)
|
2013-11-15 09:42:51 +08:00
|
|
|
num += 32;
|
2015-10-21 06:17:19 +08:00
|
|
|
else
|
2013-11-15 09:42:51 +08:00
|
|
|
word >>= 32;
|
|
|
|
#endif
|
2015-10-21 06:17:19 +08:00
|
|
|
if ((word & 0xffff0000) == 0)
|
2013-11-15 09:42:51 +08:00
|
|
|
num += 16;
|
2015-10-21 06:17:19 +08:00
|
|
|
else
|
2013-11-15 09:42:51 +08:00
|
|
|
word >>= 16;
|
2015-10-21 06:17:19 +08:00
|
|
|
|
|
|
|
if ((word & 0xff00) == 0)
|
2013-11-15 09:42:51 +08:00
|
|
|
num += 8;
|
2015-10-21 06:17:19 +08:00
|
|
|
else
|
2013-11-15 09:42:51 +08:00
|
|
|
word >>= 8;
|
2015-10-21 06:17:19 +08:00
|
|
|
|
2013-11-15 09:42:51 +08:00
|
|
|
if ((word & 0xf0) == 0)
|
|
|
|
num += 4;
|
|
|
|
else
|
|
|
|
word >>= 4;
|
2015-10-21 06:17:19 +08:00
|
|
|
|
2013-11-15 09:42:51 +08:00
|
|
|
if ((word & 0xc) == 0)
|
|
|
|
num += 2;
|
|
|
|
else
|
|
|
|
word >>= 2;
|
2015-10-21 06:17:19 +08:00
|
|
|
|
2013-11-15 09:42:51 +08:00
|
|
|
if ((word & 0x2) == 0)
|
|
|
|
num += 1;
|
|
|
|
return num;
|
|
|
|
}
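The binary search in `__reverse_ffs` returns the index of the first set bit counted from the most significant end of the word. A naive user-space equivalent, handy for cross-checking the result on small inputs, might look like this. The name `first_set_from_msb` and the `WORD_BITS` macro are hypothetical.

```c
#include <assert.h>
#include <limits.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Index of the first set bit, counting from the most significant bit. */
static unsigned int first_set_from_msb(unsigned long word)
{
	unsigned int num = 0;

	assert(word != 0);	/* no set bit means there is no valid answer */
	while (!(word & (1UL << (WORD_BITS - 1)))) {
		word <<= 1;
		num++;
	}
	return num;
}
```

The kernel version gets the same answer in a fixed number of mask-and-shift steps instead of a loop.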
|
|
|
|
|
|
|
|
/*
|
2014-08-06 22:22:50 +08:00
|
|
|
* __find_rev_next(_zero)_bit is copied from lib/find_next_bit.c because
|
2013-11-15 09:42:51 +08:00
|
|
|
* f2fs_set_bit makes MSB and LSB reversed in a byte.
|
2015-11-12 08:43:04 +08:00
|
|
|
* @size must be an integral multiple of BITS_PER_LONG.
|
2013-11-15 09:42:51 +08:00
|
|
|
* Example:
|
2015-10-21 06:17:19 +08:00
|
|
|
* MSB <--> LSB
|
|
|
|
* f2fs_set_bit(0, bitmap) => 1000 0000
|
|
|
|
* f2fs_set_bit(7, bitmap) => 0000 0001
|
2013-11-15 09:42:51 +08:00
|
|
|
*/
|
|
|
|
static unsigned long __find_rev_next_bit(const unsigned long *addr,
|
|
|
|
unsigned long size, unsigned long offset)
|
|
|
|
{
|
|
|
|
const unsigned long *p = addr + BIT_WORD(offset);
|
2015-11-12 08:43:04 +08:00
|
|
|
unsigned long result = size;
|
2013-11-15 09:42:51 +08:00
|
|
|
unsigned long tmp;
|
|
|
|
|
|
|
|
if (offset >= size)
|
|
|
|
return size;
|
|
|
|
|
2015-11-12 08:43:04 +08:00
|
|
|
size -= (offset & ~(BITS_PER_LONG - 1));
|
2013-11-15 09:42:51 +08:00
|
|
|
offset %= BITS_PER_LONG;
|
2015-10-21 06:17:19 +08:00
|
|
|
|
2015-11-12 08:43:04 +08:00
|
|
|
while (1) {
|
|
|
|
if (*p == 0)
|
|
|
|
goto pass;
|
2013-11-15 09:42:51 +08:00
|
|
|
|
2015-10-21 06:17:19 +08:00
|
|
|
tmp = __reverse_ulong((unsigned char *)p);
|
2015-11-12 08:43:04 +08:00
|
|
|
|
|
|
|
tmp &= ~0UL >> offset;
|
|
|
|
if (size < BITS_PER_LONG)
|
|
|
|
tmp &= (~0UL << (BITS_PER_LONG - size));
|
2013-11-15 09:42:51 +08:00
|
|
|
if (tmp)
|
2015-11-12 08:43:04 +08:00
|
|
|
goto found;
|
|
|
|
pass:
|
|
|
|
if (size <= BITS_PER_LONG)
|
|
|
|
break;
|
2013-11-15 09:42:51 +08:00
|
|
|
size -= BITS_PER_LONG;
|
2015-11-12 08:43:04 +08:00
|
|
|
offset = 0;
|
2015-10-21 06:17:19 +08:00
|
|
|
p++;
|
2013-11-15 09:42:51 +08:00
|
|
|
}
|
2015-11-12 08:43:04 +08:00
|
|
|
return result;
|
|
|
|
found:
|
|
|
|
return result - size + __reverse_ffs(tmp);
|
2013-11-15 09:42:51 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static unsigned long __find_rev_next_zero_bit(const unsigned long *addr,
|
|
|
|
unsigned long size, unsigned long offset)
|
|
|
|
{
|
|
|
|
const unsigned long *p = addr + BIT_WORD(offset);
|
2015-12-05 08:51:13 +08:00
|
|
|
unsigned long result = size;
|
2013-11-15 09:42:51 +08:00
|
|
|
unsigned long tmp;
|
|
|
|
|
|
|
|
if (offset >= size)
|
|
|
|
return size;
|
|
|
|
|
2015-12-05 08:51:13 +08:00
|
|
|
size -= (offset & ~(BITS_PER_LONG - 1));
|
2013-11-15 09:42:51 +08:00
|
|
|
offset %= BITS_PER_LONG;
|
2015-12-05 08:51:13 +08:00
|
|
|
|
|
|
|
while (1) {
|
|
|
|
if (*p == ~0UL)
|
|
|
|
goto pass;
|
|
|
|
|
2015-10-21 06:17:19 +08:00
|
|
|
tmp = __reverse_ulong((unsigned char *)p);
|
2015-12-05 08:51:13 +08:00
|
|
|
|
|
|
|
if (offset)
|
|
|
|
tmp |= ~0UL << (BITS_PER_LONG - offset);
|
|
|
|
if (size < BITS_PER_LONG)
|
|
|
|
tmp |= ~0UL >> size;
|
2015-10-21 06:17:19 +08:00
|
|
|
if (tmp != ~0UL)
|
2015-12-05 08:51:13 +08:00
|
|
|
goto found;
|
|
|
|
pass:
|
|
|
|
if (size <= BITS_PER_LONG)
|
|
|
|
break;
|
2013-11-15 09:42:51 +08:00
|
|
|
size -= BITS_PER_LONG;
|
2015-12-05 08:51:13 +08:00
|
|
|
offset = 0;
|
2015-10-21 06:17:19 +08:00
|
|
|
p++;
|
2013-11-15 09:42:51 +08:00
|
|
|
}
|
2015-12-05 08:51:13 +08:00
|
|
|
return result;
|
|
|
|
found:
|
|
|
|
return result - size + __reverse_ffz(tmp);
|
2013-11-15 09:42:51 +08:00
|
|
|
}
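The MSB-first convention that both search functions rely on can be demonstrated with a bit-at-a-time user-space sketch. It is deliberately naive (one bit per iteration rather than one word) and is meant only to show the expected results; `set_bit_msb_first` and `find_next_bit_msb_first` are hypothetical names, not kernel helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Set bit nr so that bit 0 is the most significant bit of byte 0. */
static void set_bit_msb_first(unsigned int nr, unsigned char *map)
{
	map[nr >> 3] |= (unsigned char)(0x80 >> (nr & 7));
}

/*
 * Return the index of the first set bit at or after offset,
 * or size when no such bit exists (same contract as the word-wise search).
 */
static unsigned long find_next_bit_msb_first(const unsigned char *map,
					     unsigned long size,
					     unsigned long offset)
{
	unsigned long i;

	assert(map != NULL);
	for (i = offset; i < size; i++)
		if (map[i >> 3] & (0x80 >> (i & 7)))
			return i;
	return size;
}
```

The word-wise functions above are expected to return the same indices, while skipping all-zero (or all-one) words in a single comparison.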
|
|
|
|
|
2017-09-10 02:11:04 +08:00
|
|
|
bool need_SSR(struct f2fs_sb_info *sbi)
{
	int node_secs = get_blocktype_secs(sbi, F2FS_DIRTY_NODES);
	int dent_secs = get_blocktype_secs(sbi, F2FS_DIRTY_DENTS);
	int imeta_secs = get_blocktype_secs(sbi, F2FS_DIRTY_IMETA);

	if (test_opt(sbi, LFS))
		return false;
	if (sbi->gc_thread && sbi->gc_thread->gc_urgent)
		return true;

	return free_sections(sbi) <= (node_secs + 2 * dent_secs + imeta_secs +
|
2017-10-28 16:52:33 +08:00
|
|
|
SM_I(sbi)->min_ssr_sections + reserved_sections(sbi));
|
2017-09-10 02:11:04 +08:00
|
|
|
}
|
|
|
|
|
2014-10-07 08:39:50 +08:00
|
|
|
void register_inmem_page(struct inode *inode, struct page *page)
|
|
|
|
{
|
2017-10-19 10:05:57 +08:00
|
|
|
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
|
2014-10-07 08:39:50 +08:00
|
|
|
struct f2fs_inode_info *fi = F2FS_I(inode);
|
|
|
|
struct inmem_pages *new;
|
2014-12-06 02:39:49 +08:00
|
|
|
|
2014-12-18 11:58:58 +08:00
|
|
|
f2fs_trace_pid(page);
|
2014-12-06 03:58:02 +08:00
|
|
|
|
2015-08-07 18:42:09 +08:00
|
|
|
set_page_private(page, (unsigned long)ATOMIC_WRITTEN_PAGE);
|
|
|
|
SetPagePrivate(page);
|
|
|
|
|
2014-10-07 08:39:50 +08:00
|
|
|
new = f2fs_kmem_cache_alloc(inmem_entry_slab, GFP_NOFS);
|
|
|
|
|
|
|
|
/* add atomic page indices to the list */
|
|
|
|
new->page = page;
|
|
|
|
INIT_LIST_HEAD(&new->list);
|
2015-08-07 18:42:09 +08:00
|
|
|
|
2014-10-07 08:39:50 +08:00
|
|
|
/* increase reference count with clean state */
|
|
|
|
mutex_lock(&fi->inmem_lock);
|
|
|
|
get_page(page);
|
|
|
|
list_add_tail(&new->list, &fi->inmem_pages);
|
2017-10-19 10:05:57 +08:00
|
|
|
spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
|
|
|
|
if (list_empty(&fi->inmem_ilist))
|
|
|
|
list_add_tail(&fi->inmem_ilist, &sbi->inode_list[ATOMIC_FILE]);
|
|
|
|
spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
|
2014-12-06 09:18:15 +08:00
|
|
|
inc_page_count(F2FS_I_SB(inode), F2FS_INMEM_PAGES);
|
2014-10-07 08:39:50 +08:00
|
|
|
mutex_unlock(&fi->inmem_lock);
|
2015-03-18 08:58:08 +08:00
|
|
|
|
|
|
|
trace_f2fs_register_inmem_page(page, INMEM);
|
2014-10-07 08:39:50 +08:00
|
|
|
}
|
|
|
|
|
f2fs: support revoking atomic written pages
f2fs supports atomic writes with the following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file
With this flow we avoid corrupting the file on an abnormal power cut: the
transaction's data is held in referenced pages linked into the inode's
inmem_pages list without being marked dirty, so it is not persisted until we
commit it in step 4.
However, we still have to keep the journal db file in memory by using volatile
writes, because our 'atomic write support' semantics are incomplete. In
step 4 we can fail partway through submitting the transaction's dirty data;
once part of it has reached storage, a checkpoint followed by an abnormal
power cut leaves the db file corrupted for good.
This patch improves the atomic write flow by adding a revoking flow: when an
internal error occurs during commit, we get another chance to revoke the
partially submitted data of the current transaction, which makes the commit
operation behave more like an atomic one.
If we are unlucky and the revoke itself fails, EAGAIN is reported to the user,
suggesting either recovery from the held journal file or a retry of the
current transaction.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
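From user space, the five-step flow listed in the commit message above might be driven roughly as follows. This is a sketch under assumptions: the ioctl numbers are copied from the F2FS definitions of this era (magic 0xf5, commands 1 and 2) and should be checked against the headers of the kernel in use, `atomic_update` is a hypothetical helper name, and on a file that is not on F2FS the first ioctl is expected to fail.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Assumed to match the F2FS ioctl definitions; verify against your headers. */
#define F2FS_IOCTL_MAGIC		0xf5
#define F2FS_IOC_START_ATOMIC_WRITE	_IO(F2FS_IOCTL_MAGIC, 1)
#define F2FS_IOC_COMMIT_ATOMIC_WRITE	_IO(F2FS_IOCTL_MAGIC, 2)

/* Write len bytes from buf to path as one transaction; 0 on success, -1 on error. */
static int atomic_update(const char *path, const void *buf, size_t len)
{
	int ret = -1;
	int fd;

	assert(path != NULL && buf != NULL);
	fd = open(path, O_WRONLY);				/* 1. open db file */
	if (fd < 0)
		return -1;
	if (ioctl(fd, F2FS_IOC_START_ATOMIC_WRITE) < 0)		/* 2. start */
		goto out;
	if (write(fd, buf, len) != (ssize_t)len)		/* 3. write */
		goto out;
	if (ioctl(fd, F2FS_IOC_COMMIT_ATOMIC_WRITE) == 0)	/* 4. commit */
		ret = 0;
out:
	close(fd);						/* 5. close */
	return ret;
}
```

A caller that sees -1 with errno set to EAGAIN after the commit step would, per the message above, either recover from its journal file or retry the transaction.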
2016-02-06 14:40:34 +08:00
|
|
|
static int __revoke_inmem_pages(struct inode *inode,
|
|
|
|
struct list_head *head, bool drop, bool recover)
|
2016-02-06 14:38:29 +08:00
|
|
|
{
|
2016-02-06 14:40:34 +08:00
|
|
|
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
|
2016-02-06 14:38:29 +08:00
|
|
|
struct inmem_pages *cur, *tmp;
|
2016-02-06 14:40:34 +08:00
|
|
|
int err = 0;
|
2016-02-06 14:38:29 +08:00
|
|
|
|
|
|
|
list_for_each_entry_safe(cur, tmp, head, list) {
|
2016-02-06 14:40:34 +08:00
|
|
|
struct page *page = cur->page;
|
|
|
|
|
|
|
|
if (drop)
|
|
|
|
trace_f2fs_commit_inmem_page(page, INMEM_DROP);
|
|
|
|
|
|
|
|
lock_page(page);
|
2016-02-06 14:38:29 +08:00
|
|
|
|
2016-02-06 14:40:34 +08:00
|
|
|
if (recover) {
|
|
|
|
struct dnode_of_data dn;
|
|
|
|
struct node_info ni;
|
|
|
|
|
|
|
|
trace_f2fs_commit_inmem_page(page, INMEM_REVOKE);
|
2017-08-08 19:09:08 +08:00
|
|
|
retry:
|
2016-02-06 14:40:34 +08:00
|
|
|
set_new_dnode(&dn, inode, NULL, NULL, 0);
|
2017-08-08 19:09:08 +08:00
|
|
|
err = get_dnode_of_data(&dn, page->index, LOOKUP_NODE);
|
|
|
|
if (err) {
|
|
|
|
if (err == -ENOMEM) {
|
|
|
|
congestion_wait(BLK_RW_ASYNC, HZ/50);
|
|
|
|
cond_resched();
|
|
|
|
goto retry;
|
|
|
|
}
|
2016-02-06 14:40:34 +08:00
|
|
|
err = -EAGAIN;
|
|
|
|
goto next;
|
|
|
|
}
|
|
|
|
get_node_info(sbi, dn.nid, &ni);
|
2018-01-10 15:49:10 +08:00
|
|
|
if (cur->old_addr == NEW_ADDR) {
|
|
|
|
invalidate_blocks(sbi, dn.data_blkaddr);
|
|
|
|
f2fs_update_data_blkaddr(&dn, NEW_ADDR);
|
|
|
|
} else
|
|
|
|
f2fs_replace_block(sbi, &dn, dn.data_blkaddr,
|
2016-02-06 14:40:34 +08:00
|
|
|
cur->old_addr, ni.version, true, true);
|
|
|
|
f2fs_put_dnode(&dn);
|
|
|
|
}
|
|
|
|
next:
|
2016-04-13 05:11:03 +08:00
|
|
|
/* we don't need to invalidate this in the successful case */
|
|
|
|
if (drop || recover)
|
|
|
|
ClearPageUptodate(page);
|
2016-02-06 14:40:34 +08:00
|
|
|
set_page_private(page, 0);
|
2016-04-29 20:13:36 +08:00
|
|
|
ClearPagePrivate(page);
|
2016-02-06 14:40:34 +08:00
|
|
|
f2fs_put_page(page, 1);
|
2016-02-06 14:38:29 +08:00
|
|
|
|
|
|
|
list_del(&cur->list);
|
|
|
|
kmem_cache_free(inmem_entry_slab, cur);
|
|
|
|
dec_page_count(F2FS_I_SB(inode), F2FS_INMEM_PAGES);
|
|
|
|
}
|
	return err;
}

void drop_inmem_pages_all(struct f2fs_sb_info *sbi)
{
	struct list_head *head = &sbi->inode_list[ATOMIC_FILE];
	struct inode *inode;
	struct f2fs_inode_info *fi;
next:
	spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
	if (list_empty(head)) {
		spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
		return;
	}
	fi = list_first_entry(head, struct f2fs_inode_info, inmem_ilist);
	inode = igrab(&fi->vfs_inode);
	spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);

	if (inode) {
		drop_inmem_pages(inode);
		iput(inode);
	}
	congestion_wait(BLK_RW_ASYNC, HZ/50);
	cond_resched();
	goto next;
}
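
The loop in drop_inmem_pages_all() follows a common pattern: hold the list lock only long enough to pick the first entry, release it, do the heavy work unlocked, and repeat until the list is empty. The following is a minimal userspace sketch of that pattern, not kernel code. The names `tracked`, `registry`, `drop_one` and `drop_all` are hypothetical, a pthread mutex stands in for the spinlock, and the reference pinning done by igrab()/iput() is omitted.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct tracked {			/* stand-in for an inode with in-memory pages */
	struct tracked *next;
	int pending;			/* stand-in for its count of in-memory pages */
};

struct registry {			/* stand-in for sbi->inode_list and inode_lock */
	pthread_mutex_t lock;
	struct tracked *head;
};

/* Heavy work runs unlocked; the entry then unlinks itself under the lock. */
void drop_one(struct registry *r, struct tracked *t)
{
	t->pending = 0;

	pthread_mutex_lock(&r->lock);
	for (struct tracked **pp = &r->head; *pp; pp = &(*pp)->next) {
		if (*pp == t) {
			*pp = t->next;
			t->next = NULL;
			break;
		}
	}
	pthread_mutex_unlock(&r->lock);
}

/* Drain the registry one entry at a time; returns how many were dropped. */
int drop_all(struct registry *r)
{
	int dropped = 0;

	for (;;) {
		struct tracked *t;

		/* Peek at the first entry with the lock held, then let go. */
		pthread_mutex_lock(&r->lock);
		t = r->head;
		pthread_mutex_unlock(&r->lock);

		if (!t)
			return dropped;

		drop_one(r, t);
		dropped++;
	}
}
```

Because each entry removes itself from the list while being dropped, the loop terminates once the list is empty, which matches the list_empty() exit in the kernel function.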

void drop_inmem_pages(struct inode *inode)
{
	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
	struct f2fs_inode_info *fi = F2FS_I(inode);

	mutex_lock(&fi->inmem_lock);
	__revoke_inmem_pages(inode, &fi->inmem_pages, true, false);
	spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
	if (!list_empty(&fi->inmem_ilist))
		list_del_init(&fi->inmem_ilist);
	spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
	mutex_unlock(&fi->inmem_lock);

	clear_inode_flag(inode, FI_ATOMIC_FILE);
	clear_inode_flag(inode, FI_HOT_DATA);
	stat_dec_atomic_write(inode);
}

void drop_inmem_page(struct inode *inode, struct page *page)
{
	struct f2fs_inode_info *fi = F2FS_I(inode);
	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
	struct list_head *head = &fi->inmem_pages;
	struct inmem_pages *cur = NULL;

	f2fs_bug_on(sbi, !IS_ATOMIC_WRITTEN_PAGE(page));

	mutex_lock(&fi->inmem_lock);
	list_for_each_entry(cur, head, list) {
		if (cur->page == page)
			break;
	}

	f2fs_bug_on(sbi, !cur || cur->page != page);
	list_del(&cur->list);
	mutex_unlock(&fi->inmem_lock);

	dec_page_count(sbi, F2FS_INMEM_PAGES);
	kmem_cache_free(inmem_entry_slab, cur);

	ClearPageUptodate(page);
	set_page_private(page, 0);
	ClearPagePrivate(page);
	f2fs_put_page(page, 0);

	trace_f2fs_commit_inmem_page(page, INMEM_INVALIDATE);
}

f2fs: support revoking atomic written pages
f2fs supports atomic write with the following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file
With this flow we can avoid the file becoming corrupted on an abnormal power
cut, because we hold the data of the transaction in referenced pages linked in
the inmem_pages list of the inode, but without setting them dirty, so this data
won't be persisted unless we commit it in step 4.
But we should still hold the journal db file in memory by using volatile
write, because our 'atomic write support' semantics are incomplete: in
step 4, we could fail to submit all dirty data of the transaction. Once
partial dirty data has been committed to storage, then after a checkpoint &
abnormal power-cut, the db file will be corrupted forever.
So this patch tries to improve the atomic write flow by adding a revoking flow:
once an internal error occurs while committing, this gives another chance to
revoke the partially submitted data of the current transaction, which makes
the commit operation behave more like an atomic one.
If we're unlucky and the revoking operation also fails, EAGAIN will be
reported to the user, suggesting either recovery with the held journal file
or retrying the current transaction.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
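
From userspace, the five steps listed in the commit message above could look roughly like the sketch below. It is an illustration under stated assumptions, not a reference implementation. The helper name `atomic_db_update` is hypothetical, and the ioctl numbers are written out locally as an assumption about f2fs's header, so prefer the definitions shipped with your kernel headers where they are available.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Assumed values, mirrored by hand; check them against your f2fs header. */
#define F2FS_IOCTL_MAGIC		0xf5
#define F2FS_IOC_START_ATOMIC_WRITE	_IO(F2FS_IOCTL_MAGIC, 1)
#define F2FS_IOC_COMMIT_ATOMIC_WRITE	_IO(F2FS_IOCTL_MAGIC, 2)

/*
 * Hypothetical helper: rewrite the start of a db file as one transaction.
 * Returns 0 on success or a negative errno value.
 */
int atomic_db_update(const char *path, const void *buf, size_t len)
{
	ssize_t n;
	int err = 0;
	int fd;

	assert(path != NULL && buf != NULL);

	fd = open(path, O_RDWR);				/* step 1 */
	if (fd < 0)
		return -errno;

	if (ioctl(fd, F2FS_IOC_START_ATOMIC_WRITE) < 0) {	/* step 2 */
		err = -errno;
		goto out;
	}

	n = write(fd, buf, len);				/* step 3 */
	if (n < 0) {
		err = -errno;
		goto out;
	}
	if ((size_t)n != len) {
		err = -EIO;
		goto out;
	}

	/*
	 * Step 4. Per the commit message, -EAGAIN here means revoking failed,
	 * so the caller should recover from its journal or redo the
	 * transaction.
	 */
	if (ioctl(fd, F2FS_IOC_COMMIT_ATOMIC_WRITE) < 0)
		err = -errno;
out:
	close(fd);						/* step 5 */
	return err;
}
```

On a file that does not live on f2fs, the start ioctl is expected to fail, so a caller would normally fall back to its ordinary journaling path in that case.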
static int __commit_inmem_pages(struct inode *inode,
					struct list_head *revoke_list)
{
	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
	struct f2fs_inode_info *fi = F2FS_I(inode);
	struct inmem_pages *cur, *tmp;
	struct f2fs_io_info fio = {
		.sbi = sbi,
		.ino = inode->i_ino,
		.type = DATA,
		.op = REQ_OP_WRITE,
		.op_flags = REQ_SYNC | REQ_PRIO,
		.io_type = FS_DATA_IO,
	};
	pgoff_t last_idx = ULONG_MAX;
	int err = 0;

	list_for_each_entry_safe(cur, tmp, &fi->inmem_pages, list) {
		struct page *page = cur->page;

		lock_page(page);
		if (page->mapping == inode->i_mapping) {
			trace_f2fs_commit_inmem_page(page, INMEM);

			set_page_dirty(page);
			f2fs_wait_on_page_writeback(page, DATA, true);
			if (clear_page_dirty_for_io(page)) {
				inode_dec_dirty_pages(inode);
				remove_dirty_inode(inode);
			}
retry:
			fio.page = page;
			fio.old_blkaddr = NULL_ADDR;
			fio.encrypted_page = NULL;
			fio.need_lock = LOCK_DONE;
			err = do_write_data_page(&fio);
			if (err) {
				if (err == -ENOMEM) {
					congestion_wait(BLK_RW_ASYNC, HZ/50);
					cond_resched();
					goto retry;
				}
				unlock_page(page);
				break;
			}
			/* record old blkaddr for revoking */
			cur->old_addr = fio.old_blkaddr;
			last_idx = page->index;
		}
		unlock_page(page);
		list_move_tail(&cur->list, revoke_list);
	}

	if (last_idx != ULONG_MAX)
		f2fs_submit_merged_write_cond(sbi, inode, 0, last_idx, DATA);

	if (!err)
		__revoke_inmem_pages(inode, revoke_list, false, false);

	return err;
}

int commit_inmem_pages(struct inode *inode)
{
	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
	struct f2fs_inode_info *fi = F2FS_I(inode);
	struct list_head revoke_list;
	int err;

	INIT_LIST_HEAD(&revoke_list);
	f2fs_balance_fs(sbi, true);
	f2fs_lock_op(sbi);

	set_inode_flag(inode, FI_ATOMIC_COMMIT);

	mutex_lock(&fi->inmem_lock);
	err = __commit_inmem_pages(inode, &revoke_list);
	if (err) {
		int ret;
		/*
		 * Try to revoke all committed pages. Revoking can itself fail,
		 * e.g. due to lack of memory; in that case EAGAIN is returned,
		 * which means the transaction no longer has integrity and the
		 * caller should use the journal to do the recovery or rewrite
		 * & commit the last transaction. For any other error number,
		 * revoking was done by the filesystem itself.
		 */
		ret = __revoke_inmem_pages(inode, &revoke_list, false, true);
		if (ret)
			err = ret;

		/* drop all uncommitted pages */
		__revoke_inmem_pages(inode, &fi->inmem_pages, true, false);
	}
	spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
	if (!list_empty(&fi->inmem_ilist))
		list_del_init(&fi->inmem_ilist);
	spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
	mutex_unlock(&fi->inmem_lock);

	clear_inode_flag(inode, FI_ATOMIC_COMMIT);

	f2fs_unlock_op(sbi);
	return err;
}
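
The commit-then-revoke idea used by commit_inmem_pages() can be illustrated with a small userspace model: each pending update records the old value just before it is written, so a failure partway through can be undone. This is only a sketch under simplifying assumptions. The names `pending` and `commit_all` and the `fail_idx` fault knob are hypothetical, an int array stands in for on-disk blocks, the undo runs in reverse order for simplicity, and the case where revoking itself fails (reported as EAGAIN in the kernel code) is not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct pending {		/* stand-in for struct inmem_pages */
	size_t idx;		/* which slot of the store to update */
	int new_val;		/* data to commit */
	int old_val;		/* recorded at commit time, used to revoke */
};

/*
 * Apply all n pending updates to store. A write to slot fail_idx is made
 * to fail with -EIO so the revoke path can be exercised. On failure,
 * every update that was already applied is restored from old_val.
 */
int commit_all(int *store, struct pending *p, size_t n, size_t fail_idx)
{
	size_t done;
	int err = 0;

	for (done = 0; done < n; done++) {
		if (p[done].idx == fail_idx) {
			err = -EIO;
			break;
		}
		/* record the old value for revoking, then write the new one */
		p[done].old_val = store[p[done].idx];
		store[p[done].idx] = p[done].new_val;
	}
	if (!err)
		return 0;

	/* revoke the updates that were applied before the failure */
	while (done-- > 0)
		store[p[done].idx] = p[done].old_val;

	return err;
}
```

Recording the old value at write time, as the kernel does with cur->old_addr, keeps the undo information next to each entry, so no separate snapshot of the whole store is needed.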

/*
 * This function balances dirty node and dentry pages.
 * In addition, it controls garbage collection.
 */
void f2fs_balance_fs(struct f2fs_sb_info *sbi, bool need)
{
#ifdef CONFIG_F2FS_FAULT_INJECTION
	if (time_to_inject(sbi, FAULT_CHECKPOINT)) {
		f2fs_show_injection_info(FAULT_CHECKPOINT);
		f2fs_stop_checkpoint(sbi, false);
	}
#endif

	/* balance_fs_bg is able to be pending */
	if (need && excess_cached_nats(sbi))
		f2fs_balance_fs_bg(sbi);

	/*
	 * We should do GC or end up with checkpoint, if there are so many dirty
	 * dir/node pages without enough free segments.
	 */
	if (has_not_enough_free_secs(sbi, 0, 0)) {
		mutex_lock(&sbi->gc_mutex);
		f2fs_gc(sbi, false, false, NULL_SEGNO);
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2013-10-24 13:19:18 +08:00
|
|
|
void f2fs_balance_fs_bg(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
f2fs: enable rb-tree extent cache
This patch enables the rb-tree based extent cache in f2fs.
When mounted with "-o extent_cache", f2fs tries to add recently accessed
page-block mappings to the rb-tree based extent cache as much as possible,
instead of using the original single extent info cache.
This gives f2fs a more effective cache between the dnode page cache and the
disk. It provides a high hit ratio with less memory when dnode pages are
reclaimed under memory pressure.
Storage: Sandisk sd card 64g
1.append write file (offset: 0, size: 128M);
2.override write file (offset: 2M, size: 1M);
3.override write file (offset: 4M, size: 1M);
...
4.override write file (offset: 48M, size: 1M);
...
5.override write file (offset: 112M, size: 1M);
6.sync
7.echo 3 > /proc/sys/vm/drop_caches
8.read file (size:128M, unit: 4k, count: 32768)
(time dd if=/mnt/f2fs/128m bs=4k count=32768)
Extent Hit Ratio:
              before          patched
Hit Ratio     121 / 1071      1071 / 1071
Performance:
              before          patched
real          0m37.051s       0m35.556s
user          0m0.040s        0m0.026s
sys           0m2.990s        0m2.251s
Memory Cost:
              before          patched
Tree Count:   0               1 (size: 24 bytes)
Node Count:   0               45 (size: 1440 bytes)
v3:
o retest and give more details of the test results.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-05 17:57:31 +08:00
|
|
|
/* try to shrink extent cache when there is not enough memory */
|
2015-06-20 04:41:23 +08:00
|
|
|
if (!available_free_memory(sbi, EXTENT_CACHE))
|
|
|
|
f2fs_shrink_extent_tree(sbi, EXTENT_CACHE_SHRINK_NUMBER);
|
2015-02-05 17:57:31 +08:00
|
|
|
|
2015-06-20 06:36:07 +08:00
|
|
|
/* check the # of cached NAT entries */
|
|
|
|
if (!available_free_memory(sbi, NAT_ENTRIES))
|
|
|
|
try_to_free_nats(sbi, NAT_ENTRY_PER_BLOCK);
|
|
|
|
|
2015-07-28 18:33:46 +08:00
|
|
|
if (!available_free_memory(sbi, FREE_NIDS))
|
2016-06-17 07:41:49 +08:00
|
|
|
try_to_free_nids(sbi, MAX_FREE_NIDS);
|
|
|
|
else
|
2017-02-10 02:38:09 +08:00
|
|
|
build_free_nids(sbi, false, false);
|
2015-07-28 18:33:46 +08:00
|
|
|
|
2017-05-02 09:09:44 +08:00
|
|
|
if (!is_idle(sbi) && !excess_dirty_nats(sbi))
|
2016-12-06 03:37:14 +08:00
|
|
|
return;
|
2015-07-28 18:33:46 +08:00
|
|
|
|
2015-06-20 06:36:07 +08:00
|
|
|
/* checkpoint is the only way to shrink partial cached entries */
|
|
|
|
if (!available_free_memory(sbi, NAT_ENTRIES) ||
|
2015-10-06 05:49:57 +08:00
|
|
|
!available_free_memory(sbi, INO_ENTRIES) ||
|
f2fs: flush dirty nat entries when exceeding threshold
When testing f2fs with xfstests, generic/251 gets stuck for a long time.
The test repeats the steps below to obtain freshly released space on the
device, in preparation for the fstrim test that follows.
1. rm -rf /mnt/dir
2. mkdir /mnt/dir/
3. cp -axT `pwd`/ /mnt/dir/
4. goto 1
During this preparation step, all NAT entries are cached in the NAT cache.
Most of them are dirty entries with an invalid blkaddr, which means the
nodes related to these entries have been truncated, and they can be
reused once the dirty entries have been checkpointed.
However, no checkpoint was triggered, so nid allocators (e.g. mkdir, creat)
go through a long iteration over all NAT pages, looking for free nids in
alloc_nid->build_free_nids.
Here, in f2fs_balance_fs_bg, we give another chance to do a checkpoint that
flushes the NAT entries, so that they can be reused in the free nid cache,
when the dirty entry count exceeds 10% of the max count.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-18 18:31:18 +08:00
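The 10% threshold described in the commit message above can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the kernel implementation: `struct nat_stats`, `DIRTY_RATIO_PERCENT`, and `needs_checkpoint()` are hypothetical names, and the comparison treats reaching the threshold as exceeding it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the NAT cache counters; not the kernel's struct. */
struct nat_stats {
	unsigned int dirty_cnt;	/* cached NAT entries that are dirty */
	unsigned int max_cnt;	/* maximum number of NAT entries */
};

/* Assumed ratio, taken from the "10% of max count" wording above. */
#define DIRTY_RATIO_PERCENT 10U

/*
 * Return true once the dirty entries reach the threshold, so the caller
 * can trigger a checkpoint and make the truncated nids reusable.
 * The multiplication is widened to avoid overflow on large counts.
 */
static bool needs_checkpoint(const struct nat_stats *s)
{
	assert(s != NULL);
	return (unsigned long long)s->dirty_cnt * 100 >=
	       (unsigned long long)s->max_cnt * DIRTY_RATIO_PERCENT;
}
```

With `max_cnt = 1000`, the check stays false at 99 dirty entries and turns true at 100.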
|
|
|
excess_prefree_segs(sbi) ||
|
|
|
|
excess_dirty_nats(sbi) ||
|
2016-12-06 03:37:14 +08:00
|
|
|
f2fs_time_over(sbi, CP_TIME)) {
|
2016-02-14 18:54:33 +08:00
|
|
|
if (test_opt(sbi, DATA_FLUSH)) {
|
|
|
|
struct blk_plug plug;
|
|
|
|
|
|
|
|
blk_start_plug(&plug);
|
f2fs: support data flush in background
Previously, when finishing a checkpoint, we persisted all fs meta info,
including the meta inode, the node inode, and dentry pages of directory
inodes, so after a sudden power cut, f2fs can recover from the last
checkpoint with the full directory structure.
But during a checkpoint, we didn't flush dirty pages of regular and symlink
inodes, so such dirty data still in memory is lost at the moment of
power-off.
In order to reduce the chance of losing data, this patch gives
f2fs_balance_fs_bg the ability to flush data. It tries to flush user data
before starting a checkpoint, so user data written after the last
checkpoint, which may not have been fsynced, can be saved.
When mounted with the data_flush option, after every period of cp_interval
seconds (configurable in sysfs: /sys/fs/f2fs/device/cp_interval),
user data can be flushed to the device once f2fs_balance_fs_bg is called
in a kworker thread or the gc thread.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-17 17:13:28 +08:00
|
|
|
sync_dirty_inodes(sbi, FILE_INODE);
|
2016-02-14 18:54:33 +08:00
|
|
|
blk_finish_plug(&plug);
|
|
|
|
}
|
2013-10-24 13:19:18 +08:00
|
|
|
f2fs_sync_fs(sbi->sb, true);
|
2016-01-10 05:45:17 +08:00
|
|
|
stat_inc_bg_cp_count(sbi->stat_info);
|
2015-12-17 17:13:28 +08:00
|
|
|
}
|
2013-10-24 13:19:18 +08:00
|
|
|
}
|
|
|
|
|
2017-03-04 22:13:10 +08:00
|
|
|
static int __submit_flush_wait(struct f2fs_sb_info *sbi,
|
|
|
|
struct block_device *bdev)
|
2016-10-07 10:02:05 +08:00
|
|
|
{
|
2017-10-28 16:52:31 +08:00
|
|
|
struct bio *bio = f2fs_bio_alloc(sbi, 0, true);
|
2016-10-07 10:02:05 +08:00
|
|
|
int ret;
|
|
|
|
|
2017-05-02 23:03:47 +08:00
|
|
|
bio->bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_PREFLUSH;
|
2017-08-24 01:10:32 +08:00
|
|
|
bio_set_dev(bio, bdev);
|
2016-10-07 10:02:05 +08:00
|
|
|
ret = submit_bio_wait(bio);
|
|
|
|
bio_put(bio);
|
2017-03-04 22:13:10 +08:00
|
|
|
|
|
|
|
trace_f2fs_issue_flush(bdev, test_opt(sbi, NOBARRIER),
|
|
|
|
test_opt(sbi, FLUSH_MERGE), ret);
|
2016-10-07 10:02:05 +08:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2017-09-29 13:59:38 +08:00
|
|
|
static int submit_flush_wait(struct f2fs_sb_info *sbi, nid_t ino)
|
2016-10-07 10:02:05 +08:00
|
|
|
{
|
2017-09-29 13:59:38 +08:00
|
|
|
int ret = 0;
|
2016-10-07 10:02:05 +08:00
|
|
|
int i;
|
|
|
|
|
2017-09-29 13:59:38 +08:00
|
|
|
if (!sbi->s_ndevs)
|
|
|
|
return __submit_flush_wait(sbi, sbi->sb->s_bdev);
|
2017-03-04 22:13:10 +08:00
|
|
|
|
2017-09-29 13:59:38 +08:00
|
|
|
for (i = 0; i < sbi->s_ndevs; i++) {
|
|
|
|
if (!is_dirty_device(sbi, ino, i, FLUSH_INO))
|
|
|
|
continue;
|
2017-03-04 22:13:10 +08:00
|
|
|
ret = __submit_flush_wait(sbi, FDEV(i).bdev);
|
|
|
|
if (ret)
|
|
|
|
break;
|
2016-10-07 10:02:05 +08:00
|
|
|
}
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2014-04-27 14:21:33 +08:00
|
|
|
static int issue_flush_thread(void *data)
|
2014-04-02 14:34:36 +08:00
|
|
|
{
|
|
|
|
struct f2fs_sb_info *sbi = data;
|
2017-01-10 06:13:03 +08:00
|
|
|
struct flush_cmd_control *fcc = SM_I(sbi)->fcc_info;
|
2014-04-27 14:21:21 +08:00
|
|
|
wait_queue_head_t *q = &fcc->flush_wait_queue;
|
2014-04-02 14:34:36 +08:00
|
|
|
repeat:
|
|
|
|
if (kthread_should_stop())
|
|
|
|
return 0;
|
|
|
|
|
f2fs: make background threads of f2fs being aware of freezing
When ->freeze_fs is called from lvm to take a snapshot, it needs to
make sure there will be no more changes to the filesystem's data. However,
background threads such as the GC thread were previously not aware of
freezing, so in an environment with active background threads, the snapshot
data becomes unstable.
This patch fixes the issue by adding sb_{start,end}_intwrite to the
background threads below:
- GC thread
- flush thread
- discard thread
Note: don't use sb_start_intwrite() in gc_thread_func(), because
generic/241 reports the bug below:
======================================================
WARNING: possible circular locking dependency detected
4.13.0-rc1+ #32 Tainted: G O
------------------------------------------------------
f2fs_gc-250:0/22186 is trying to acquire lock:
(&sbi->gc_mutex){+.+...}, at: [<f8fa7f0b>] f2fs_sync_fs+0x7b/0x1b0 [f2fs]
but task is already holding lock:
(sb_internal#2){++++.-}, at: [<f8fb5609>] gc_thread_func+0x159/0x4a0 [f2fs]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (sb_internal#2){++++.-}:
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__sb_start_write+0x11d/0x1f0
f2fs_evict_inode+0x2d6/0x4e0 [f2fs]
evict+0xa8/0x170
iput+0x1fb/0x2c0
f2fs_sync_inode_meta+0x3f/0xf0 [f2fs]
write_checkpoint+0x1b1/0x750 [f2fs]
f2fs_sync_fs+0x85/0x1b0 [f2fs]
f2fs_do_sync_file.isra.24+0x137/0xa30 [f2fs]
f2fs_sync_file+0x34/0x40 [f2fs]
vfs_fsync_range+0x4a/0xa0
do_fsync+0x3c/0x60
SyS_fdatasync+0x15/0x20
do_fast_syscall_32+0xa1/0x1b0
entry_SYSENTER_32+0x4c/0x7b
-> #1 (&sbi->cp_mutex){+.+...}:
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__mutex_lock+0x4f/0x830
mutex_lock_nested+0x25/0x30
write_checkpoint+0x2f/0x750 [f2fs]
f2fs_sync_fs+0x85/0x1b0 [f2fs]
sync_filesystem+0x67/0x80
generic_shutdown_super+0x27/0x100
kill_block_super+0x22/0x50
kill_f2fs_super+0x3a/0x40 [f2fs]
deactivate_locked_super+0x3d/0x70
deactivate_super+0x40/0x60
cleanup_mnt+0x39/0x70
__cleanup_mnt+0x10/0x20
task_work_run+0x69/0x80
exit_to_usermode_loop+0x57/0x92
do_fast_syscall_32+0x18c/0x1b0
entry_SYSENTER_32+0x4c/0x7b
-> #0 (&sbi->gc_mutex){+.+...}:
validate_chain.isra.36+0xc50/0xdb0
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__mutex_lock+0x4f/0x830
mutex_lock_nested+0x25/0x30
f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_balance_fs_bg+0xb9/0x200 [f2fs]
gc_thread_func+0x302/0x4a0 [f2fs]
kthread+0xe9/0x120
ret_from_fork+0x19/0x24
other info that might help us debug this:
Chain exists of:
&sbi->gc_mutex --> &sbi->cp_mutex --> sb_internal#2
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(sb_internal#2);
                               lock(&sbi->cp_mutex);
                               lock(sb_internal#2);
  lock(&sbi->gc_mutex);
*** DEADLOCK ***
1 lock held by f2fs_gc-250:0/22186:
#0: (sb_internal#2){++++.-}, at: [<f8fb5609>] gc_thread_func+0x159/0x4a0 [f2fs]
stack backtrace:
CPU: 2 PID: 22186 Comm: f2fs_gc-250:0 Tainted: G O 4.13.0-rc1+ #32
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Call Trace:
dump_stack+0x5f/0x92
print_circular_bug+0x1b3/0x1bd
validate_chain.isra.36+0xc50/0xdb0
? __this_cpu_preempt_check+0xf/0x20
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
__mutex_lock+0x4f/0x830
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
mutex_lock_nested+0x25/0x30
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_balance_fs_bg+0xb9/0x200 [f2fs]
gc_thread_func+0x302/0x4a0 [f2fs]
? preempt_schedule_common+0x2f/0x4d
? f2fs_gc+0x540/0x540 [f2fs]
kthread+0xe9/0x120
? f2fs_gc+0x540/0x540 [f2fs]
? kthread_create_on_node+0x30/0x30
ret_from_fork+0x19/0x24
The deadlock occurs under the condition below:
GC Thread                       Thread B
- sb_start_intwrite
                                - f2fs_sync_file
                                 - f2fs_sync_fs
                                  - mutex_lock(&sbi->gc_mutex)
                                   - write_checkpoint
                                    - block_operations
                                     - f2fs_sync_inode_meta
                                      - iput
                                       - sb_start_intwrite
- mutex_lock(&sbi->gc_mutex)
Fix this by changing sb_start_intwrite to sb_start_write_trylock.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-07-22 08:52:23 +08:00
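The fix named in the commit message above is a try-lock: a background worker that cannot take the freeze protection immediately skips its round instead of blocking while other locks are in play. The sketch below shows only that pattern in userspace C11. It makes several assumptions: `struct freeze_guard`, `guard_trylock()`, `guard_unlock()`, and `background_round()` are hypothetical names, and the guard is a simple exclusive flag, whereas the kernel's freeze protection can be held by many writers at once.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical guard; set while someone else holds the protection. */
struct freeze_guard {
	atomic_flag busy;
};

/* Try to take the guard without blocking. */
static bool guard_trylock(struct freeze_guard *g)
{
	/* test_and_set returns the previous value: false means we got it. */
	return !atomic_flag_test_and_set(&g->busy);
}

static void guard_unlock(struct freeze_guard *g)
{
	atomic_flag_clear(&g->busy);
}

/*
 * One round of a background worker. Returns true if the work ran.
 * On contention the round is skipped, so the worker never waits on
 * the guard and cannot become part of a lock cycle through it.
 */
static bool background_round(struct freeze_guard *g, int *work_done)
{
	if (!guard_trylock(g))
		return false;
	(*work_done)++;
	guard_unlock(g);
	return true;
}
```

The trade-off is that a skipped round does no work, so the caller is expected to retry on its next wakeup.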
|
|
|
sb_start_intwrite(sbi->sb);
|
|
|
|
|
2014-09-05 18:31:00 +08:00
|
|
|
if (!llist_empty(&fcc->issue_list)) {
|
2014-04-02 14:34:36 +08:00
|
|
|
struct flush_cmd *cmd, *next;
|
|
|
|
int ret;
|
|
|
|
|
2014-09-05 18:31:00 +08:00
|
|
|
fcc->dispatch_list = llist_del_all(&fcc->issue_list);
|
|
|
|
fcc->dispatch_list = llist_reverse_order(fcc->dispatch_list);
|
|
|
|
|
2017-09-29 13:59:38 +08:00
|
|
|
cmd = llist_entry(fcc->dispatch_list, struct flush_cmd, llnode);
|
|
|
|
|
|
|
|
ret = submit_flush_wait(sbi, cmd->ino);
|
2017-03-25 17:19:58 +08:00
|
|
|
atomic_inc(&fcc->issued_flush);
|
|
|
|
|
2014-09-05 18:31:00 +08:00
|
|
|
llist_for_each_entry_safe(cmd, next,
|
|
|
|
fcc->dispatch_list, llnode) {
|
2014-04-02 14:34:36 +08:00
|
|
|
cmd->ret = ret;
|
|
|
|
complete(&cmd->wait);
|
|
|
|
}
|
2014-04-27 14:21:21 +08:00
|
|
|
fcc->dispatch_list = NULL;
|
2014-04-02 14:34:36 +08:00
|
|
|
}
|
|
|
|
|
2017-07-22 08:52:23 +08:00
|
|
|
sb_end_intwrite(sbi->sb);
|
|
|
|
|
2014-04-27 14:21:21 +08:00
|
|
|
wait_event_interruptible(*q,
|
2014-09-05 18:31:00 +08:00
|
|
|
kthread_should_stop() || !llist_empty(&fcc->issue_list));
|
2014-04-02 14:34:36 +08:00
|
|
|
goto repeat;
|
|
|
|
}
|
|
|
|
|
2017-09-29 13:59:38 +08:00
|
|
|
int f2fs_issue_flush(struct f2fs_sb_info *sbi, nid_t ino)
|
2014-04-02 14:34:36 +08:00
|
|
|
{
|
2017-01-10 06:13:03 +08:00
|
|
|
struct flush_cmd_control *fcc = SM_I(sbi)->fcc_info;
|
2014-05-08 17:00:35 +08:00
|
|
|
struct flush_cmd cmd;
|
2017-03-25 17:19:58 +08:00
|
|
|
int ret;
|
2014-04-02 14:34:36 +08:00
|
|
|
|
2014-07-24 00:57:31 +08:00
|
|
|
if (test_opt(sbi, NOBARRIER))
|
|
|
|
return 0;
|
|
|
|
|
2017-03-25 17:19:58 +08:00
|
|
|
if (!test_opt(sbi, FLUSH_MERGE)) {
|
2017-09-29 13:59:38 +08:00
|
|
|
ret = submit_flush_wait(sbi, ino);
|
2017-03-25 17:19:58 +08:00
|
|
|
atomic_inc(&fcc->issued_flush);
|
|
|
|
return ret;
|
|
|
|
}
|
2015-08-15 02:43:56 +08:00
|
|
|
|
2017-09-29 13:59:38 +08:00
|
|
|
if (atomic_inc_return(&fcc->issing_flush) == 1 || sbi->s_ndevs > 1) {
|
|
|
|
ret = submit_flush_wait(sbi, ino);
|
2017-03-25 17:19:58 +08:00
|
|
|
atomic_dec(&fcc->issing_flush);
|
|
|
|
|
|
|
|
atomic_inc(&fcc->issued_flush);
|
2015-08-15 02:43:56 +08:00
|
|
|
return ret;
|
|
|
|
}
|
2014-04-02 14:34:36 +08:00
|
|
|
|
2017-09-29 13:59:38 +08:00
|
|
|
cmd.ino = ino;
|
2014-05-08 17:00:35 +08:00
|
|
|
init_completion(&cmd.wait);
|
2014-04-02 14:34:36 +08:00
|
|
|
|
2014-09-05 18:31:00 +08:00
|
|
|
llist_add(&cmd.llnode, &fcc->issue_list);
|
2014-04-02 14:34:36 +08:00
|
|
|
|
2017-08-21 22:53:45 +08:00
|
|
|
/* update issue_list before we wake up issue_flush thread */
|
|
|
|
smp_mb();
|
|
|
|
|
|
|
|
if (waitqueue_active(&fcc->flush_wait_queue))
|
2014-04-27 14:21:21 +08:00
|
|
|
wake_up(&fcc->flush_wait_queue);
|
2014-04-02 14:34:36 +08:00
|
|
|
|
2016-12-08 08:23:32 +08:00
|
|
|
if (fcc->f2fs_issue_flush) {
|
|
|
|
wait_for_completion(&cmd.wait);
|
2017-03-25 17:19:58 +08:00
|
|
|
atomic_dec(&fcc->issing_flush);
|
2016-12-08 08:23:32 +08:00
|
|
|
} else {
|
2017-08-31 18:56:06 +08:00
|
|
|
struct llist_node *list;
|
|
|
|
|
|
|
|
list = llist_del_all(&fcc->issue_list);
|
|
|
|
if (!list) {
|
|
|
|
wait_for_completion(&cmd.wait);
|
|
|
|
atomic_dec(&fcc->issing_flush);
|
|
|
|
} else {
|
|
|
|
struct flush_cmd *tmp, *next;
|
|
|
|
|
2017-09-29 13:59:38 +08:00
|
|
|
ret = submit_flush_wait(sbi, ino);
|
2017-08-31 18:56:06 +08:00
|
|
|
|
|
|
|
llist_for_each_entry_safe(tmp, next, list, llnode) {
|
|
|
|
if (tmp == &cmd) {
|
|
|
|
cmd.ret = ret;
|
|
|
|
atomic_dec(&fcc->issing_flush);
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
tmp->ret = ret;
|
|
|
|
complete(&tmp->wait);
|
|
|
|
}
|
|
|
|
}
|
2016-12-08 08:23:32 +08:00
|
|
|
}
|
2014-05-08 17:00:35 +08:00
|
|
|
|
|
|
|
return cmd.ret;
|
2014-04-02 14:34:36 +08:00
|
|
|
}
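The batching used by issue_flush_thread() and f2fs_issue_flush() above can be illustrated with a small single-threaded sketch: waiters are pushed onto the head of a list, the whole list is detached at once, reversed into arrival order, and every waiter receives the result of one shared flush. All names here (`struct waiter`, `waiter_add()`, `waiter_del_all()`, `waiter_reverse()`, `dispatch_batch()`) are hypothetical, and the sketch omits the atomics and completions that the kernel code relies on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical waiter record, loosely modeled on a queued flush request. */
struct waiter {
	struct waiter *next;
	int id;
	int ret;	/* result copied from the shared flush */
	int done;	/* set when the waiter has been served */
};

/* Push at the head, so the list ends up in reverse arrival order. */
static void waiter_add(struct waiter *w, struct waiter **head)
{
	w->next = *head;
	*head = w;
}

/* Detach the whole list in one step and leave the head empty. */
static struct waiter *waiter_del_all(struct waiter **head)
{
	struct waiter *list = *head;

	*head = NULL;
	return list;
}

/* Reverse the detached list to restore arrival (FIFO) order. */
static struct waiter *waiter_reverse(struct waiter *list)
{
	struct waiter *prev = NULL;

	while (list) {
		struct waiter *next = list->next;

		list->next = prev;
		prev = list;
		list = next;
	}
	return prev;
}

/* Hand one flush result to every queued waiter; return how many were served. */
static int dispatch_batch(struct waiter **head, int flush_ret)
{
	struct waiter *w = waiter_reverse(waiter_del_all(head));
	int served = 0;

	while (w) {
		/* Read next first: a served waiter may go away in real code. */
		struct waiter *next = w->next;

		w->ret = flush_ret;
		w->done = 1;
		served++;
		w = next;
	}
	return served;
}
```

The point of the design is that N queued requests cost one device flush rather than N.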
|
|
|
|
|
2014-04-27 14:21:33 +08:00
|
|
|
int create_flush_cmd_control(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
dev_t dev = sbi->sb->s_bdev->bd_dev;
|
|
|
|
struct flush_cmd_control *fcc;
|
|
|
|
int err = 0;
|
|
|
|
|
2017-01-10 06:13:03 +08:00
|
|
|
if (SM_I(sbi)->fcc_info) {
|
|
|
|
fcc = SM_I(sbi)->fcc_info;
|
2017-06-24 15:57:19 +08:00
|
|
|
if (fcc->f2fs_issue_flush)
|
|
|
|
return err;
|
2016-12-08 08:23:32 +08:00
|
|
|
goto init_thread;
|
|
|
|
}
|
|
|
|
|
2017-11-30 19:28:17 +08:00
|
|
|
fcc = f2fs_kzalloc(sbi, sizeof(struct flush_cmd_control), GFP_KERNEL);
|
2014-04-27 14:21:33 +08:00
|
|
|
if (!fcc)
|
|
|
|
return -ENOMEM;
|
2017-03-25 17:19:58 +08:00
|
|
|
atomic_set(&fcc->issued_flush, 0);
|
|
|
|
atomic_set(&fcc->issing_flush, 0);
|
2014-04-27 14:21:33 +08:00
|
|
|
init_waitqueue_head(&fcc->flush_wait_queue);
|
2014-09-05 18:31:00 +08:00
|
|
|
init_llist_head(&fcc->issue_list);
|
2017-01-10 06:13:03 +08:00
|
|
|
SM_I(sbi)->fcc_info = fcc;
|
2017-06-01 16:43:51 +08:00
|
|
|
if (!test_opt(sbi, FLUSH_MERGE))
|
|
|
|
return err;
|
|
|
|
|
2016-12-08 08:23:32 +08:00
|
|
|
init_thread:
|
2014-04-27 14:21:33 +08:00
|
|
|
fcc->f2fs_issue_flush = kthread_run(issue_flush_thread, sbi,
|
|
|
|
"f2fs_flush-%u:%u", MAJOR(dev), MINOR(dev));
|
|
|
|
if (IS_ERR(fcc->f2fs_issue_flush)) {
|
|
|
|
err = PTR_ERR(fcc->f2fs_issue_flush);
|
|
|
|
kfree(fcc);
|
2017-01-10 06:13:03 +08:00
|
|
|
SM_I(sbi)->fcc_info = NULL;
|
2014-04-27 14:21:33 +08:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2016-12-08 08:23:32 +08:00
|
|
|
void destroy_flush_cmd_control(struct f2fs_sb_info *sbi, bool free)
|
2014-04-27 14:21:33 +08:00
|
|
|
{
|
2017-01-10 06:13:03 +08:00
|
|
|
struct flush_cmd_control *fcc = SM_I(sbi)->fcc_info;
|
2014-04-27 14:21:33 +08:00
|
|
|
|
2016-12-08 08:23:32 +08:00
|
|
|
if (fcc && fcc->f2fs_issue_flush) {
|
|
|
|
struct task_struct *flush_thread = fcc->f2fs_issue_flush;
|
|
|
|
|
|
|
|
fcc->f2fs_issue_flush = NULL;
|
|
|
|
kthread_stop(flush_thread);
|
|
|
|
}
|
|
|
|
if (free) {
|
|
|
|
kfree(fcc);
|
2017-01-10 06:13:03 +08:00
|
|
|
SM_I(sbi)->fcc_info = NULL;
|
2016-12-08 08:23:32 +08:00
|
|
|
}
|
2014-04-27 14:21:33 +08:00
|
|
|
}
|
|
|
|
|
2017-09-29 13:59:39 +08:00
|
|
|
int f2fs_flush_device_cache(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
int ret = 0, i;
|
|
|
|
|
|
|
|
if (!sbi->s_ndevs)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
for (i = 1; i < sbi->s_ndevs; i++) {
|
|
|
|
if (!f2fs_test_bit(i, (char *)&sbi->dirty_device))
|
|
|
|
continue;
|
|
|
|
ret = __submit_flush_wait(sbi, FDEV(i).bdev);
|
|
|
|
if (ret)
|
|
|
|
break;
|
|
|
|
|
|
|
|
spin_lock(&sbi->dev_lock);
|
|
|
|
f2fs_clear_bit(i, (char *)&sbi->dirty_device);
|
|
|
|
spin_unlock(&sbi->dev_lock);
|
|
|
|
}
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
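The loop in f2fs_flush_device_cache() above flushes only devices whose dirty bit is set, clears the bit after a successful flush, and stops at the first error. A minimal userspace sketch of that control flow follows. The names `flush_fn`, `flush_dirty_devices()`, `flush_ok()`, and `flush_fail_on_2()` are hypothetical, a plain bitmask stands in for the dirty_device bitmap, and locking is omitted. As in the loop above, index 0 is skipped.

```c
#include <assert.h>

#define MAX_DEVS 8

/* Hypothetical flush hook: returns 0 on success, negative on error. */
typedef int (*flush_fn)(int dev);

/* Example hooks, used only to exercise the loop. */
static int flush_ok(int dev)
{
	(void)dev;
	return 0;
}

static int flush_fail_on_2(int dev)
{
	return dev == 2 ? -5 : 0;
}

/*
 * Flush every dirty device from index 1 upward. A device's bit is
 * cleared only after its flush succeeded, so a failed device stays
 * marked dirty and can be retried later.
 */
static int flush_dirty_devices(unsigned int *dirty_mask, int ndevs,
			       flush_fn flush)
{
	int ret = 0;

	assert(ndevs <= MAX_DEVS);
	for (int i = 1; i < ndevs; i++) {
		if (!(*dirty_mask & (1U << i)))
			continue;
		ret = flush(i);
		if (ret)
			break;
		*dirty_mask &= ~(1U << i);
	}
	return ret;
}
```

Calling `flush_dirty_devices(&mask, 4, flush_ok)` with devices 1 and 3 dirty returns 0 and leaves the mask empty.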
|
|
|
|
|
2012-11-02 16:09:16 +08:00
|
|
|
static void __locate_dirty_segment(struct f2fs_sb_info *sbi, unsigned int segno,
|
|
|
|
enum dirty_type dirty_type)
|
|
|
|
{
|
|
|
|
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
|
|
|
|
|
|
|
|
/* need not be added */
|
|
|
|
if (IS_CURSEG(sbi, segno))
|
|
|
|
return;
|
|
|
|
|
|
|
|
if (!test_and_set_bit(segno, dirty_i->dirty_segmap[dirty_type]))
|
|
|
|
dirty_i->nr_dirty[dirty_type]++;
|
|
|
|
|
|
|
|
if (dirty_type == DIRTY) {
|
|
|
|
struct seg_entry *sentry = get_seg_entry(sbi, segno);
|
2013-10-25 16:31:57 +08:00
|
|
|
enum dirty_type t = sentry->type;
|
f2fs: fix the bitmap consistency of dirty segments
As shown below, there are 8 segment bitmaps for SSR victim candidates.
enum dirty_type {
DIRTY_HOT_DATA, /* dirty segments assigned as hot data logs */
DIRTY_WARM_DATA, /* dirty segments assigned as warm data logs */
DIRTY_COLD_DATA, /* dirty segments assigned as cold data logs */
DIRTY_HOT_NODE, /* dirty segments assigned as hot node logs */
DIRTY_WARM_NODE, /* dirty segments assigned as warm node logs */
DIRTY_COLD_NODE, /* dirty segments assigned as cold node logs */
DIRTY, /* to count # of dirty segments */
PRE, /* to count # of entirely obsolete segments */
NR_DIRTY_TYPE
};
The upper 6 bitmaps indicate segments dirtied by the respective active log
areas, and the DIRTY bitmap integrates all 6 bitmaps.
For example,
o DIRTY_HOT_DATA : 1010000
o DIRTY_WARM_DATA: 0100000
o DIRTY_COLD_DATA: 0001000
o DIRTY_HOT_NODE : 0000010
o DIRTY_WARM_NODE: 0000001
o DIRTY_COLD_NODE: 0000000
In this case,
o DIRTY : 1111011,
which means that we should strictly guarantee consistency between DIRTY and
the other bitmaps.
However, the SSR mode selects victims freely from any log type, which can set
multiple bits across the various bitmap types.
So, this patch eliminates this inconsistency.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-04-01 12:52:09 +08:00
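The invariant stated in the commit message above, that DIRTY is the union of the six per-log bitmaps, can be checked with a small userspace sketch. The enum values are taken from the message; everything else is an assumption made for illustration: `struct dirty_maps`, `mark_dirty()`, `unmark_dirty()`, and `maps_consistent()` are hypothetical, and each bitmap is reduced to a single 64-bit word.

```c
#include <assert.h>
#include <stdbool.h>

enum dirty_type {
	DIRTY_HOT_DATA, DIRTY_WARM_DATA, DIRTY_COLD_DATA,
	DIRTY_HOT_NODE, DIRTY_WARM_NODE, DIRTY_COLD_NODE,
	DIRTY,		/* aggregate of the six log bitmaps */
	PRE,
	NR_DIRTY_TYPE
};

/* Hypothetical, simplified state: one 64-bit word per bitmap. */
struct dirty_maps {
	unsigned long long map[NR_DIRTY_TYPE];
	int nr[NR_DIRTY_TYPE];
};

/* Set a bit and bump the counter only if the bit was clear. */
static void set_seg(struct dirty_maps *d, enum dirty_type t, unsigned int segno)
{
	unsigned long long bit = 1ULL << segno;

	if (!(d->map[t] & bit)) {
		d->map[t] |= bit;
		d->nr[t]++;
	}
}

/* Clear a bit and drop the counter only if the bit was set. */
static void clear_seg(struct dirty_maps *d, enum dirty_type t, unsigned int segno)
{
	unsigned long long bit = 1ULL << segno;

	if (d->map[t] & bit) {
		d->map[t] &= ~bit;
		d->nr[t]--;
	}
}

/* Mark a segment in DIRTY and in exactly one log-type bitmap. */
static void mark_dirty(struct dirty_maps *d, unsigned int segno,
		       enum dirty_type type)
{
	assert(segno < 64 && type < DIRTY);
	set_seg(d, DIRTY, segno);
	set_seg(d, type, segno);
}

/* Remove a segment from DIRTY and from its log-type bitmap together. */
static void unmark_dirty(struct dirty_maps *d, unsigned int segno,
			 enum dirty_type type)
{
	assert(segno < 64 && type < DIRTY);
	clear_seg(d, DIRTY, segno);
	clear_seg(d, type, segno);
}

/* Invariant: DIRTY equals the union of the six log bitmaps. */
static bool maps_consistent(const struct dirty_maps *d)
{
	unsigned long long all = 0;

	for (int t = DIRTY_HOT_DATA; t < DIRTY; t++)
		all |= d->map[t];
	return all == d->map[DIRTY];
}
```

Updating both bitmaps in one helper is what keeps the invariant from drifting; setting a log-type bit without the matching DIRTY bit would make `maps_consistent()` fail.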
|
|
|
|
2014-09-03 07:24:11 +08:00
|
|
|
if (unlikely(t >= DIRTY)) {
|
|
|
|
f2fs_bug_on(sbi, 1);
|
|
|
|
return;
|
|
|
|
}
|
2013-10-25 16:31:57 +08:00
|
|
|
if (!test_and_set_bit(segno, dirty_i->dirty_segmap[t]))
|
|
|
|
dirty_i->nr_dirty[t]++;
|
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
static void __remove_dirty_segment(struct f2fs_sb_info *sbi, unsigned int segno,
|
|
|
|
enum dirty_type dirty_type)
|
|
|
|
{
|
|
|
|
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
|
|
|
|
|
|
|
|
if (test_and_clear_bit(segno, dirty_i->dirty_segmap[dirty_type]))
|
|
|
|
dirty_i->nr_dirty[dirty_type]--;
|
|
|
|
|
|
|
|
if (dirty_type == DIRTY) {
|
2013-10-25 16:31:57 +08:00
|
|
|
struct seg_entry *sentry = get_seg_entry(sbi, segno);
|
|
|
|
enum dirty_type t = sentry->type;
|
|
|
|
|
|
|
|
if (test_and_clear_bit(segno, dirty_i->dirty_segmap[t]))
|
|
|
|
dirty_i->nr_dirty[t]--;
|
f2fs: fix the bitmap consistency of dirty segments
Like below, there are 8 segment bitmaps for SSR victim candidates.
enum dirty_type {
DIRTY_HOT_DATA, /* dirty segments assigned as hot data logs */
DIRTY_WARM_DATA, /* dirty segments assigned as warm data logs */
DIRTY_COLD_DATA, /* dirty segments assigned as cold data logs */
DIRTY_HOT_NODE, /* dirty segments assigned as hot node logs */
DIRTY_WARM_NODE, /* dirty segments assigned as warm node logs */
DIRTY_COLD_NODE, /* dirty segments assigned as cold node logs */
DIRTY, /* to count # of dirty segments */
PRE, /* to count # of entirely obsolete segments */
NR_DIRTY_TYPE
};
The upper 6 bitmaps indicates segments dirtied by active log areas respectively.
And, the DIRTY bitmap integrates all the 6 bitmaps.
For example,
o DIRTY_HOT_DATA : 1010000
o DIRTY_WARM_DATA: 0100000
o DIRTY_COLD_DATA: 0001000
o DIRTY_HOT_NODE : 0000010
o DIRTY_WARM_NODE: 0000001
o DIRTY_COLD_NODE: 0000000
In this case,
o DIRTY : 1111011,
which means that we must keep DIRTY strictly consistent with the other
bitmaps.
However, the SSR mode selects victims freely from any log type, which can set
multiple bits for one segment across the various bitmap types.
So, this patch eliminates this inconsistency.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-04-01 12:52:09 +08:00
|
|
|
|
2017-04-08 05:33:22 +08:00
|
|
|
if (get_valid_blocks(sbi, segno, true) == 0)
|
2017-04-08 06:08:17 +08:00
|
|
|
clear_bit(GET_SEC_FROM_SEG(sbi, segno),
|
2013-03-31 12:26:03 +08:00
|
|
|
dirty_i->victim_secmap);
|
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
}
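The commit note on bitmap consistency can be checked with a small user-space sketch. It is not kernel code: the `sketch_` names are hypothetical, and each bitmap is reduced to one `unsigned int` with one bit per segment. It verifies that the DIRTY bitmap equals the union of the six per-log bitmaps and that no segment is marked in two logs at once.

```c
#include <assert.h>

/* Hypothetical stand-ins for the six per-log dirty types. */
enum sketch_dirty_type {
	SK_HOT_DATA, SK_WARM_DATA, SK_COLD_DATA,
	SK_HOT_NODE, SK_WARM_NODE, SK_COLD_NODE,
	SK_NR_LOG_TYPES
};

/* OR the six per-log bitmaps together (one bit per segment). */
unsigned int sketch_union_of_logs(const unsigned int maps[SK_NR_LOG_TYPES])
{
	unsigned int all = 0;
	int t;

	for (t = 0; t < SK_NR_LOG_TYPES; t++)
		all |= maps[t];
	return all;
}

/*
 * Return 1 when `dirty` equals the union of the per-log bitmaps and
 * no segment bit is set in more than one per-log bitmap, else 0.
 */
int sketch_dirty_consistent(const unsigned int maps[SK_NR_LOG_TYPES],
			    unsigned int dirty)
{
	unsigned int seen = 0;
	int t;

	for (t = 0; t < SK_NR_LOG_TYPES; t++) {
		if (seen & maps[t])
			return 0;	/* one segment claimed by two logs */
		seen |= maps[t];
	}
	assert(seen == sketch_union_of_logs(maps));
	return seen == dirty;
}
```

With the bitmaps from the commit note (0x50, 0x20, 0x08, 0x02, 0x01, 0x00 in hexadecimal), the union is 0x7B, which is the 1111011 pattern given for DIRTY.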
|
|
|
|
|
2012-11-29 12:28:09 +08:00
|
|
|
/*
|
2012-11-02 16:09:16 +08:00
|
|
|
* This function must not fail with an error such as -ENOMEM.
|
|
|
|
* Adding a dirty entry to the seglist is not a critical operation.
|
|
|
|
* If the given segment is one of the current working segments, it is not added.
|
|
|
|
*/
|
2013-06-13 16:59:28 +08:00
|
|
|
static void locate_dirty_segment(struct f2fs_sb_info *sbi, unsigned int segno)
|
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
|
|
|
|
unsigned short valid_blocks;
|
|
|
|
|
|
|
|
if (segno == NULL_SEGNO || IS_CURSEG(sbi, segno))
|
|
|
|
return;
|
|
|
|
|
|
|
|
mutex_lock(&dirty_i->seglist_lock);
|
|
|
|
|
2017-04-08 05:33:22 +08:00
|
|
|
valid_blocks = get_valid_blocks(sbi, segno, false);
|
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
if (valid_blocks == 0) {
|
|
|
|
__locate_dirty_segment(sbi, segno, PRE);
|
|
|
|
__remove_dirty_segment(sbi, segno, DIRTY);
|
|
|
|
} else if (valid_blocks < sbi->blocks_per_seg) {
|
|
|
|
__locate_dirty_segment(sbi, segno, DIRTY);
|
|
|
|
} else {
|
|
|
|
/* Recovery routine with SSR needs this */
|
|
|
|
__remove_dirty_segment(sbi, segno, DIRTY);
|
|
|
|
}
|
|
|
|
|
|
|
|
mutex_unlock(&dirty_i->seglist_lock);
|
|
|
|
}
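The three-way branch in locate_dirty_segment() can be summarised as a pure function of the valid block count. The sketch below is a user-space illustration, not the kernel implementation. The enum and function names are hypothetical, and the locking and bitmap updates are left out.

```c
#include <assert.h>

/* Hypothetical labels for the three outcomes of the branch. */
enum sketch_seg_state {
	SK_SEG_PREFREE,	/* no valid blocks: add to PRE, drop from DIRTY */
	SK_SEG_DIRTY,	/* partially valid: add to DIRTY */
	SK_SEG_FULL	/* fully valid: drop from DIRTY */
};

/* Classify a segment from its valid block count. */
enum sketch_seg_state sketch_classify(unsigned int valid_blocks,
				      unsigned int blocks_per_seg)
{
	/* A segment cannot hold more valid blocks than it has blocks. */
	assert(valid_blocks <= blocks_per_seg);

	if (valid_blocks == 0)
		return SK_SEG_PREFREE;
	if (valid_blocks < blocks_per_seg)
		return SK_SEG_DIRTY;
	return SK_SEG_FULL;
}
```

For example, assuming 512 blocks per segment, a count of 0 maps to the prefree case, any count from 1 to 511 to the dirty case, and 512 to the full case.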
|
|
|
|
|
2017-04-14 23:24:55 +08:00
|
|
|
static struct discard_cmd *__create_discard_cmd(struct f2fs_sb_info *sbi,
|
2017-03-08 10:02:02 +08:00
|
|
|
struct block_device *bdev, block_t lstart,
|
|
|
|
block_t start, block_t len)
|
2016-08-29 23:58:34 +08:00
|
|
|
{
|
2017-01-12 06:40:24 +08:00
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
2017-04-15 14:09:37 +08:00
|
|
|
struct list_head *pend_list;
|
2017-01-10 06:13:03 +08:00
|
|
|
struct discard_cmd *dc;
|
2016-08-29 23:58:34 +08:00
|
|
|
|
2017-04-15 14:09:37 +08:00
|
|
|
f2fs_bug_on(sbi, !len);
|
|
|
|
|
|
|
|
pend_list = &dcc->pend_list[plist_idx(len)];
|
|
|
|
|
2017-01-10 06:13:03 +08:00
|
|
|
dc = f2fs_kmem_cache_alloc(discard_cmd_slab, GFP_NOFS);
|
|
|
|
INIT_LIST_HEAD(&dc->list);
|
2017-03-08 10:02:02 +08:00
|
|
|
dc->bdev = bdev;
|
2017-01-10 06:13:03 +08:00
|
|
|
dc->lstart = lstart;
|
2017-03-08 10:02:02 +08:00
|
|
|
dc->start = start;
|
2017-01-10 06:13:03 +08:00
|
|
|
dc->len = len;
|
2017-04-26 17:39:54 +08:00
|
|
|
dc->ref = 0;
|
2017-01-10 12:32:07 +08:00
|
|
|
dc->state = D_PREP;
|
2017-03-08 10:02:02 +08:00
|
|
|
dc->error = 0;
|
2017-01-10 06:13:03 +08:00
|
|
|
init_completion(&dc->wait);
|
2017-04-05 18:19:48 +08:00
|
|
|
list_add_tail(&dc->list, pend_list);
|
2017-03-25 17:19:59 +08:00
|
|
|
atomic_inc(&dcc->discard_cmd_cnt);
|
2017-04-18 19:27:39 +08:00
|
|
|
dcc->undiscard_blks += len;
|
2017-04-14 23:24:55 +08:00
|
|
|
|
|
|
|
return dc;
|
|
|
|
}
|
|
|
|
|
|
|
|
static struct discard_cmd *__attach_discard_cmd(struct f2fs_sb_info *sbi,
|
|
|
|
struct block_device *bdev, block_t lstart,
|
|
|
|
block_t start, block_t len,
|
|
|
|
struct rb_node *parent, struct rb_node **p)
|
|
|
|
{
|
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
|
|
|
struct discard_cmd *dc;
|
|
|
|
|
|
|
|
dc = __create_discard_cmd(sbi, bdev, lstart, start, len);
|
|
|
|
|
|
|
|
rb_link_node(&dc->rb_node, parent, p);
|
|
|
|
rb_insert_color(&dc->rb_node, &dcc->root);
|
|
|
|
|
|
|
|
return dc;
|
2017-01-10 12:32:07 +08:00
|
|
|
}
|
|
|
|
|
2017-04-14 23:24:55 +08:00
|
|
|
static void __detach_discard_cmd(struct discard_cmd_control *dcc,
|
|
|
|
struct discard_cmd *dc)
|
2017-01-10 12:32:07 +08:00
|
|
|
{
|
2017-01-12 02:20:04 +08:00
|
|
|
if (dc->state == D_DONE)
|
2017-04-14 23:24:55 +08:00
|
|
|
atomic_dec(&dcc->issing_discard);
|
|
|
|
|
|
|
|
list_del(&dc->list);
|
|
|
|
rb_erase(&dc->rb_node, &dcc->root);
|
2017-04-18 19:27:39 +08:00
|
|
|
dcc->undiscard_blks -= dc->len;
|
2017-04-14 23:24:55 +08:00
|
|
|
|
|
|
|
kmem_cache_free(discard_cmd_slab, dc);
|
|
|
|
|
|
|
|
atomic_dec(&dcc->discard_cmd_cnt);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void __remove_discard_cmd(struct f2fs_sb_info *sbi,
|
|
|
|
struct discard_cmd *dc)
|
|
|
|
{
|
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
2017-01-12 02:20:04 +08:00
|
|
|
|
2017-10-04 09:08:36 +08:00
|
|
|
trace_f2fs_remove_discard(dc->bdev, dc->start, dc->len);
|
|
|
|
|
2017-06-05 18:29:07 +08:00
|
|
|
f2fs_bug_on(sbi, dc->ref);
|
|
|
|
|
2017-03-08 10:02:02 +08:00
|
|
|
if (dc->error == -EOPNOTSUPP)
|
|
|
|
dc->error = 0;
|
2017-01-10 12:32:07 +08:00
|
|
|
|
2017-03-08 10:02:02 +08:00
|
|
|
if (dc->error)
|
2017-01-10 12:32:07 +08:00
|
|
|
f2fs_msg(sbi->sb, KERN_INFO,
|
2017-05-19 23:46:43 +08:00
|
|
|
"Issue discard(%u, %u, %u) failed, ret: %d",
|
|
|
|
dc->lstart, dc->start, dc->len, dc->error);
|
2017-04-14 23:24:55 +08:00
|
|
|
__detach_discard_cmd(dcc, dc);
|
2016-08-29 23:58:34 +08:00
|
|
|
}
|
|
|
|
|
2017-03-08 10:02:02 +08:00
|
|
|
static void f2fs_submit_discard_endio(struct bio *bio)
|
|
|
|
{
|
|
|
|
struct discard_cmd *dc = (struct discard_cmd *)bio->bi_private;
|
|
|
|
|
2017-06-03 15:38:06 +08:00
|
|
|
dc->error = blk_status_to_errno(bio->bi_status);
|
2017-03-08 10:02:02 +08:00
|
|
|
dc->state = D_DONE;
|
2017-05-19 23:46:44 +08:00
|
|
|
complete_all(&dc->wait);
|
2017-03-08 10:02:02 +08:00
|
|
|
bio_put(bio);
|
|
|
|
}
|
|
|
|
|
2018-01-05 17:41:20 +08:00
|
|
|
static void __check_sit_bitmap(struct f2fs_sb_info *sbi,
|
2017-06-30 17:19:02 +08:00
|
|
|
block_t start, block_t end)
|
|
|
|
{
|
|
|
|
#ifdef CONFIG_F2FS_CHECK_FS
|
|
|
|
struct seg_entry *sentry;
|
|
|
|
unsigned int segno;
|
|
|
|
block_t blk = start;
|
|
|
|
unsigned long offset, size, max_blocks = sbi->blocks_per_seg;
|
|
|
|
unsigned long *map;
|
|
|
|
|
|
|
|
while (blk < end) {
|
|
|
|
segno = GET_SEGNO(sbi, blk);
|
|
|
|
sentry = get_seg_entry(sbi, segno);
|
|
|
|
offset = GET_BLKOFF_FROM_SEG0(sbi, blk);
|
|
|
|
|
2017-08-04 17:07:15 +08:00
|
|
|
if (end < START_BLOCK(sbi, segno + 1))
|
|
|
|
size = GET_BLKOFF_FROM_SEG0(sbi, end);
|
|
|
|
else
|
|
|
|
size = max_blocks;
|
2017-06-30 17:19:02 +08:00
|
|
|
map = (unsigned long *)(sentry->cur_valid_map);
|
|
|
|
offset = __find_rev_next_bit(map, size, offset);
|
|
|
|
f2fs_bug_on(sbi, offset != size);
|
2017-08-04 17:07:15 +08:00
|
|
|
blk = START_BLOCK(sbi, segno + 1);
|
2017-06-30 17:19:02 +08:00
|
|
|
}
|
|
|
|
#endif
|
|
|
|
}
|
|
|
|
|
2017-03-08 10:02:02 +08:00
|
|
|
/* this function is copied from blkdev_issue_discard() in block/blk-lib.c */
|
|
|
|
static void __submit_discard_cmd(struct f2fs_sb_info *sbi,
|
2017-10-04 09:08:34 +08:00
|
|
|
struct discard_policy *dpolicy,
|
|
|
|
struct discard_cmd *dc)
|
2017-03-08 10:02:02 +08:00
|
|
|
{
|
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
2017-10-04 09:08:34 +08:00
|
|
|
struct list_head *wait_list = (dpolicy->type == DPOLICY_FSTRIM) ?
|
|
|
|
&(dcc->fstrim_list) : &(dcc->wait_list);
|
2017-03-08 10:02:02 +08:00
|
|
|
struct bio *bio = NULL;
|
2017-10-04 09:08:34 +08:00
|
|
|
int flag = dpolicy->sync ? REQ_SYNC : 0;
|
2017-03-08 10:02:02 +08:00
|
|
|
|
|
|
|
if (dc->state != D_PREP)
|
|
|
|
return;
|
|
|
|
|
2017-04-15 14:09:38 +08:00
|
|
|
trace_f2fs_issue_discard(dc->bdev, dc->start, dc->len);
|
|
|
|
|
2017-03-08 10:02:02 +08:00
|
|
|
dc->error = __blkdev_issue_discard(dc->bdev,
|
|
|
|
SECTOR_FROM_BLOCK(dc->start),
|
|
|
|
SECTOR_FROM_BLOCK(dc->len),
|
|
|
|
GFP_NOFS, 0, &bio);
|
|
|
|
if (!dc->error) {
|
|
|
|
/* set D_SUBMIT before submission so that end_io cannot set D_DONE first */
|
|
|
|
dc->state = D_SUBMIT;
|
2017-03-25 17:19:58 +08:00
|
|
|
atomic_inc(&dcc->issued_discard);
|
|
|
|
atomic_inc(&dcc->issing_discard);
|
2017-03-08 10:02:02 +08:00
|
|
|
if (bio) {
|
|
|
|
bio->bi_private = dc;
|
|
|
|
bio->bi_end_io = f2fs_submit_discard_endio;
|
2017-10-04 09:08:33 +08:00
|
|
|
bio->bi_opf |= flag;
|
2017-03-08 10:02:02 +08:00
|
|
|
submit_bio(bio);
|
2017-10-04 09:08:32 +08:00
|
|
|
list_move_tail(&dc->list, wait_list);
|
2017-06-30 17:19:02 +08:00
|
|
|
__check_sit_bitmap(sbi, dc->start, dc->start + dc->len);
|
2017-08-02 23:21:48 +08:00
|
|
|
|
|
|
|
f2fs_update_iostat(sbi, FS_DISCARD, 1);
|
2017-03-08 10:02:02 +08:00
|
|
|
}
|
|
|
|
} else {
|
|
|
|
__remove_discard_cmd(sbi, dc);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2017-04-14 23:24:55 +08:00
|
|
|
static struct discard_cmd *__insert_discard_tree(struct f2fs_sb_info *sbi,
|
|
|
|
struct block_device *bdev, block_t lstart,
|
|
|
|
block_t start, block_t len,
|
|
|
|
struct rb_node **insert_p,
|
|
|
|
struct rb_node *insert_parent)
|
2017-03-08 10:02:02 +08:00
|
|
|
{
|
2017-04-14 23:24:55 +08:00
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
2017-10-19 18:58:21 +08:00
|
|
|
struct rb_node **p;
|
2017-04-14 23:24:55 +08:00
|
|
|
struct rb_node *parent = NULL;
|
|
|
|
struct discard_cmd *dc = NULL;
|
|
|
|
|
|
|
|
if (insert_p && insert_parent) {
|
|
|
|
parent = insert_parent;
|
|
|
|
p = insert_p;
|
|
|
|
goto do_insert;
|
|
|
|
}
|
2017-03-08 10:02:02 +08:00
|
|
|
|
2017-04-14 23:24:55 +08:00
|
|
|
p = __lookup_rb_tree_for_insert(sbi, &dcc->root, &parent, lstart);
|
|
|
|
do_insert:
|
|
|
|
dc = __attach_discard_cmd(sbi, bdev, lstart, start, len, parent, p);
|
|
|
|
if (!dc)
|
|
|
|
return NULL;
|
2017-03-08 10:02:02 +08:00
|
|
|
|
2017-04-14 23:24:55 +08:00
|
|
|
return dc;
|
2017-03-08 10:02:02 +08:00
|
|
|
}
|
|
|
|
|
2017-04-15 14:09:37 +08:00
|
|
|
static void __relocate_discard_cmd(struct discard_cmd_control *dcc,
|
|
|
|
struct discard_cmd *dc)
|
|
|
|
{
|
|
|
|
list_move_tail(&dc->list, &dcc->pend_list[plist_idx(dc->len)]);
|
|
|
|
}
|
|
|
|
|
2017-03-02 10:36:20 +08:00
|
|
|
static void __punch_discard_cmd(struct f2fs_sb_info *sbi,
|
|
|
|
struct discard_cmd *dc, block_t blkaddr)
|
|
|
|
{
|
2017-04-15 14:09:37 +08:00
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
2017-04-14 23:24:55 +08:00
|
|
|
struct discard_info di = dc->di;
|
|
|
|
bool modified = false;
|
2017-03-02 10:36:20 +08:00
|
|
|
|
2017-04-14 23:24:55 +08:00
|
|
|
if (dc->state == D_DONE || dc->len == 1) {
|
2017-03-02 10:36:20 +08:00
|
|
|
__remove_discard_cmd(sbi, dc);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
2017-04-18 19:27:39 +08:00
|
|
|
dcc->undiscard_blks -= di.len;
|
|
|
|
|
2017-04-14 23:24:55 +08:00
|
|
|
if (blkaddr > di.lstart) {
|
2017-03-02 10:36:20 +08:00
|
|
|
dc->len = blkaddr - dc->lstart;
|
2017-04-18 19:27:39 +08:00
|
|
|
dcc->undiscard_blks += dc->len;
|
2017-04-15 14:09:37 +08:00
|
|
|
__relocate_discard_cmd(dcc, dc);
|
2017-04-14 23:24:55 +08:00
|
|
|
modified = true;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (blkaddr < di.lstart + di.len - 1) {
|
|
|
|
if (modified) {
|
|
|
|
__insert_discard_tree(sbi, dc->bdev, blkaddr + 1,
|
|
|
|
di.start + blkaddr + 1 - di.lstart,
|
|
|
|
di.lstart + di.len - 1 - blkaddr,
|
|
|
|
NULL, NULL);
|
|
|
|
} else {
|
|
|
|
dc->lstart++;
|
|
|
|
dc->len--;
|
|
|
|
dc->start++;
|
2017-04-18 19:27:39 +08:00
|
|
|
dcc->undiscard_blks += dc->len;
|
2017-04-15 14:09:37 +08:00
|
|
|
__relocate_discard_cmd(dcc, dc);
|
2017-04-14 23:24:55 +08:00
|
|
|
}
|
2017-03-02 10:36:20 +08:00
|
|
|
}
|
|
|
|
}
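The range arithmetic in __punch_discard_cmd() is easier to follow in isolation. The sketch below is a simplified user-space model, not the kernel code. `struct sketch_range` and `sketch_punch()` are hypothetical names, and the rb-tree, pending lists, and `undiscard_blks` accounting are omitted. It removes one block from a logical range and reports the pieces that remain, using the same length formulas as the function above.

```c
#include <assert.h>

/* Hypothetical reduced form of a discard range: logical start and length. */
struct sketch_range {
	unsigned int lstart;
	unsigned int len;
};

/*
 * Punch block `blkaddr` out of range `r`. The surviving pieces (0, 1 or 2)
 * are written to out[] in ascending order; the return value is their count.
 */
int sketch_punch(struct sketch_range r, unsigned int blkaddr,
		 struct sketch_range out[2])
{
	int n = 0;

	/* The punched block must lie inside the range. */
	assert(blkaddr >= r.lstart && blkaddr < r.lstart + r.len);

	if (blkaddr > r.lstart) {
		/* Left piece: [lstart, blkaddr) */
		out[n].lstart = r.lstart;
		out[n].len = blkaddr - r.lstart;
		n++;
	}
	if (blkaddr < r.lstart + r.len - 1) {
		/* Right piece: (blkaddr, lstart + len) */
		out[n].lstart = blkaddr + 1;
		out[n].len = r.lstart + r.len - 1 - blkaddr;
		n++;
	}
	return n;
}
```

Punching block 12 out of the range starting at 10 with length 5 leaves two pieces of length 2, starting at 10 and at 13. Punching either end block leaves a single piece of length 4.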
|
|
|
|
|
2017-04-14 23:24:55 +08:00
|
|
|
static void __update_discard_tree_range(struct f2fs_sb_info *sbi,
|
|
|
|
struct block_device *bdev, block_t lstart,
|
|
|
|
block_t start, block_t len)
|
2016-08-29 23:58:34 +08:00
|
|
|
{
|
2017-01-12 06:40:24 +08:00
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
2017-04-14 23:24:55 +08:00
|
|
|
struct discard_cmd *prev_dc = NULL, *next_dc = NULL;
|
|
|
|
struct discard_cmd *dc;
|
|
|
|
struct discard_info di = {0};
|
|
|
|
struct rb_node **insert_p = NULL, *insert_parent = NULL;
|
|
|
|
block_t end = lstart + len;
|
2016-08-29 23:58:34 +08:00
|
|
|
|
2017-01-10 12:32:07 +08:00
|
|
|
mutex_lock(&dcc->cmd_lock);
|
2017-02-23 11:58:23 +08:00
|
|
|
|
2017-04-14 23:24:55 +08:00
|
|
|
dc = (struct discard_cmd *)__lookup_rb_tree_ret(&dcc->root,
|
|
|
|
NULL, lstart,
|
|
|
|
(struct rb_entry **)&prev_dc,
|
|
|
|
(struct rb_entry **)&next_dc,
|
|
|
|
&insert_p, &insert_parent, true);
|
|
|
|
if (dc)
|
|
|
|
prev_dc = dc;
|
|
|
|
|
|
|
|
if (!prev_dc) {
|
|
|
|
di.lstart = lstart;
|
|
|
|
di.len = next_dc ? next_dc->lstart - lstart : len;
|
|
|
|
di.len = min(di.len, len);
|
|
|
|
di.start = start;
|
2017-04-05 18:19:48 +08:00
|
|
|
}
|
2017-01-10 12:32:07 +08:00
|
|
|
|
2017-04-14 23:24:55 +08:00
|
|
|
while (1) {
|
|
|
|
struct rb_node *node;
|
|
|
|
bool merged = false;
|
|
|
|
struct discard_cmd *tdc = NULL;
|
|
|
|
|
|
|
|
if (prev_dc) {
|
|
|
|
di.lstart = prev_dc->lstart + prev_dc->len;
|
|
|
|
if (di.lstart < lstart)
|
|
|
|
di.lstart = lstart;
|
|
|
|
if (di.lstart >= end)
|
|
|
|
break;
|
|
|
|
|
|
|
|
if (!next_dc || next_dc->lstart > end)
|
|
|
|
di.len = end - di.lstart;
|
|
|
|
else
|
|
|
|
di.len = next_dc->lstart - di.lstart;
|
|
|
|
di.start = start + di.lstart - lstart;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!di.len)
|
|
|
|
goto next;
|
|
|
|
|
|
|
|
if (prev_dc && prev_dc->state == D_PREP &&
|
|
|
|
prev_dc->bdev == bdev &&
|
|
|
|
__is_discard_back_mergeable(&di, &prev_dc->di)) {
|
|
|
|
prev_dc->di.len += di.len;
|
2017-04-18 19:27:39 +08:00
|
|
|
dcc->undiscard_blks += di.len;
|
2017-04-15 14:09:37 +08:00
|
|
|
__relocate_discard_cmd(dcc, prev_dc);
|
2017-04-14 23:24:55 +08:00
|
|
|
di = prev_dc->di;
|
|
|
|
tdc = prev_dc;
|
|
|
|
merged = true;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (next_dc && next_dc->state == D_PREP &&
|
|
|
|
next_dc->bdev == bdev &&
|
|
|
|
__is_discard_front_mergeable(&di, &next_dc->di)) {
|
|
|
|
next_dc->di.lstart = di.lstart;
|
|
|
|
next_dc->di.len += di.len;
|
|
|
|
next_dc->di.start = di.start;
|
2017-04-18 19:27:39 +08:00
|
|
|
dcc->undiscard_blks += di.len;
|
2017-04-15 14:09:37 +08:00
|
|
|
__relocate_discard_cmd(dcc, next_dc);
|
2017-04-14 23:24:55 +08:00
|
|
|
if (tdc)
|
|
|
|
__remove_discard_cmd(sbi, tdc);
|
|
|
|
merged = true;
|
2016-12-30 06:07:53 +08:00
|
|
|
}
|
2017-04-14 23:24:55 +08:00
|
|
|
|
2017-04-17 18:21:43 +08:00
|
|
|
if (!merged) {
|
2017-04-14 23:24:55 +08:00
|
|
|
__insert_discard_tree(sbi, bdev, di.lstart, di.start,
|
|
|
|
di.len, NULL, NULL);
|
2017-04-17 18:21:43 +08:00
|
|
|
}
|
2017-04-14 23:24:55 +08:00
|
|
|
next:
|
|
|
|
prev_dc = next_dc;
|
|
|
|
if (!prev_dc)
|
|
|
|
break;
|
|
|
|
|
|
|
|
node = rb_next(&prev_dc->rb_node);
|
|
|
|
next_dc = rb_entry_safe(node, struct discard_cmd, rb_node);
|
|
|
|
}
|
|
|
|
|
|
|
|
mutex_unlock(&dcc->cmd_lock);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int __queue_discard_cmd(struct f2fs_sb_info *sbi,
|
|
|
|
struct block_device *bdev, block_t blkstart, block_t blklen)
|
|
|
|
{
|
|
|
|
block_t lblkstart = blkstart;
|
|
|
|
|
2017-04-15 14:09:38 +08:00
|
|
|
trace_f2fs_queue_discard(bdev, blkstart, blklen);
|
2017-04-14 23:24:55 +08:00
|
|
|
|
|
|
|
if (sbi->s_ndevs) {
|
|
|
|
int devi = f2fs_target_device_index(sbi, blkstart);
|
|
|
|
|
|
|
|
blkstart -= FDEV(devi).start_blk;
|
|
|
|
}
|
|
|
|
__update_discard_tree_range(sbi, bdev, lblkstart, blkstart, blklen);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2017-10-04 09:08:32 +08:00
|
|
|
static void __issue_discard_cmd_range(struct f2fs_sb_info *sbi,
|
2017-10-04 09:08:34 +08:00
|
|
|
struct discard_policy *dpolicy,
|
|
|
|
unsigned int start, unsigned int end)
|
2017-10-04 09:08:32 +08:00
|
|
|
{
|
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
|
|
|
struct discard_cmd *prev_dc = NULL, *next_dc = NULL;
|
|
|
|
struct rb_node **insert_p = NULL, *insert_parent = NULL;
|
|
|
|
struct discard_cmd *dc;
|
|
|
|
struct blk_plug plug;
|
|
|
|
int issued;
|
|
|
|
|
|
|
|
next:
|
|
|
|
issued = 0;
|
|
|
|
|
|
|
|
mutex_lock(&dcc->cmd_lock);
|
|
|
|
f2fs_bug_on(sbi, !__check_rb_tree_consistence(sbi, &dcc->root));
|
|
|
|
|
|
|
|
dc = (struct discard_cmd *)__lookup_rb_tree_ret(&dcc->root,
|
|
|
|
NULL, start,
|
|
|
|
(struct rb_entry **)&prev_dc,
|
|
|
|
(struct rb_entry **)&next_dc,
|
|
|
|
&insert_p, &insert_parent, true);
|
|
|
|
if (!dc)
|
|
|
|
dc = next_dc;
|
|
|
|
|
|
|
|
blk_start_plug(&plug);
|
|
|
|
|
|
|
|
while (dc && dc->lstart <= end) {
|
|
|
|
struct rb_node *node;
|
|
|
|
|
2017-10-04 09:08:34 +08:00
|
|
|
if (dc->len < dpolicy->granularity)
|
2017-10-04 09:08:32 +08:00
|
|
|
goto skip;
|
|
|
|
|
|
|
|
if (dc->state != D_PREP) {
|
|
|
|
list_move_tail(&dc->list, &dcc->fstrim_list);
|
|
|
|
goto skip;
|
|
|
|
}
|
|
|
|
|
2017-10-04 09:08:34 +08:00
|
|
|
__submit_discard_cmd(sbi, dpolicy, dc);
|
2017-10-04 09:08:32 +08:00
|
|
|
|
2017-10-04 09:08:33 +08:00
|
|
|
if (++issued >= dpolicy->max_requests) {
|
2017-10-04 09:08:32 +08:00
|
|
|
start = dc->lstart + dc->len;
|
|
|
|
|
|
|
|
blk_finish_plug(&plug);
|
|
|
|
mutex_unlock(&dcc->cmd_lock);
|
|
|
|
|
|
|
|
schedule();
|
|
|
|
|
|
|
|
goto next;
|
|
|
|
}
|
|
|
|
skip:
|
|
|
|
node = rb_next(&dc->rb_node);
|
|
|
|
dc = rb_entry_safe(node, struct discard_cmd, rb_node);
|
|
|
|
|
|
|
|
if (fatal_signal_pending(current))
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
blk_finish_plug(&plug);
|
|
|
|
mutex_unlock(&dcc->cmd_lock);
|
|
|
|
}
|
|
|
|
|
2017-10-04 09:08:34 +08:00
|
|
|
static int __issue_discard_cmd(struct f2fs_sb_info *sbi,
|
|
|
|
struct discard_policy *dpolicy)
|
2017-04-25 20:21:37 +08:00
|
|
|
{
|
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
|
|
|
struct list_head *pend_list;
|
|
|
|
struct discard_cmd *dc, *tmp;
|
|
|
|
struct blk_plug plug;
|
2017-10-04 09:08:34 +08:00
|
|
|
int i, iter = 0, issued = 0;
|
2017-09-12 21:35:12 +08:00
|
|
|
bool io_interrupted = false;
|
2017-04-25 20:21:37 +08:00
|
|
|
|
2017-10-04 09:08:34 +08:00
|
|
|
for (i = MAX_PLIST_NUM - 1; i >= 0; i--) {
|
|
|
|
if (i + 1 < dpolicy->granularity)
|
|
|
|
break;
|
2017-04-25 20:21:37 +08:00
|
|
|
pend_list = &dcc->pend_list[i];
|
2017-10-04 09:08:35 +08:00
|
|
|
|
|
|
|
mutex_lock(&dcc->cmd_lock);
|
2018-01-08 18:48:33 +08:00
|
|
|
if (list_empty(pend_list))
|
|
|
|
goto next;
|
2017-10-04 09:08:35 +08:00
|
|
|
f2fs_bug_on(sbi, !__check_rb_tree_consistence(sbi, &dcc->root));
|
|
|
|
blk_start_plug(&plug);
|
2017-04-25 20:21:37 +08:00
|
|
|
list_for_each_entry_safe(dc, tmp, pend_list, list) {
|
|
|
|
f2fs_bug_on(sbi, dc->state != D_PREP);
|
|
|
|
|
2017-10-04 09:08:33 +08:00
|
|
|
if (dpolicy->io_aware && i < dpolicy->io_aware_gran &&
|
|
|
|
!is_idle(sbi)) {
|
2017-09-12 21:35:12 +08:00
|
|
|
io_interrupted = true;
|
2017-10-04 09:08:33 +08:00
|
|
|
goto skip;
|
f2fs: introduce discard_granularity sysfs entry
Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables
f2fs to issue 4K-sized discards in real-time discard mode. However, issuing
smaller discards may cost more device lifetime while releasing less free
space on the flash device. Since f2fs separates hot/cold data and performs
garbage collection, we can expect a small invalid region to expand soon
through OPU, deletion, or garbage collection of valid data. It is therefore
better to delay or skip issuing small discards, which helps reduce
excessive consumption of IO bandwidth and flash storage lifetime.
This patch makes f2fs select 64K as its default minimal granularity and
issue only discards that are not smaller than the minimal granularity. It
also exposes the discard granularity as a sysfs entry so that it can be
configured for different scenarios.
Jaegeuk Kim:
We must issue all the accumulated discard commands when fstrim is called.
So, I've added pend_list_tag[] to indicate whether we should issue the
commands or not. If the tag is set to P_ACTIVE or P_TRIM, we have to issue them.
P_TRIM is set once per fstrim trigger.
In addition, issue_discard_thread is called too often because of the number of
discard commands remaining in the pending list. I added a timer to control
it, as gc_thread does.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-07 23:09:56 +08:00
|
|
|
}
|
2017-09-12 21:35:12 +08:00
|
|
|
|
2017-10-04 09:08:34 +08:00
|
|
|
__submit_discard_cmd(sbi, dpolicy, dc);
|
2017-10-04 09:08:33 +08:00
|
|
|
issued++;
|
|
|
|
skip:
|
|
|
|
if (++iter >= dpolicy->max_requests)
|
2017-10-04 09:08:35 +08:00
|
|
|
break;
|
2017-04-25 20:21:37 +08:00
|
|
|
}
|
2017-10-04 09:08:35 +08:00
|
|
|
blk_finish_plug(&plug);
|
2018-01-08 18:48:33 +08:00
|
|
|
next:
|
2017-10-04 09:08:35 +08:00
|
|
|
mutex_unlock(&dcc->cmd_lock);
|
|
|
|
|
|
|
|
if (iter >= dpolicy->max_requests)
|
|
|
|
break;
|
2017-04-25 20:21:37 +08:00
|
|
|
}
|
2017-08-07 23:09:56 +08:00
|
|
|
|
2017-09-12 21:35:12 +08:00
|
|
|
if (!issued && io_interrupted)
|
|
|
|
issued = -1;
|
|
|
|
|
2017-08-07 23:09:56 +08:00
|
|
|
return issued;
|
|
|
|
}
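The loop in __issue_discard_cmd() walks the pending lists from the largest length bucket downwards and stops once `i + 1` falls below the policy granularity. The user-space sketch below illustrates that interaction. It rests on two assumptions that this listing does not show: that there are 512 buckets, and that the bucket index for a command of `len` blocks is `len - 1`, capped at the last bucket. All `sketch_` and `SK_` names are hypothetical.

```c
#include <assert.h>

#define SK_MAX_PLIST_NUM 512	/* assumed bucket count, not taken from this listing */

/* Assumed mapping from discard length (in blocks) to pending-list bucket. */
int sketch_plist_idx(unsigned int len)
{
	assert(len > 0);	/* mirrors the f2fs_bug_on(sbi, !len) check */
	return len >= SK_MAX_PLIST_NUM ? SK_MAX_PLIST_NUM - 1 : (int)len - 1;
}

/*
 * Count the buckets that the issue loop visits for a given granularity,
 * using the same loop bounds and break condition as the function above.
 */
int sketch_buckets_visited(unsigned int granularity)
{
	int i, visited = 0;

	for (i = SK_MAX_PLIST_NUM - 1; i >= 0; i--) {
		if ((unsigned int)(i + 1) < granularity)
			break;
		visited++;
	}
	return visited;
}
```

Under these assumptions, and with 4 KiB blocks, the 64K default granularity from the commit note corresponds to 16 blocks. The loop then visits buckets 511 down to 15, so commands shorter than 16 blocks stay on their pending lists.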
|
|
|
|
|
2017-10-04 09:08:37 +08:00
|
|
|
static bool __drop_discard_cmd(struct f2fs_sb_info *sbi)
|
2017-08-07 23:09:56 +08:00
|
|
|
{
|
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
|
|
|
struct list_head *pend_list;
|
|
|
|
struct discard_cmd *dc, *tmp;
|
|
|
|
int i;
|
2017-10-04 09:08:37 +08:00
|
|
|
bool dropped = false;
|
2017-08-07 23:09:56 +08:00
|
|
|
|
|
|
|
mutex_lock(&dcc->cmd_lock);
|
|
|
|
for (i = MAX_PLIST_NUM - 1; i >= 0; i--) {
|
|
|
|
pend_list = &dcc->pend_list[i];
|
|
|
|
list_for_each_entry_safe(dc, tmp, pend_list, list) {
|
|
|
|
f2fs_bug_on(sbi, dc->state != D_PREP);
|
|
|
|
__remove_discard_cmd(sbi, dc);
|
2017-10-04 09:08:37 +08:00
|
|
|
dropped = true;
|
2017-08-07 23:09:56 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
mutex_unlock(&dcc->cmd_lock);
|
2017-10-04 09:08:37 +08:00
|
|
|
|
|
|
|
return dropped;
|
2017-04-25 20:21:37 +08:00
|
|
|
}
|
|
|
|
|
2018-01-18 17:23:29 +08:00
|
|
|
void drop_discard_cmd(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
__drop_discard_cmd(sbi);
|
|
|
|
}
|
|
|
|
|
2017-10-28 16:52:32 +08:00
|
|
|
static unsigned int __wait_one_discard_bio(struct f2fs_sb_info *sbi,
|
2017-06-05 18:29:06 +08:00
|
|
|
struct discard_cmd *dc)
|
|
|
|
{
|
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
2017-10-28 16:52:32 +08:00
|
|
|
unsigned int len = 0;
|
2017-06-05 18:29:06 +08:00
|
|
|
|
|
|
|
wait_for_completion_io(&dc->wait);
|
|
|
|
mutex_lock(&dcc->cmd_lock);
|
|
|
|
f2fs_bug_on(sbi, dc->state != D_DONE);
|
|
|
|
dc->ref--;
|
2017-10-28 16:52:32 +08:00
|
|
|
if (!dc->ref) {
|
|
|
|
if (!dc->error)
|
|
|
|
len = dc->len;
|
2017-06-05 18:29:06 +08:00
|
|
|
__remove_discard_cmd(sbi, dc);
|
2017-10-28 16:52:32 +08:00
|
|
|
}
|
2017-06-05 18:29:06 +08:00
|
|
|
mutex_unlock(&dcc->cmd_lock);
|
2017-10-28 16:52:32 +08:00
|
|
|
|
|
|
|
return len;
|
2017-06-05 18:29:06 +08:00
|
|
|
}
|
|
|
|
|
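The reference handling in __wait_one_discard_bio() above can be read as a "last waiter cleans up" pattern. The following is a minimal user-space sketch of that idea only; `cmd_ref_sketch`, its `removed` flag, and `put_cmd()` are hypothetical names for illustration and are not part of f2fs, and the locking done by the real function is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down command; not the kernel's struct discard_cmd. */
struct cmd_ref_sketch {
	unsigned int len;	/* number of blocks covered by the command */
	int error;		/* non-zero if the discard failed */
	unsigned short ref;	/* number of waiters still holding the command */
	bool removed;		/* stands in for __remove_discard_cmd() */
};

/*
 * Drop one reference. Only the last waiter retires the command, and the
 * length is reported as trimmed only when the discard did not fail.
 */
static unsigned int put_cmd(struct cmd_ref_sketch *c)
{
	unsigned int len = 0;

	c->ref--;
	if (!c->ref) {
		if (!c->error)
			len = c->len;
		c->removed = true;
	}
	return len;
}
```

In this sketch a command with two waiters is retired by the second put_cmd() call, and a failed command contributes zero blocks to the trimmed total.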
2017-10-28 16:52:32 +08:00
|
|
|
static unsigned int __wait_discard_cmd_range(struct f2fs_sb_info *sbi,
|
2017-10-04 09:08:34 +08:00
|
|
|
struct discard_policy *dpolicy,
|
|
|
|
block_t start, block_t end)
|
2017-04-25 20:21:38 +08:00
|
|
|
{
|
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
2017-10-04 09:08:34 +08:00
|
|
|
struct list_head *wait_list = (dpolicy->type == DPOLICY_FSTRIM) ?
|
|
|
|
&(dcc->fstrim_list) : &(dcc->wait_list);
|
2017-04-25 20:21:38 +08:00
|
|
|
struct discard_cmd *dc, *tmp;
|
2017-05-19 23:46:45 +08:00
|
|
|
bool need_wait;
|
2017-10-28 16:52:32 +08:00
|
|
|
unsigned int trimmed = 0;
|
2017-05-19 23:46:45 +08:00
|
|
|
|
|
|
|
next:
|
|
|
|
need_wait = false;
|
2017-04-25 20:21:38 +08:00
|
|
|
|
|
|
|
mutex_lock(&dcc->cmd_lock);
|
|
|
|
list_for_each_entry_safe(dc, tmp, wait_list, list) {
|
2017-10-04 09:08:32 +08:00
|
|
|
if (dc->lstart + dc->len <= start || end <= dc->lstart)
|
|
|
|
continue;
|
2017-10-04 09:08:34 +08:00
|
|
|
if (dc->len < dpolicy->granularity)
|
2017-10-04 09:08:32 +08:00
|
|
|
continue;
|
2017-10-04 09:08:34 +08:00
|
|
|
if (dc->state == D_DONE && !dc->ref) {
|
2017-04-25 20:21:38 +08:00
|
|
|
wait_for_completion_io(&dc->wait);
|
2017-10-28 16:52:32 +08:00
|
|
|
if (!dc->error)
|
|
|
|
trimmed += dc->len;
|
2017-04-25 20:21:38 +08:00
|
|
|
__remove_discard_cmd(sbi, dc);
|
2017-05-19 23:46:45 +08:00
|
|
|
} else {
|
|
|
|
dc->ref++;
|
|
|
|
need_wait = true;
|
|
|
|
break;
|
2017-04-25 20:21:38 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
mutex_unlock(&dcc->cmd_lock);
|
2017-05-19 23:46:45 +08:00
|
|
|
|
|
|
|
if (need_wait) {
|
2017-10-28 16:52:32 +08:00
|
|
|
trimmed += __wait_one_discard_bio(sbi, dc);
|
2017-05-19 23:46:45 +08:00
|
|
|
goto next;
|
|
|
|
}
|
2017-10-28 16:52:32 +08:00
|
|
|
|
|
|
|
return trimmed;
|
2017-04-25 20:21:38 +08:00
|
|
|
}
|
|
|
|
|
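The two `continue` checks at the top of the loop in __wait_discard_cmd_range() above amount to a half-open interval overlap test followed by a granularity filter. A minimal stand-alone sketch of that filter, assuming a simplified command type; `blk_t`, `cmd_sketch`, and `cmd_in_scope()` are hypothetical names introduced here and do not exist in f2fs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's block_t; not the real definition. */
typedef uint32_t blk_t;

/* Hypothetical, trimmed-down command: only the fields the filter reads. */
struct cmd_sketch {
	blk_t lstart;	/* first block covered by the command */
	blk_t len;	/* number of blocks covered */
};

/*
 * Return true when the command [lstart, lstart + len) intersects the
 * half-open query range [start, end) and is at least `granularity`
 * blocks long.
 */
static bool cmd_in_scope(const struct cmd_sketch *c, blk_t start, blk_t end,
			 blk_t granularity)
{
	if (c->lstart + c->len <= start || end <= c->lstart)
		return false;	/* entirely before or after the range */
	if (c->len < granularity)
		return false;	/* too small for this policy */
	return true;
}
```

For a command covering blocks 100 to 115, a query range ending at block 100 does not match, while one ending at block 101 does, because both ranges are treated as half-open.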
2017-10-04 09:08:34 +08:00
|
|
|
static void __wait_all_discard_cmd(struct f2fs_sb_info *sbi,
|
|
|
|
struct discard_policy *dpolicy)
|
2017-10-04 09:08:32 +08:00
|
|
|
{
|
2017-10-04 09:08:34 +08:00
|
|
|
__wait_discard_cmd_range(sbi, dpolicy, 0, UINT_MAX);
|
2017-10-04 09:08:32 +08:00
|
|
|
}
|
|
|
|
|
2017-04-14 23:24:55 +08:00
|
|
|
/* This should be covered by global mutex, &sit_i->sentry_lock */
|
2018-01-05 17:41:20 +08:00
|
|
|
static void f2fs_wait_discard_bio(struct f2fs_sb_info *sbi, block_t blkaddr)
|
2017-04-14 23:24:55 +08:00
|
|
|
{
|
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
|
|
|
struct discard_cmd *dc;
|
2017-04-26 17:39:54 +08:00
|
|
|
bool need_wait = false;
|
2017-04-14 23:24:55 +08:00
|
|
|
|
|
|
|
mutex_lock(&dcc->cmd_lock);
|
|
|
|
dc = (struct discard_cmd *)__lookup_rb_tree(&dcc->root, NULL, blkaddr);
|
|
|
|
if (dc) {
|
2017-04-26 17:39:54 +08:00
|
|
|
if (dc->state == D_PREP) {
|
|
|
|
__punch_discard_cmd(sbi, dc, blkaddr);
|
|
|
|
} else {
|
|
|
|
dc->ref++;
|
|
|
|
need_wait = true;
|
|
|
|
}
|
2016-08-29 23:58:34 +08:00
|
|
|
}
|
2017-04-05 18:19:49 +08:00
|
|
|
mutex_unlock(&dcc->cmd_lock);
|
2017-04-26 17:39:54 +08:00
|
|
|
|
2017-06-05 18:29:06 +08:00
|
|
|
if (need_wait)
|
|
|
|
__wait_one_discard_bio(sbi, dc);
|
2017-04-05 18:19:49 +08:00
|
|
|
}
|
|
|
|
|
2017-06-29 23:17:45 +08:00
|
|
|
void stop_discard_thread(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
|
|
|
|
|
|
|
if (dcc && dcc->f2fs_issue_discard) {
|
|
|
|
struct task_struct *discard_thread = dcc->f2fs_issue_discard;
|
|
|
|
|
|
|
|
dcc->f2fs_issue_discard = NULL;
|
|
|
|
kthread_stop(discard_thread);
|
2017-04-26 17:39:54 +08:00
|
|
|
}
|
2017-04-05 18:19:49 +08:00
|
|
|
}
|
|
|
|
|
2017-10-04 09:08:32 +08:00
|
|
|
/* This comes from f2fs_put_super */
|
2017-10-04 09:08:37 +08:00
|
|
|
bool f2fs_wait_discard_bios(struct f2fs_sb_info *sbi)
|
2017-08-07 23:09:56 +08:00
|
|
|
{
|
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
2017-10-04 09:08:34 +08:00
|
|
|
struct discard_policy dpolicy;
|
2017-10-04 09:08:37 +08:00
|
|
|
bool dropped;
|
2017-08-07 23:09:56 +08:00
|
|
|
|
2017-10-04 09:08:34 +08:00
|
|
|
init_discard_policy(&dpolicy, DPOLICY_UMOUNT, dcc->discard_granularity);
|
|
|
|
__issue_discard_cmd(sbi, &dpolicy);
|
2017-10-04 09:08:37 +08:00
|
|
|
dropped = __drop_discard_cmd(sbi);
|
2017-10-04 09:08:34 +08:00
|
|
|
__wait_all_discard_cmd(sbi, &dpolicy);
|
2017-10-04 09:08:37 +08:00
|
|
|
|
|
|
|
return dropped;
|
2017-08-07 23:09:56 +08:00
|
|
|
}
|
|
|
|
|
2017-01-10 12:32:07 +08:00
|
|
|
static int issue_discard_thread(void *data)
|
|
|
|
{
|
|
|
|
struct f2fs_sb_info *sbi = data;
|
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
|
|
|
wait_queue_head_t *q = &dcc->discard_wait_queue;
|
2017-10-04 09:08:34 +08:00
|
|
|
struct discard_policy dpolicy;
|
2017-08-07 23:09:56 +08:00
|
|
|
unsigned int wait_ms = DEF_MIN_DISCARD_ISSUE_TIME;
|
|
|
|
int issued;
|
2017-01-10 12:32:07 +08:00
|
|
|
|
2017-05-18 01:36:58 +08:00
|
|
|
set_freezable();
|
2017-01-10 12:32:07 +08:00
|
|
|
|
2017-05-18 01:36:58 +08:00
|
|
|
do {
|
2017-10-04 09:08:34 +08:00
|
|
|
init_discard_policy(&dpolicy, DPOLICY_BG,
|
|
|
|
dcc->discard_granularity);
|
|
|
|
|
2017-08-07 23:09:56 +08:00
|
|
|
wait_event_interruptible_timeout(*q,
|
|
|
|
kthread_should_stop() || freezing(current) ||
|
|
|
|
dcc->discard_wake,
|
|
|
|
msecs_to_jiffies(wait_ms));
|
2017-05-18 01:36:58 +08:00
|
|
|
if (try_to_freeze())
|
|
|
|
continue;
|
2018-01-25 18:57:27 +08:00
|
|
|
if (f2fs_readonly(sbi->sb))
|
|
|
|
continue;
|
2017-05-18 01:36:58 +08:00
|
|
|
if (kthread_should_stop())
|
|
|
|
return 0;
|
2017-01-10 12:32:07 +08:00
|
|
|
|
2018-02-23 15:30:55 +08:00
|
|
|
if (dcc->discard_wake)
|
2017-08-07 23:09:56 +08:00
|
|
|
dcc->discard_wake = 0;
|
2018-02-23 15:30:55 +08:00
|
|
|
|
|
|
|
if (sbi->gc_thread && sbi->gc_thread->gc_urgent)
|
|
|
|
init_discard_policy(&dpolicy, DPOLICY_FORCE, 1);
|
2017-08-07 23:09:56 +08:00
|
|
|
|
f2fs: make background threads of f2fs being aware of freezing
When ->freeze_fs is called from LVM to take a snapshot, it must make sure
that no further changes are made to the filesystem's data. Previously,
however, background threads such as the GC thread were not aware of
freezing, so in an environment with active background threads the snapshot
data could become unstable.
This patch fixes the issue by adding sb_{start,end}_intwrite to the
following background threads:
- GC thread
- flush thread
- discard thread
Note that sb_start_intwrite() must not be used in gc_thread_func(), because
generic/241 then reports the following bug:
======================================================
WARNING: possible circular locking dependency detected
4.13.0-rc1+ #32 Tainted: G O
------------------------------------------------------
f2fs_gc-250:0/22186 is trying to acquire lock:
(&sbi->gc_mutex){+.+...}, at: [<f8fa7f0b>] f2fs_sync_fs+0x7b/0x1b0 [f2fs]
but task is already holding lock:
(sb_internal#2){++++.-}, at: [<f8fb5609>] gc_thread_func+0x159/0x4a0 [f2fs]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (sb_internal#2){++++.-}:
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__sb_start_write+0x11d/0x1f0
f2fs_evict_inode+0x2d6/0x4e0 [f2fs]
evict+0xa8/0x170
iput+0x1fb/0x2c0
f2fs_sync_inode_meta+0x3f/0xf0 [f2fs]
write_checkpoint+0x1b1/0x750 [f2fs]
f2fs_sync_fs+0x85/0x1b0 [f2fs]
f2fs_do_sync_file.isra.24+0x137/0xa30 [f2fs]
f2fs_sync_file+0x34/0x40 [f2fs]
vfs_fsync_range+0x4a/0xa0
do_fsync+0x3c/0x60
SyS_fdatasync+0x15/0x20
do_fast_syscall_32+0xa1/0x1b0
entry_SYSENTER_32+0x4c/0x7b
-> #1 (&sbi->cp_mutex){+.+...}:
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__mutex_lock+0x4f/0x830
mutex_lock_nested+0x25/0x30
write_checkpoint+0x2f/0x750 [f2fs]
f2fs_sync_fs+0x85/0x1b0 [f2fs]
sync_filesystem+0x67/0x80
generic_shutdown_super+0x27/0x100
kill_block_super+0x22/0x50
kill_f2fs_super+0x3a/0x40 [f2fs]
deactivate_locked_super+0x3d/0x70
deactivate_super+0x40/0x60
cleanup_mnt+0x39/0x70
__cleanup_mnt+0x10/0x20
task_work_run+0x69/0x80
exit_to_usermode_loop+0x57/0x92
do_fast_syscall_32+0x18c/0x1b0
entry_SYSENTER_32+0x4c/0x7b
-> #0 (&sbi->gc_mutex){+.+...}:
validate_chain.isra.36+0xc50/0xdb0
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__mutex_lock+0x4f/0x830
mutex_lock_nested+0x25/0x30
f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_balance_fs_bg+0xb9/0x200 [f2fs]
gc_thread_func+0x302/0x4a0 [f2fs]
kthread+0xe9/0x120
ret_from_fork+0x19/0x24
other info that might help us debug this:
Chain exists of:
&sbi->gc_mutex --> &sbi->cp_mutex --> sb_internal#2
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(sb_internal#2);
                               lock(&sbi->cp_mutex);
                               lock(sb_internal#2);
  lock(&sbi->gc_mutex);
*** DEADLOCK ***
1 lock held by f2fs_gc-250:0/22186:
#0: (sb_internal#2){++++.-}, at: [<f8fb5609>] gc_thread_func+0x159/0x4a0 [f2fs]
stack backtrace:
CPU: 2 PID: 22186 Comm: f2fs_gc-250:0 Tainted: G O 4.13.0-rc1+ #32
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Call Trace:
dump_stack+0x5f/0x92
print_circular_bug+0x1b3/0x1bd
validate_chain.isra.36+0xc50/0xdb0
? __this_cpu_preempt_check+0xf/0x20
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
__mutex_lock+0x4f/0x830
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
mutex_lock_nested+0x25/0x30
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_balance_fs_bg+0xb9/0x200 [f2fs]
gc_thread_func+0x302/0x4a0 [f2fs]
? preempt_schedule_common+0x2f/0x4d
? f2fs_gc+0x540/0x540 [f2fs]
kthread+0xe9/0x120
? f2fs_gc+0x540/0x540 [f2fs]
? kthread_create_on_node+0x30/0x30
ret_from_fork+0x19/0x24
The deadlock occurs under the following condition:
GC Thread                       Thread B
- sb_start_intwrite
                                - f2fs_sync_file
                                 - f2fs_sync_fs
                                  - mutex_lock(&sbi->gc_mutex)
                                   - write_checkpoint
                                    - block_operations
                                     - f2fs_sync_inode_meta
                                      - iput
                                       - sb_start_intwrite
- mutex_lock(&sbi->gc_mutex)
Fix this by replacing sb_start_intwrite with sb_start_write_trylock.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-07-22 08:52:23 +08:00
|
|
|
sb_start_intwrite(sbi->sb);
|
|
|
|
|
2017-10-04 09:08:34 +08:00
|
|
|
issued = __issue_discard_cmd(sbi, &dpolicy);
|
2017-08-07 23:09:56 +08:00
|
|
|
if (issued) {
|
2017-10-04 09:08:34 +08:00
|
|
|
__wait_all_discard_cmd(sbi, &dpolicy);
|
|
|
|
wait_ms = dpolicy.min_interval;
|
2017-08-07 23:09:56 +08:00
|
|
|
} else {
|
2017-10-04 09:08:34 +08:00
|
|
|
wait_ms = dpolicy.max_interval;
|
2017-08-07 23:09:56 +08:00
|
|
|
}
|
2017-05-18 01:36:58 +08:00
|
|
|
|
f2fs: make background threads of f2fs being aware of freezing
When ->freeze_fs is called from lvm for doing snapshot, it needs to
make sure there will be no more changes in filesystem's data, however,
previously, background threads like GC thread wasn't aware of freezing,
so in environment with active background threads, data of snapshot
becomes unstable.
This patch fixes this issue by adding sb_{start,end}_intwrite in
below background threads:
- GC thread
- flush thread
- discard thread
Note that, don't use sb_start_intwrite() in gc_thread_func() due to:
generic/241 reports below bug:
======================================================
WARNING: possible circular locking dependency detected
4.13.0-rc1+ #32 Tainted: G O
------------------------------------------------------
f2fs_gc-250:0/22186 is trying to acquire lock:
(&sbi->gc_mutex){+.+...}, at: [<f8fa7f0b>] f2fs_sync_fs+0x7b/0x1b0 [f2fs]
but task is already holding lock:
(sb_internal#2){++++.-}, at: [<f8fb5609>] gc_thread_func+0x159/0x4a0 [f2fs]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (sb_internal#2){++++.-}:
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__sb_start_write+0x11d/0x1f0
f2fs_evict_inode+0x2d6/0x4e0 [f2fs]
evict+0xa8/0x170
iput+0x1fb/0x2c0
f2fs_sync_inode_meta+0x3f/0xf0 [f2fs]
write_checkpoint+0x1b1/0x750 [f2fs]
f2fs_sync_fs+0x85/0x1b0 [f2fs]
f2fs_do_sync_file.isra.24+0x137/0xa30 [f2fs]
f2fs_sync_file+0x34/0x40 [f2fs]
vfs_fsync_range+0x4a/0xa0
do_fsync+0x3c/0x60
SyS_fdatasync+0x15/0x20
do_fast_syscall_32+0xa1/0x1b0
entry_SYSENTER_32+0x4c/0x7b
-> #1 (&sbi->cp_mutex){+.+...}:
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__mutex_lock+0x4f/0x830
mutex_lock_nested+0x25/0x30
write_checkpoint+0x2f/0x750 [f2fs]
f2fs_sync_fs+0x85/0x1b0 [f2fs]
sync_filesystem+0x67/0x80
generic_shutdown_super+0x27/0x100
kill_block_super+0x22/0x50
kill_f2fs_super+0x3a/0x40 [f2fs]
deactivate_locked_super+0x3d/0x70
deactivate_super+0x40/0x60
cleanup_mnt+0x39/0x70
__cleanup_mnt+0x10/0x20
task_work_run+0x69/0x80
exit_to_usermode_loop+0x57/0x92
do_fast_syscall_32+0x18c/0x1b0
entry_SYSENTER_32+0x4c/0x7b
-> #0 (&sbi->gc_mutex){+.+...}:
validate_chain.isra.36+0xc50/0xdb0
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__mutex_lock+0x4f/0x830
mutex_lock_nested+0x25/0x30
f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_balance_fs_bg+0xb9/0x200 [f2fs]
gc_thread_func+0x302/0x4a0 [f2fs]
kthread+0xe9/0x120
ret_from_fork+0x19/0x24
other info that might help us debug this:
Chain exists of:
&sbi->gc_mutex --> &sbi->cp_mutex --> sb_internal#2
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(sb_internal#2);
lock(&sbi->cp_mutex);
lock(sb_internal#2);
lock(&sbi->gc_mutex);
*** DEADLOCK ***
1 lock held by f2fs_gc-250:0/22186:
#0: (sb_internal#2){++++.-}, at: [<f8fb5609>] gc_thread_func+0x159/0x4a0 [f2fs]
stack backtrace:
CPU: 2 PID: 22186 Comm: f2fs_gc-250:0 Tainted: G O 4.13.0-rc1+ #32
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Call Trace:
dump_stack+0x5f/0x92
print_circular_bug+0x1b3/0x1bd
validate_chain.isra.36+0xc50/0xdb0
? __this_cpu_preempt_check+0xf/0x20
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
__mutex_lock+0x4f/0x830
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
mutex_lock_nested+0x25/0x30
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_balance_fs_bg+0xb9/0x200 [f2fs]
gc_thread_func+0x302/0x4a0 [f2fs]
? preempt_schedule_common+0x2f/0x4d
? f2fs_gc+0x540/0x540 [f2fs]
kthread+0xe9/0x120
? f2fs_gc+0x540/0x540 [f2fs]
? kthread_create_on_node+0x30/0x30
ret_from_fork+0x19/0x24
The deadlock occurs under the following condition:
GC Thread                       Thread B
- sb_start_intwrite
                                - f2fs_sync_file
                                 - f2fs_sync_fs
                                  - mutex_lock(&sbi->gc_mutex)
                                   - write_checkpoint
                                    - block_operations
                                     - f2fs_sync_inode_meta
                                      - iput
                                       - sb_start_intwrite
- mutex_lock(&sbi->gc_mutex)
Fix this by altering sb_start_intwrite to sb_start_write_trylock.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
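The report above describes two threads taking the same pair of locks in opposite orders. As a rough user-space illustration of one generic way out of that pattern (try the second lock and back off if it is busy), the sketch below uses POSIX mutexes. It is not the kernel change itself, which per the message above switches the freeze-protection call to sb_start_write_trylock. The names outer_lock, inner_lock and background_pass are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: outer_lock plays sb_internal, inner_lock plays gc_mutex. */
static pthread_mutex_t outer_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t inner_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * One pass of a background worker. It takes outer_lock and then only
 * tries inner_lock. If inner_lock is busy, the worker drops outer_lock
 * and reports that the pass was skipped, so it never sleeps on
 * inner_lock while still holding outer_lock.
 */
static bool background_pass(int *passes_done)
{
	assert(passes_done != NULL);

	pthread_mutex_lock(&outer_lock);
	if (pthread_mutex_trylock(&inner_lock) != 0) {
		/* Busy: back off completely and let the caller retry later. */
		pthread_mutex_unlock(&outer_lock);
		return false;
	}

	(*passes_done)++;	/* the protected work would go here */

	pthread_mutex_unlock(&inner_lock);
	pthread_mutex_unlock(&outer_lock);
	return true;
}
```

A caller would typically invoke background_pass() from its periodic loop and simply try again on the next wakeup when it returns false.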
2017-07-22 08:52:23 +08:00
|
|
|
sb_end_intwrite(sbi->sb);
|
2017-05-18 01:36:58 +08:00
|
|
|
|
|
|
|
} while (!kthread_should_stop());
|
|
|
|
return 0;
|
2017-01-10 12:32:07 +08:00
|
|
|
}
|
|
|
|
|
2016-10-28 16:45:06 +08:00
|
|
|
#ifdef CONFIG_BLK_DEV_ZONED
|
2016-10-07 10:02:05 +08:00
|
|
|
static int __f2fs_issue_discard_zone(struct f2fs_sb_info *sbi,
|
|
|
|
struct block_device *bdev, block_t blkstart, block_t blklen)
|
2016-10-28 16:45:06 +08:00
|
|
|
{
|
2017-02-23 12:18:35 +08:00
|
|
|
sector_t sector, nr_sects;
|
2017-03-08 09:49:53 +08:00
|
|
|
block_t lblkstart = blkstart;
|
2016-10-07 10:02:05 +08:00
|
|
|
int devi = 0;
|
|
|
|
|
|
|
|
if (sbi->s_ndevs) {
|
|
|
|
devi = f2fs_target_device_index(sbi, blkstart);
|
|
|
|
blkstart -= FDEV(devi).start_blk;
|
|
|
|
}
|
2016-10-28 16:45:06 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* We need to know the type of the zone: for conventional zones,
|
|
|
|
* use regular discard if the drive supports it. For sequential
|
|
|
|
* zones, reset the zone write pointer.
|
|
|
|
*/
|
2016-10-07 10:02:05 +08:00
|
|
|
switch (get_blkz_type(sbi, bdev, blkstart)) {
|
2016-10-28 16:45:06 +08:00
|
|
|
|
|
|
|
case BLK_ZONE_TYPE_CONVENTIONAL:
|
|
|
|
if (!blk_queue_discard(bdev_get_queue(bdev)))
|
|
|
|
return 0;
|
2017-03-08 10:02:02 +08:00
|
|
|
return __queue_discard_cmd(sbi, bdev, lblkstart, blklen);
|
2016-10-28 16:45:06 +08:00
|
|
|
case BLK_ZONE_TYPE_SEQWRITE_REQ:
|
|
|
|
case BLK_ZONE_TYPE_SEQWRITE_PREF:
|
2017-02-23 12:18:35 +08:00
|
|
|
sector = SECTOR_FROM_BLOCK(blkstart);
|
|
|
|
nr_sects = SECTOR_FROM_BLOCK(blklen);
|
|
|
|
|
|
|
|
if (sector & (bdev_zone_sectors(bdev) - 1) ||
|
|
|
|
nr_sects != bdev_zone_sectors(bdev)) {
|
|
|
|
f2fs_msg(sbi->sb, KERN_INFO,
|
|
|
|
"(%d) %s: Unaligned discard attempted (block %x + %x)",
|
|
|
|
devi, sbi->s_ndevs ? FDEV(devi).path: "",
|
|
|
|
blkstart, blklen);
|
|
|
|
return -EIO;
|
|
|
|
}
|
2017-02-16 03:14:06 +08:00
|
|
|
trace_f2fs_issue_reset_zone(bdev, blkstart);
|
2016-10-28 16:45:06 +08:00
|
|
|
return blkdev_reset_zones(bdev, sector,
|
|
|
|
nr_sects, GFP_NOFS);
|
|
|
|
default:
|
|
|
|
/* Unknown zone type: broken device ? */
|
|
|
|
return -EIO;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
#endif
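The sequential-zone branch above only resets a zone when the request covers exactly one zone. A minimal stand-alone sketch of that alignment test follows. It assumes, as the mask in the original condition does, that the zone size is a power of two. The helper name is_whole_zone is hypothetical and not part of f2fs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true when [sector, sector + nr_sects) is exactly one zone:
 * the start is zone-aligned and the length equals the zone size.
 * zone_sectors must be a non-zero power of two for the mask to work.
 */
static bool is_whole_zone(uint64_t sector, uint64_t nr_sects,
			  uint64_t zone_sectors)
{
	assert(zone_sectors != 0 && (zone_sectors & (zone_sectors - 1)) == 0);

	return (sector & (zone_sectors - 1)) == 0 && nr_sects == zone_sectors;
}
```

The function in the listing inverts this test: when it fails, it logs the unaligned request and returns -EIO.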
|
|
|
|
|
2016-10-07 10:02:05 +08:00
|
|
|
static int __issue_discard_async(struct f2fs_sb_info *sbi,
|
|
|
|
struct block_device *bdev, block_t blkstart, block_t blklen)
|
|
|
|
{
|
|
|
|
#ifdef CONFIG_BLK_DEV_ZONED
|
2018-02-06 12:31:17 +08:00
|
|
|
if (f2fs_sb_has_blkzoned(sbi->sb) &&
|
2016-10-07 10:02:05 +08:00
|
|
|
bdev_zoned_model(bdev) != BLK_ZONED_NONE)
|
|
|
|
return __f2fs_issue_discard_zone(sbi, bdev, blkstart, blklen);
|
|
|
|
#endif
|
2017-03-08 10:02:02 +08:00
|
|
|
return __queue_discard_cmd(sbi, bdev, blkstart, blklen);
|
2016-10-07 10:02:05 +08:00
|
|
|
}
|
|
|
|
|
2014-04-15 12:57:55 +08:00
|
|
|
static int f2fs_issue_discard(struct f2fs_sb_info *sbi,
|
2013-11-12 15:55:17 +08:00
|
|
|
block_t blkstart, block_t blklen)
|
|
|
|
{
|
2016-10-07 10:02:05 +08:00
|
|
|
sector_t start = blkstart, len = 0;
|
|
|
|
struct block_device *bdev;
|
2015-05-01 13:37:50 +08:00
|
|
|
struct seg_entry *se;
|
|
|
|
unsigned int offset;
|
|
|
|
block_t i;
|
2016-10-07 10:02:05 +08:00
|
|
|
int err = 0;
|
|
|
|
|
|
|
|
bdev = f2fs_target_device(sbi, blkstart, NULL);
|
|
|
|
|
|
|
|
for (i = blkstart; i < blkstart + blklen; i++, len++) {
|
|
|
|
if (i != start) {
|
|
|
|
struct block_device *bdev2 =
|
|
|
|
f2fs_target_device(sbi, i, NULL);
|
|
|
|
|
|
|
|
if (bdev2 != bdev) {
|
|
|
|
err = __issue_discard_async(sbi, bdev,
|
|
|
|
start, len);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
bdev = bdev2;
|
|
|
|
start = i;
|
|
|
|
len = 0;
|
|
|
|
}
|
|
|
|
}
|
2015-05-01 13:37:50 +08:00
|
|
|
|
|
|
|
se = get_seg_entry(sbi, GET_SEGNO(sbi, i));
|
|
|
|
offset = GET_BLKOFF_FROM_SEG0(sbi, i);
|
|
|
|
|
|
|
|
if (!f2fs_test_and_set_bit(offset, se->discard_map))
|
|
|
|
sbi->discard_blks--;
|
|
|
|
}
|
2016-10-28 16:45:06 +08:00
|
|
|
|
2016-10-07 10:02:05 +08:00
|
|
|
if (len)
|
|
|
|
err = __issue_discard_async(sbi, bdev, start, len);
|
|
|
|
return err;
|
2014-04-15 12:57:55 +08:00
|
|
|
}
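The loop in f2fs_issue_discard cuts one block range into contiguous runs, starting a new run whenever the target device changes. The following user-space sketch mirrors that control flow under simplifying assumptions: devices are described by a hypothetical dev_range table, and runs are collected into an array instead of being issued. None of these names exist in f2fs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device table entry: the device owns blocks [start, end]. */
struct dev_range { unsigned start, end; };

/* One contiguous run of blocks that lives on a single device. */
struct run { int dev; unsigned start, len; };

/* Return the index of the device owning blk, or -1 if none does. */
static int device_of(const struct dev_range *devs, int ndevs, unsigned blk)
{
	for (int d = 0; d < ndevs; d++)
		if (blk >= devs[d].start && blk <= devs[d].end)
			return d;
	return -1;
}

/*
 * Split [blkstart, blkstart + blklen) into per-device runs.
 * At most max_out runs are stored in out; the return value is the
 * number of runs stored.
 */
static size_t split_by_device(const struct dev_range *devs, int ndevs,
			      unsigned blkstart, unsigned blklen,
			      struct run *out, size_t max_out)
{
	size_t n = 0;
	unsigned start = blkstart, len = 0;
	int dev;

	if (blklen == 0)
		return 0;
	dev = device_of(devs, ndevs, blkstart);

	for (unsigned i = blkstart; i < blkstart + blklen; i++, len++) {
		int dev2 = device_of(devs, ndevs, i);

		if (dev2 != dev) {
			/* Device boundary: flush the run collected so far. */
			if (n < max_out)
				out[n++] = (struct run){ dev, start, len };
			dev = dev2;
			start = i;
			len = 0;
		}
	}
	/* Flush the trailing run, as the original does after its loop. */
	if (len && n < max_out)
		out[n++] = (struct run){ dev, start, len };
	return n;
}
```

With two devices covering blocks 0..9 and 10..19, a request for five blocks starting at block 8 yields one run of two blocks on the first device and one run of three blocks on the second.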
|
|
|
|
|
2016-12-30 14:06:15 +08:00
|
|
|
static bool add_discard_addrs(struct f2fs_sb_info *sbi, struct cp_control *cpc,
|
|
|
|
bool check_only)
|
2014-10-29 13:27:59 +08:00
|
|
|
{
|
2013-11-12 13:49:56 +08:00
|
|
|
int entries = SIT_VBLOCK_MAP_SIZE / sizeof(unsigned long);
|
|
|
|
int max_blocks = sbi->blocks_per_seg;
|
2014-09-21 13:06:39 +08:00
|
|
|
struct seg_entry *se = get_seg_entry(sbi, cpc->trim_start);
|
2013-11-12 13:49:56 +08:00
|
|
|
unsigned long *cur_map = (unsigned long *)se->cur_valid_map;
|
|
|
|
unsigned long *ckpt_map = (unsigned long *)se->ckpt_valid_map;
|
2015-05-01 13:37:50 +08:00
|
|
|
unsigned long *discard_map = (unsigned long *)se->discard_map;
|
2015-02-11 08:44:29 +08:00
|
|
|
unsigned long *dmap = SIT_I(sbi)->tmp_map;
|
2013-11-12 13:49:56 +08:00
|
|
|
unsigned int start = 0, end = -1;
|
2017-04-27 20:40:39 +08:00
|
|
|
bool force = (cpc->reason & CP_DISCARD);
|
2017-03-28 18:18:50 +08:00
|
|
|
struct discard_entry *de = NULL;
|
2017-04-15 14:09:36 +08:00
|
|
|
struct list_head *head = &SM_I(sbi)->dcc_info->entry_list;
|
2013-11-12 13:49:56 +08:00
|
|
|
int i;
|
|
|
|
|
2016-08-03 01:56:40 +08:00
|
|
|
if (se->valid_blocks == max_blocks || !f2fs_discard_en(sbi))
|
2016-12-30 14:06:15 +08:00
|
|
|
return false;
|
2013-11-12 13:49:56 +08:00
|
|
|
|
2015-05-01 13:37:50 +08:00
|
|
|
if (!force) {
|
|
|
|
if (!test_opt(sbi, DISCARD) || !se->valid_blocks ||
|
2017-01-12 06:40:24 +08:00
|
|
|
SM_I(sbi)->dcc_info->nr_discards >=
|
|
|
|
SM_I(sbi)->dcc_info->max_discards)
|
2016-12-30 14:06:15 +08:00
|
|
|
return false;
|
2014-09-21 13:06:39 +08:00
|
|
|
}
|
|
|
|
|
2013-11-12 13:49:56 +08:00
|
|
|
/* SIT_VBLOCK_MAP_SIZE should be multiple of sizeof(unsigned long) */
|
|
|
|
for (i = 0; i < entries; i++)
|
2015-05-01 13:37:50 +08:00
|
|
|
dmap[i] = force ? ~ckpt_map[i] & ~discard_map[i] :
|
2014-12-13 05:53:41 +08:00
|
|
|
(cur_map[i] ^ ckpt_map[i]) & ckpt_map[i];
|
2013-11-12 13:49:56 +08:00
|
|
|
|
2017-01-12 06:40:24 +08:00
|
|
|
while (force || SM_I(sbi)->dcc_info->nr_discards <=
|
|
|
|
SM_I(sbi)->dcc_info->max_discards) {
|
2013-11-12 13:49:56 +08:00
|
|
|
start = __find_rev_next_bit(dmap, max_blocks, end + 1);
|
|
|
|
if (start >= max_blocks)
|
|
|
|
break;
|
|
|
|
|
|
|
|
end = __find_rev_next_zero_bit(dmap, max_blocks, start + 1);
|
2016-07-07 12:13:33 +08:00
|
|
|
if (force && start && end != max_blocks
|
|
|
|
&& (end - start) < cpc->trim_minlen)
|
|
|
|
continue;
|
|
|
|
|
2016-12-30 14:06:15 +08:00
|
|
|
if (check_only)
|
|
|
|
return true;
|
|
|
|
|
2017-03-28 18:18:50 +08:00
|
|
|
if (!de) {
|
|
|
|
de = f2fs_kmem_cache_alloc(discard_entry_slab,
|
|
|
|
GFP_F2FS_ZERO);
|
|
|
|
de->start_blkaddr = START_BLOCK(sbi, cpc->trim_start);
|
|
|
|
list_add_tail(&de->list, head);
|
|
|
|
}
|
|
|
|
|
|
|
|
for (i = start; i < end; i++)
|
|
|
|
__set_bit_le(i, (void *)de->discard_map);
|
|
|
|
|
|
|
|
SM_I(sbi)->dcc_info->nr_discards += end - start;
|
2013-11-12 13:49:56 +08:00
|
|
|
}
|
2016-12-30 14:06:15 +08:00
|
|
|
return false;
|
2013-11-12 13:49:56 +08:00
|
|
|
}
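add_discard_addrs builds a candidate bitmap word by word and then scans it for runs of set bits. The sketch below reproduces the two mask expressions on a single 32-bit word and counts the runs that reach a minimum length. It is a simplification: bits are scanned least-significant first, whereas the kernel helpers __find_rev_next_bit and __find_rev_next_zero_bit use their own bit ordering, and the original only skips short runs that do not touch a segment boundary. The names discard_candidates and count_runs are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Candidate mask for one bitmap word.
 * force != 0: blocks invalid at the last checkpoint whose discard_map
 *             bit is still clear.
 * force == 0: blocks that were valid at the last checkpoint but are no
 *             longer valid now.
 */
static uint32_t discard_candidates(uint32_t cur, uint32_t ckpt,
				   uint32_t discard, int force)
{
	return force ? (~ckpt & ~discard) : ((cur ^ ckpt) & ckpt);
}

/* Count runs of set bits in the low nbits of map that span >= minlen bits. */
static int count_runs(uint32_t map, int nbits, int minlen)
{
	int runs = 0, pos = 0;

	assert(nbits >= 0 && nbits <= 32);
	while (pos < nbits) {
		int start;

		/* Skip clear bits up to the start of the next run. */
		while (pos < nbits && !((map >> pos) & 1u))
			pos++;
		if (pos >= nbits)
			break;
		start = pos;
		/* Advance to the first clear bit after the run. */
		while (pos < nbits && ((map >> pos) & 1u))
			pos++;
		if (pos - start >= minlen)
			runs++;
	}
	return runs;
}
```

In the non-forced case the expression reduces to ckpt & ~cur, which is why only blocks invalidated since the last checkpoint become candidates.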
|
|
|
|
|
2014-09-21 13:06:39 +08:00
|
|
|
void release_discard_addrs(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
2017-04-15 14:09:36 +08:00
|
|
|
struct list_head *head = &(SM_I(sbi)->dcc_info->entry_list);
|
2014-09-21 13:06:39 +08:00
|
|
|
struct discard_entry *entry, *this;
|
|
|
|
|
|
|
|
/* drop caches */
|
|
|
|
list_for_each_entry_safe(entry, this, head, list) {
|
|
|
|
list_del(&entry->list);
|
|
|
|
kmem_cache_free(discard_entry_slab, entry);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2012-11-29 12:28:09 +08:00
|
|
|
/*
|
2012-11-02 16:09:16 +08:00
|
|
|
* Should call clear_prefree_segments after checkpoint is done.
|
|
|
|
*/
|
|
|
|
static void set_prefree_as_free_segments(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
|
2014-08-04 10:10:07 +08:00
|
|
|
unsigned int segno;
|
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
mutex_lock(&dirty_i->seglist_lock);
|
2014-09-24 02:23:01 +08:00
|
|
|
for_each_set_bit(segno, dirty_i->dirty_segmap[PRE], MAIN_SEGS(sbi))
|
2012-11-02 16:09:16 +08:00
|
|
|
__set_test_and_free(sbi, segno);
|
|
|
|
mutex_unlock(&dirty_i->seglist_lock);
|
|
|
|
}
|
|
|
|
|
2015-05-01 13:50:06 +08:00
|
|
|
void clear_prefree_segments(struct f2fs_sb_info *sbi, struct cp_control *cpc)
|
2012-11-02 16:09:16 +08:00
|
|
|
{
|
f2fs: introduce discard_granularity sysfs entry
Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables
f2fs to issue 4K size discard in real-time discard mode. However, issuing
smaller discards may cost more device lifetime while releasing less free
space in the flash device. Since f2fs can separate hot/cold data and
performs garbage collection, we can expect that a small invalid region will
expand soon with OPU, deletion, or garbage collection of valid data, so
it is better to delay or skip issuing smaller discards. This helps
to reduce excessive consumption of IO bandwidth and of flash storage
lifetime.
This patch makes f2fs select 64K as its default minimal
granularity, and issue only discards whose size is not smaller than the
minimal granularity. It also exposes the discard granularity as a sysfs
entry for configuration in different scenarios.
Jaegeuk Kim:
We must issue all the accumulated discard commands when fstrim is called.
So, I've added pend_list_tag[] to indicate whether we should issue the
commands or not. If the tag is set to P_ACTIVE or P_TRIM, we have to issue
them. P_TRIM is set once at a time, given an fstrim trigger.
In addition, issue_discard_thread is called too often due to the number of
discard commands remaining in the pending list. I added a timer to control
it, as gc_thread does.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-07 23:09:56 +08:00
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
|
|
|
struct list_head *head = &dcc->entry_list;
|
2014-03-29 11:33:17 +08:00
|
|
|
struct discard_entry *entry, *this;
|
2012-11-02 16:09:16 +08:00
|
|
|
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
|
2013-11-11 08:24:37 +08:00
|
|
|
unsigned long *prefree_map = dirty_i->dirty_segmap[PRE];
|
|
|
|
unsigned int start = 0, end = -1;
|
2016-06-04 10:29:38 +08:00
|
|
|
unsigned int secno, start_segno;
|
2017-04-27 20:40:39 +08:00
|
|
|
bool force = (cpc->reason & CP_DISCARD);
|
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
mutex_lock(&dirty_i->seglist_lock);
|
2013-11-11 08:24:37 +08:00
|
|
|
|
2012-11-02 16:09:16 +08:00
|
|
|
while (1) {
|
2013-11-11 08:24:37 +08:00
|
|
|
int i;
|
2014-09-24 02:23:01 +08:00
|
|
|
start = find_next_bit(prefree_map, MAIN_SEGS(sbi), end + 1);
|
|
|
|
if (start >= MAIN_SEGS(sbi))
|
2012-11-02 16:09:16 +08:00
|
|
|
break;
|
2014-09-24 02:23:01 +08:00
|
|
|
end = find_next_zero_bit(prefree_map, MAIN_SEGS(sbi),
|
|
|
|
start + 1);
|
2013-11-11 08:24:37 +08:00
|
|
|
|
|
|
|
for (i = start; i < end; i++)
|
|
|
|
clear_bit(i, prefree_map);
|
|
|
|
|
|
|
|
dirty_i->nr_dirty[PRE] -= end - start;
|
|
|
|
|
2016-12-22 11:46:24 +08:00
|
|
|
if (!test_opt(sbi, DISCARD))
|
2013-11-11 08:24:37 +08:00
|
|
|
continue;
|
2012-11-02 16:09:16 +08:00
|
|
|
|
2016-12-22 11:46:24 +08:00
|
|
|
if (force && start >= cpc->trim_start &&
|
|
|
|
(end - 1) <= cpc->trim_end)
|
|
|
|
continue;
|
|
|
|
|
2016-06-04 10:29:38 +08:00
|
|
|
if (!test_opt(sbi, LFS) || sbi->segs_per_sec == 1) {
|
|
|
|
f2fs_issue_discard(sbi, START_BLOCK(sbi, start),
|
2013-11-12 15:55:17 +08:00
|
|
|
(end - start) << sbi->log_blocks_per_seg);
|
2016-06-04 10:29:38 +08:00
|
|
|
continue;
|
|
|
|
}
|
|
|
|
next:
|
2017-04-08 06:08:17 +08:00
|
|
|
secno = GET_SEC_FROM_SEG(sbi, start);
|
|
|
|
start_segno = GET_SEG_FROM_SEC(sbi, secno);
|
2016-06-04 10:29:38 +08:00
|
|
|
if (!IS_CURSEC(sbi, secno) &&
|
2017-04-08 05:33:22 +08:00
|
|
|
!get_valid_blocks(sbi, start, true))
|
2016-06-04 10:29:38 +08:00
|
|
|
f2fs_issue_discard(sbi, START_BLOCK(sbi, start_segno),
|
|
|
|
sbi->segs_per_sec << sbi->log_blocks_per_seg);
|
|
|
|
|
|
|
|
start = start_segno + sbi->segs_per_sec;
|
|
|
|
if (start < end)
|
|
|
|
goto next;
|
2017-02-28 03:57:11 +08:00
|
|
|
else
|
|
|
|
end = start - 1;
|
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
mutex_unlock(&dirty_i->seglist_lock);
|
2013-11-12 13:49:56 +08:00
|
|
|
|
|
|
|
/* send small discards */
|
2014-03-29 11:33:17 +08:00
|
|
|
list_for_each_entry_safe(entry, this, head, list) {
|
2017-03-28 18:18:50 +08:00
|
|
|
unsigned int cur_pos = 0, next_pos, len, total_len = 0;
|
|
|
|
bool is_valid = test_bit_le(0, entry->discard_map);
|
|
|
|
|
|
|
|
find_next:
|
|
|
|
if (is_valid) {
|
|
|
|
next_pos = find_next_zero_bit_le(entry->discard_map,
|
|
|
|
sbi->blocks_per_seg, cur_pos);
|
|
|
|
len = next_pos - cur_pos;
|
|
|
|
|
2018-02-06 12:31:17 +08:00
|
|
|
if (f2fs_sb_has_blkzoned(sbi->sb) ||
|
2017-05-26 16:04:40 +08:00
|
|
|
(force && len < cpc->trim_minlen))
|
2017-03-28 18:18:50 +08:00
|
|
|
goto skip;
|
|
|
|
|
|
|
|
f2fs_issue_discard(sbi, entry->start_blkaddr + cur_pos,
|
|
|
|
len);
|
|
|
|
total_len += len;
|
|
|
|
} else {
|
|
|
|
next_pos = find_next_bit_le(entry->discard_map,
|
|
|
|
sbi->blocks_per_seg, cur_pos);
|
|
|
|
}
|
2015-05-01 13:50:06 +08:00
|
|
|
skip:
|
2017-03-28 18:18:50 +08:00
|
|
|
cur_pos = next_pos;
|
|
|
|
is_valid = !is_valid;
|
|
|
|
|
|
|
|
if (cur_pos < sbi->blocks_per_seg)
|
|
|
|
goto find_next;
|
|
|
|
|
2013-11-12 13:49:56 +08:00
|
|
|
list_del(&entry->list);
|
2017-08-07 23:09:56 +08:00
|
|
|
dcc->nr_discards -= total_len;
|
2013-11-12 13:49:56 +08:00
|
|
|
kmem_cache_free(discard_entry_slab, entry);
|
|
|
|
}
|
2017-04-25 00:21:34 +08:00
|
|
|
|
2017-08-23 12:15:43 +08:00
|
|
|
wake_up_discard_thread(sbi, false);
|
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
2017-10-04 09:08:34 +08:00
|
|
|
void init_discard_policy(struct discard_policy *dpolicy,
|
|
|
|
int discard_type, unsigned int granularity)
|
2017-10-04 09:08:33 +08:00
|
|
|
{
|
2017-10-04 09:08:34 +08:00
|
|
|
/* common policy */
|
|
|
|
dpolicy->type = discard_type;
|
2017-10-04 09:08:33 +08:00
|
|
|
dpolicy->sync = true;
|
2017-10-04 09:08:34 +08:00
|
|
|
dpolicy->granularity = granularity;
|
|
|
|
|
2018-01-25 18:57:26 +08:00
|
|
|
dpolicy->max_requests = DEF_MAX_DISCARD_REQUEST;
|
|
|
|
dpolicy->io_aware_gran = MAX_PLIST_NUM;
|
|
|
|
|
2017-10-04 09:08:34 +08:00
|
|
|
if (discard_type == DPOLICY_BG) {
|
|
|
|
dpolicy->min_interval = DEF_MIN_DISCARD_ISSUE_TIME;
|
|
|
|
dpolicy->max_interval = DEF_MAX_DISCARD_ISSUE_TIME;
|
|
|
|
dpolicy->io_aware = true;
|
|
|
|
} else if (discard_type == DPOLICY_FORCE) {
|
|
|
|
dpolicy->min_interval = DEF_MIN_DISCARD_ISSUE_TIME;
|
|
|
|
dpolicy->max_interval = DEF_MAX_DISCARD_ISSUE_TIME;
|
2018-02-23 15:30:55 +08:00
|
|
|
dpolicy->io_aware = false;
|
2017-10-04 09:08:34 +08:00
|
|
|
} else if (discard_type == DPOLICY_FSTRIM) {
|
|
|
|
dpolicy->io_aware = false;
|
|
|
|
} else if (discard_type == DPOLICY_UMOUNT) {
|
|
|
|
dpolicy->io_aware = false;
|
|
|
|
}
|
2017-10-04 09:08:33 +08:00
|
|
|
}
|
|
|
|
|
2017-01-29 13:27:02 +08:00
|
|
|
static int create_discard_cmd_control(struct f2fs_sb_info *sbi)
|
2017-01-12 06:40:24 +08:00
|
|
|
{
|
2017-01-10 12:32:07 +08:00
|
|
|
dev_t dev = sbi->sb->s_bdev->bd_dev;
|
2017-01-12 06:40:24 +08:00
|
|
|
struct discard_cmd_control *dcc;
|
2017-04-15 14:09:37 +08:00
|
|
|
int err = 0, i;
|
2017-01-12 06:40:24 +08:00
|
|
|
|
|
|
|
if (SM_I(sbi)->dcc_info) {
|
|
|
|
dcc = SM_I(sbi)->dcc_info;
|
|
|
|
goto init_thread;
|
|
|
|
}
|
|
|
|
|
2017-11-30 19:28:17 +08:00
|
|
|
dcc = f2fs_kzalloc(sbi, sizeof(struct discard_cmd_control), GFP_KERNEL);
|
2017-01-12 06:40:24 +08:00
|
|
|
if (!dcc)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
2017-08-07 23:09:56 +08:00
|
|
|
dcc->discard_granularity = DEFAULT_DISCARD_GRANULARITY;
|
2017-04-15 14:09:36 +08:00
|
|
|
INIT_LIST_HEAD(&dcc->entry_list);
|
2017-10-04 09:08:34 +08:00
|
|
|
for (i = 0; i < MAX_PLIST_NUM; i++)
|
2017-04-15 14:09:37 +08:00
|
|
|
INIT_LIST_HEAD(&dcc->pend_list[i]);
|
2017-04-15 14:09:36 +08:00
|
|
|
INIT_LIST_HEAD(&dcc->wait_list);
|
2017-10-04 09:08:32 +08:00
|
|
|
INIT_LIST_HEAD(&dcc->fstrim_list);
|
2017-01-10 12:32:07 +08:00
|
|
|
mutex_init(&dcc->cmd_lock);
|
2017-03-25 17:19:58 +08:00
|
|
|
atomic_set(&dcc->issued_discard, 0);
|
|
|
|
atomic_set(&dcc->issing_discard, 0);
|
2017-03-25 17:19:59 +08:00
|
|
|
atomic_set(&dcc->discard_cmd_cnt, 0);
|
2017-01-12 06:40:24 +08:00
|
|
|
dcc->nr_discards = 0;
|
2017-04-25 00:21:35 +08:00
|
|
|
dcc->max_discards = MAIN_SEGS(sbi) << sbi->log_blocks_per_seg;
|
2017-04-18 19:27:39 +08:00
|
|
|
dcc->undiscard_blks = 0;
|
2017-04-14 23:24:55 +08:00
|
|
|
dcc->root = RB_ROOT;
|
2017-01-12 06:40:24 +08:00
|
|
|
|
2017-01-10 12:32:07 +08:00
|
|
|
init_waitqueue_head(&dcc->discard_wait_queue);
|
2017-01-12 06:40:24 +08:00
|
|
|
SM_I(sbi)->dcc_info = dcc;
|
|
|
|
init_thread:
|
2017-01-10 12:32:07 +08:00
|
|
|
dcc->f2fs_issue_discard = kthread_run(issue_discard_thread, sbi,
|
|
|
|
"f2fs_discard-%u:%u", MAJOR(dev), MINOR(dev));
|
|
|
|
if (IS_ERR(dcc->f2fs_issue_discard)) {
|
|
|
|
err = PTR_ERR(dcc->f2fs_issue_discard);
|
|
|
|
kfree(dcc);
|
|
|
|
SM_I(sbi)->dcc_info = NULL;
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2017-01-12 06:40:24 +08:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2017-03-27 18:14:04 +08:00
|
|
|
static void destroy_discard_cmd_control(struct f2fs_sb_info *sbi)
|
2017-01-12 06:40:24 +08:00
|
|
|
{
|
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
|
|
|
|
2017-03-27 18:14:04 +08:00
|
|
|
if (!dcc)
|
|
|
|
return;
|
|
|
|
|
2017-06-29 23:17:45 +08:00
|
|
|
stop_discard_thread(sbi);
|
2017-03-27 18:14:04 +08:00
|
|
|
|
|
|
|
kfree(dcc);
|
|
|
|
SM_I(sbi)->dcc_info = NULL;
|
2017-01-12 06:40:24 +08:00
|
|
|
}
|
|
|
|
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we described the issue as follows:
"Although building the NAT journal in cursum reduces the read/write work for NAT
blocks, the previous design gives lower performance when checkpoints are written
frequently, in these cases:
1. If the journal in cursum is already full, it is wasteful to flush all NAT
   entries to pages for persistence without caching any of them.
2. If the journal in cursum is not full, we fill NAT entries into the journal
   until it is full, then flush the remaining dirty entries to disk without
   merging the journaled entries, so those journaled entries may be flushed to
   disk at the next checkpoint, having missed the chance to be flushed this time."
The SIT journal area has the same problem.
In this patch, we first update the SIT journal with as many dirty entries as
possible. Second, if there is no space left in the SIT journal, we remove all
entries from the journal and walk the whole dirty entry bitmap of the SIT,
grouping dirty SIT entries located in the same SIT block into a SIT entry set.
All entry sets are linked into the list sit_entry_set in sm_info, sorted in
ascending order of the number of entries per set. We then flush the sets with
the fewest entries into the journal, as many as fit, and flush the dense sets,
with merged entries, to disk.
This uses the SIT journal area more effectively and reduces SIT updates, which
improves performance and extends the lifetime of the flash device.
In my testing environment, this patch clearly reduces SIT block updates.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
            sit page num    cp count    sit pages/cp
based       2006.50         1349.75     1.486
patched     1566.25         1463.25     1.070
The latency of the merging operation is small even when flush_sit_entries
handles a large number of dirty SIT entries:
latency(ns)    dirty sit count
36038          2151
49168          2123
37174          2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
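The grouping and ordering described in the commit message above can be sketched in user space. This is a simplified model, not the kernel implementation: the names `entry_set`, `build_sets` and `plan_flush` are hypothetical, the entries-per-block and journal-slot counts are passed in as illustrative parameters, and a plain array stands in for the sit_entry_set list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of one SIT entry set: dirty entries sharing a SIT block. */
struct entry_set {
	unsigned int block;	/* index of the SIT block the entries live in */
	unsigned int count;	/* number of dirty entries in that block */
	int to_journal;		/* 1: kept in the journal, 0: written to the SIT block */
};

static int cmp_by_count(const void *a, const void *b)
{
	const struct entry_set *x = a, *y = b;

	return (x->count > y->count) - (x->count < y->count);
}

/*
 * Group dirty segment numbers into per-block sets.
 * Returns the number of sets stored in `sets` (capacity `max_sets`).
 */
static size_t build_sets(const unsigned int *segnos, size_t n,
			 unsigned int entries_per_block,
			 struct entry_set *sets, size_t max_sets)
{
	size_t nsets = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned int block = segnos[i] / entries_per_block;
		size_t j;

		for (j = 0; j < nsets; j++)
			if (sets[j].block == block)
				break;
		if (j == nsets) {
			/* first dirty entry seen for this SIT block */
			assert(nsets < max_sets);
			sets[nsets].block = block;
			sets[nsets].count = 0;
			sets[nsets].to_journal = 0;
			nsets++;
		}
		sets[j].count++;
	}
	return nsets;
}

/*
 * Sort sets by ascending entry count, then keep the sparsest sets in the
 * journal while they fit. Returns the number of SIT blocks left to write.
 */
static size_t plan_flush(struct entry_set *sets, size_t nsets,
			 unsigned int journal_slots)
{
	size_t blocks_to_write = 0;

	qsort(sets, nsets, sizeof(*sets), cmp_by_count);
	for (size_t i = 0; i < nsets; i++) {
		if (sets[i].count <= journal_slots) {
			sets[i].to_journal = 1;
			journal_slots -= sets[i].count;
		} else {
			sets[i].to_journal = 0;
			blocks_to_write++;
		}
	}
	return blocks_to_write;
}
```

Because the sets are sorted by ascending count, once one set fails to fit in the journal no later set can fit either, so sparse sets consume journal slots and dense sets each cost one SIT block write.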
|
|
|
static bool __mark_sit_entry_dirty(struct f2fs_sb_info *sbi, unsigned int segno)
|
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct sit_info *sit_i = SIT_I(sbi);
|
2014-09-04 18:13:01 +08:00
|
|
|
|
|
|
|
if (!__test_and_set_bit(segno, sit_i->dirty_sentries_bitmap)) {
|
2012-11-02 16:09:16 +08:00
|
|
|
sit_i->dirty_sentries++;
|
2014-09-04 18:13:01 +08:00
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
return true;
|
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void __set_sit_entry_type(struct f2fs_sb_info *sbi, int type,
|
|
|
|
unsigned int segno, int modified)
|
|
|
|
{
|
|
|
|
struct seg_entry *se = get_seg_entry(sbi, segno);
|
|
|
|
se->type = type;
|
|
|
|
if (modified)
|
|
|
|
__mark_sit_entry_dirty(sbi, segno);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void update_sit_entry(struct f2fs_sb_info *sbi, block_t blkaddr, int del)
|
|
|
|
{
|
|
|
|
struct seg_entry *se;
|
|
|
|
unsigned int segno, offset;
|
|
|
|
long int new_vblocks;
|
2017-08-02 21:20:13 +08:00
|
|
|
bool exist;
|
|
|
|
#ifdef CONFIG_F2FS_CHECK_FS
|
|
|
|
bool mir_exist;
|
|
|
|
#endif
|
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
segno = GET_SEGNO(sbi, blkaddr);
|
|
|
|
|
|
|
|
se = get_seg_entry(sbi, segno);
|
|
|
|
new_vblocks = se->valid_blocks + del;
|
2014-02-04 12:01:10 +08:00
|
|
|
offset = GET_BLKOFF_FROM_SEG0(sbi, blkaddr);
|
2012-11-02 16:09:16 +08:00
|
|
|
|
2014-09-03 06:52:58 +08:00
|
|
|
f2fs_bug_on(sbi, (new_vblocks >> (sizeof(unsigned short) << 3) ||
|
2012-11-02 16:09:16 +08:00
|
|
|
(new_vblocks > sbi->blocks_per_seg)));
|
|
|
|
|
|
|
|
se->valid_blocks = new_vblocks;
|
|
|
|
se->mtime = get_mtime(sbi);
|
|
|
|
SIT_I(sbi)->max_mtime = se->mtime;
|
|
|
|
|
|
|
|
/* Update valid block bitmap */
|
|
|
|
if (del > 0) {
|
2017-08-02 21:20:13 +08:00
|
|
|
exist = f2fs_test_and_set_bit(offset, se->cur_valid_map);
|
2017-01-07 18:51:01 +08:00
|
|
|
#ifdef CONFIG_F2FS_CHECK_FS
|
2017-08-02 21:20:13 +08:00
|
|
|
mir_exist = f2fs_test_and_set_bit(offset,
|
|
|
|
se->cur_valid_map_mir);
|
|
|
|
if (unlikely(exist != mir_exist)) {
|
|
|
|
f2fs_msg(sbi->sb, KERN_ERR, "Inconsistent error "
|
|
|
|
"when setting bitmap, blk:%u, old bit:%d",
|
|
|
|
blkaddr, exist);
|
2014-09-03 07:05:00 +08:00
|
|
|
f2fs_bug_on(sbi, 1);
|
2017-08-02 21:20:13 +08:00
|
|
|
}
|
2017-01-07 18:51:01 +08:00
|
|
|
#endif
|
2017-08-02 21:20:13 +08:00
|
|
|
if (unlikely(exist)) {
|
|
|
|
f2fs_msg(sbi->sb, KERN_ERR,
|
|
|
|
"Bitmap was wrongly set, blk:%u", blkaddr);
|
|
|
|
f2fs_bug_on(sbi, 1);
|
2017-08-02 22:16:54 +08:00
|
|
|
se->valid_blocks--;
|
|
|
|
del = 0;
|
2017-01-07 18:51:01 +08:00
|
|
|
}
|
2017-08-02 21:20:13 +08:00
|
|
|
|
2016-08-03 01:56:40 +08:00
|
|
|
if (f2fs_discard_en(sbi) &&
|
|
|
|
!f2fs_test_and_set_bit(offset, se->discard_map))
|
2015-05-01 13:37:50 +08:00
|
|
|
sbi->discard_blks--;
|
2017-03-07 03:59:56 +08:00
|
|
|
|
|
|
|
/* don't overwrite by SSR to keep node chain */
|
|
|
|
if (se->type == CURSEG_WARM_NODE) {
|
|
|
|
if (!f2fs_test_and_set_bit(offset, se->ckpt_valid_map))
|
|
|
|
se->ckpt_valid_blocks++;
|
|
|
|
}
|
2012-11-02 16:09:16 +08:00
|
|
|
} else {
|
2017-08-02 21:20:13 +08:00
|
|
|
exist = f2fs_test_and_clear_bit(offset, se->cur_valid_map);
|
2017-01-07 18:51:01 +08:00
|
|
|
#ifdef CONFIG_F2FS_CHECK_FS
|
2017-08-02 21:20:13 +08:00
|
|
|
mir_exist = f2fs_test_and_clear_bit(offset,
|
|
|
|
se->cur_valid_map_mir);
|
|
|
|
if (unlikely(exist != mir_exist)) {
|
|
|
|
f2fs_msg(sbi->sb, KERN_ERR, "Inconsistent error "
|
|
|
|
"when clearing bitmap, blk:%u, old bit:%d",
|
|
|
|
blkaddr, exist);
|
2014-09-03 07:05:00 +08:00
|
|
|
f2fs_bug_on(sbi, 1);
|
2017-08-02 21:20:13 +08:00
|
|
|
}
|
2017-01-07 18:51:01 +08:00
|
|
|
#endif
|
2017-08-02 21:20:13 +08:00
|
|
|
if (unlikely(!exist)) {
|
|
|
|
f2fs_msg(sbi->sb, KERN_ERR,
|
|
|
|
"Bitmap was wrongly cleared, blk:%u", blkaddr);
|
|
|
|
f2fs_bug_on(sbi, 1);
|
2017-08-02 22:16:54 +08:00
|
|
|
se->valid_blocks++;
|
|
|
|
del = 0;
|
2017-01-07 18:51:01 +08:00
|
|
|
}
|
2017-08-02 21:20:13 +08:00
|
|
|
|
2016-08-03 01:56:40 +08:00
|
|
|
if (f2fs_discard_en(sbi) &&
|
|
|
|
f2fs_test_and_clear_bit(offset, se->discard_map))
|
2015-05-01 13:37:50 +08:00
|
|
|
sbi->discard_blks++;
|
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
if (!f2fs_test_bit(offset, se->ckpt_valid_map))
|
|
|
|
se->ckpt_valid_blocks += del;
|
|
|
|
|
|
|
|
__mark_sit_entry_dirty(sbi, segno);
|
|
|
|
|
|
|
|
/* update total number of valid blocks to be written in ckpt area */
|
|
|
|
SIT_I(sbi)->written_valid_blocks += del;
|
|
|
|
|
|
|
|
if (sbi->segs_per_sec > 1)
|
|
|
|
get_sec_entry(sbi, segno)->valid_blocks += del;
|
|
|
|
}
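The bitmap bookkeeping in update_sit_entry() above can be modeled in user space as follows. This is a hedged sketch only: `seg_model`, `model_update` and the two bit helpers are hypothetical names, the segment size of 512 blocks is an assumed value, and the bit order used here (least significant bit first) is a simplification that need not match the kernel's f2fs_test_and_set_bit(). Where the kernel logs an error and calls f2fs_bug_on() on an inconsistent bit, the model just returns false and leaves the valid block count unchanged.

```c
#include <assert.h>
#include <stdbool.h>

#define BLOCKS_PER_SEG 512	/* assumed segment size in blocks, for illustration */

/* Hypothetical, simplified stand-in for struct seg_entry. */
struct seg_model {
	unsigned short valid_blocks;
	unsigned char cur_valid_map[BLOCKS_PER_SEG / 8];
};

/* Set bit `nr` and return its previous value. */
static bool model_test_and_set_bit(unsigned int nr, unsigned char *map)
{
	unsigned char mask = (unsigned char)(1u << (nr & 7));
	bool old = (map[nr >> 3] & mask) != 0;

	map[nr >> 3] |= mask;
	return old;
}

/* Clear bit `nr` and return its previous value. */
static bool model_test_and_clear_bit(unsigned int nr, unsigned char *map)
{
	unsigned char mask = (unsigned char)(1u << (nr & 7));
	bool old = (map[nr >> 3] & mask) != 0;

	map[nr >> 3] &= (unsigned char)~mask;
	return old;
}

/*
 * Apply del = +1 (block becomes valid) or del = -1 (block is invalidated)
 * to the block at `offset` within the segment. Returns false, with the
 * count unchanged, if the new count is out of range or the bitmap
 * disagrees with the request.
 */
static bool model_update(struct seg_model *se, unsigned int offset, int del)
{
	long new_vblocks = (long)se->valid_blocks + del;

	/* Range check; the kernel folds the lower bound into a shift test. */
	if (new_vblocks < 0 || new_vblocks > BLOCKS_PER_SEG)
		return false;

	if (del > 0) {
		/* Bit already set: the block was valid before this call. */
		if (model_test_and_set_bit(offset, se->cur_valid_map))
			return false;
	} else {
		/* Bit already clear: the block was not valid before this call. */
		if (!model_test_and_clear_bit(offset, se->cur_valid_map))
			return false;
	}

	se->valid_blocks = (unsigned short)new_vblocks;
	return true;
}
```

Returning the previous bit value from the set and clear helpers is what lets a single call both update the map and detect a double allocation or double invalidation.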
|
|
|
|
|
|
|
|
void invalidate_blocks(struct f2fs_sb_info *sbi, block_t addr)
|
|
|
|
{
|
|
|
|
unsigned int segno = GET_SEGNO(sbi, addr);
|
|
|
|
struct sit_info *sit_i = SIT_I(sbi);
|
|
|
|
|
2014-09-03 06:52:58 +08:00
|
|
|
f2fs_bug_on(sbi, addr == NULL_ADDR);
|
2012-11-02 16:09:16 +08:00
|
|
|
if (addr == NEW_ADDR)
|
|
|
|
return;
|
|
|
|
|
|
|
|
/* add it into sit main buffer */
|
2017-10-30 17:49:53 +08:00
|
|
|
down_write(&sit_i->sentry_lock);
|
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
update_sit_entry(sbi, addr, -1);
|
|
|
|
|
|
|
|
/* add it into dirty seglist */
|
|
|
|
locate_dirty_segment(sbi, segno);
|
|
|
|
|
2017-10-30 17:49:53 +08:00
|
|
|
up_write(&sit_i->sentry_lock);
|
2012-11-02 16:09:16 +08:00
|
|
|
}

bool is_checkpointed_data(struct f2fs_sb_info *sbi, block_t blkaddr)
{
	struct sit_info *sit_i = SIT_I(sbi);
	unsigned int segno, offset;
	struct seg_entry *se;
	bool is_cp = false;

	if (blkaddr == NEW_ADDR || blkaddr == NULL_ADDR)
		return true;

	down_read(&sit_i->sentry_lock);

	segno = GET_SEGNO(sbi, blkaddr);
	se = get_seg_entry(sbi, segno);
	offset = GET_BLKOFF_FROM_SEG0(sbi, blkaddr);

	if (f2fs_test_bit(offset, se->ckpt_valid_map))
		is_cp = true;

	up_read(&sit_i->sentry_lock);

	return is_cp;
}
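
The checkpoint test above comes down to one bit lookup in the segment's ckpt_valid_map. A minimal userspace sketch of that lookup follows. It assumes the MSB-first bit order within each byte that the f2fs bitmap helpers are understood to use, and a 512-block segment; `sketch_test_bit`, `sketch_set_bit`, and `sketch_is_checkpointed` are illustrative names, not kernel API, and no locking is modelled.

```c
#include <assert.h>

/* Assumption: 512 blocks per segment, so one validity bitmap is 64 bytes. */
#define SKETCH_BLOCKS_PER_SEG 512
#define SKETCH_BITMAP_BYTES (SKETCH_BLOCKS_PER_SEG / 8)

/* Hypothetical helper: test bit nr, counting from the MSB of each byte. */
static int sketch_test_bit(unsigned int nr, const unsigned char *map)
{
	assert(nr < SKETCH_BLOCKS_PER_SEG);
	return (map[nr >> 3] >> (7 - (nr & 7))) & 1;
}

/* Hypothetical helper: set bit nr using the same bit order. */
static void sketch_set_bit(unsigned int nr, unsigned char *map)
{
	assert(nr < SKETCH_BLOCKS_PER_SEG);
	map[nr >> 3] |= (unsigned char)(1u << (7 - (nr & 7)));
}

/* Returns 1 if the block at this in-segment offset was valid at the last checkpoint. */
static int sketch_is_checkpointed(const unsigned char *ckpt_valid_map,
				  unsigned int offset)
{
	return sketch_test_bit(offset, ckpt_valid_map);
}
```

In the kernel function the lookup is additionally wrapped in a read lock on sentry_lock, and the two sentinel addresses are reported as checkpointed before any bitmap is consulted.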

/*
 * This function must be called with curseg_mutex held
 */
static void __add_sum_entry(struct f2fs_sb_info *sbi, int type,
					struct f2fs_summary *sum)
{
	struct curseg_info *curseg = CURSEG_I(sbi, type);
	void *addr = curseg->sum_blk;
	addr += curseg->next_blkoff * sizeof(struct f2fs_summary);
	memcpy(addr, sum, sizeof(struct f2fs_summary));
}
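
The slot arithmetic in __add_sum_entry can be sketched in userspace as follows. This is an illustration under assumptions: the summary entry is modelled as a 7-byte packed struct and the segment as 512 entries, and every `sketch_` name is hypothetical. It uses an `unsigned char *` cursor because arithmetic on `void *` is a GNU extension, and `__attribute__((packed))` requires GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ENTRIES_IN_SUM 512

/* Assumed layout: a 7-byte packed entry (node id, version, offset in node). */
struct sketch_summary {
	uint32_t nid;
	uint8_t version;
	uint16_t ofs_in_node;
} __attribute__((packed));

/* Hypothetical stand-in for the in-memory current-segment state. */
struct sketch_curseg {
	struct sketch_summary entries[SKETCH_ENTRIES_IN_SUM];
	unsigned int next_blkoff;	/* next block offset to be written */
};

/* Copy one summary entry into the slot matching the next block offset. */
static void sketch_add_sum_entry(struct sketch_curseg *curseg,
				 const struct sketch_summary *sum)
{
	unsigned char *addr = (unsigned char *)curseg->entries;

	assert(curseg->next_blkoff < SKETCH_ENTRIES_IN_SUM);
	addr += curseg->next_blkoff * sizeof(struct sketch_summary);
	memcpy(addr, sum, sizeof(struct sketch_summary));
}
```

The sketch takes no lock; in the kernel the caller is expected to hold curseg_mutex so that next_blkoff cannot change between the offset computation and the copy.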

/*
 * Calculate the number of current summary pages for writing
 */
int npages_for_summary_flush(struct f2fs_sb_info *sbi, bool for_ra)
{
	int valid_sum_count = 0;
	int i, sum_in_page;

	for (i = CURSEG_HOT_DATA; i <= CURSEG_COLD_DATA; i++) {
		if (sbi->ckpt->alloc_type[i] == SSR)
			valid_sum_count += sbi->blocks_per_seg;
		else {
			if (for_ra)
				valid_sum_count += le16_to_cpu(
					F2FS_CKPT(sbi)->cur_data_blkoff[i]);
			else
				valid_sum_count += curseg_blkoff(sbi, i);
		}
	}

	sum_in_page = (PAGE_SIZE - 2 * SUM_JOURNAL_SIZE -
			SUM_FOOTER_SIZE) / SUMMARY_SIZE;
	if (valid_sum_count <= sum_in_page)
		return 1;
	else if ((valid_sum_count - sum_in_page) <=
		(PAGE_SIZE - SUM_FOOTER_SIZE) / SUMMARY_SIZE)
		return 2;
	return 3;
}
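
To make the page-count thresholds above concrete, here is a worked sketch of the same arithmetic. The constants are assumptions recalled from the f2fs on-disk format for a 4 KiB block (7-byte summary entries, a 5-byte footer, 512 entries per segment); they are not taken from this file, and `sketch_npages_for_summaries` is a hypothetical name. Under those assumptions the first page fits 439 entries and a following page fits 584.

```c
#include <assert.h>

/* Assumed constants for a 4 KiB block; verify against the on-disk format header. */
#define SKETCH_PAGE_SIZE	4096
#define SKETCH_SUMMARY_SIZE	7
#define SKETCH_SUM_FOOTER_SIZE	5
#define SKETCH_SUM_ENTRIES	512
#define SKETCH_SUM_ENTRY_SIZE	(SKETCH_SUMMARY_SIZE * SKETCH_SUM_ENTRIES)
#define SKETCH_SUM_JOURNAL_SIZE	\
	(SKETCH_PAGE_SIZE - SKETCH_SUM_FOOTER_SIZE - SKETCH_SUM_ENTRY_SIZE)

/* Number of pages needed to hold valid_sum_count compacted summary entries. */
static int sketch_npages_for_summaries(int valid_sum_count)
{
	/* The first page also carries two journal areas and a footer. */
	int first = (SKETCH_PAGE_SIZE - 2 * SKETCH_SUM_JOURNAL_SIZE -
			SKETCH_SUM_FOOTER_SIZE) / SKETCH_SUMMARY_SIZE;
	/* A later page carries only entries and a footer. */
	int later = (SKETCH_PAGE_SIZE - SKETCH_SUM_FOOTER_SIZE) /
			SKETCH_SUMMARY_SIZE;

	assert(valid_sum_count >= 0);
	if (valid_sum_count <= first)
		return 1;
	if (valid_sum_count - first <= later)
		return 2;
	return 3;
}
```

With three data logs of 512 blocks each, the count can reach 1536, which exceeds 439 + 584 = 1023, so the third page is reachable in the worst case.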

/*
 * Caller should put this summary page
 */
struct page *get_sum_page(struct f2fs_sb_info *sbi, unsigned int segno)
{
	return get_meta_page(sbi, GET_SUM_BLOCK(sbi, segno));
}

void update_meta_page(struct f2fs_sb_info *sbi, void *src, block_t blk_addr)
{
	struct page *page = grab_meta_page(sbi, blk_addr);

	memcpy(page_address(page), src, PAGE_SIZE);
	set_page_dirty(page);
	f2fs_put_page(page, 1);
}

static void write_sum_page(struct f2fs_sb_info *sbi,
			struct f2fs_summary_block *sum_blk, block_t blk_addr)
{
	update_meta_page(sbi, (void *)sum_blk, blk_addr);
}

f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- data of the current summary block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both parts of the cache, since we use the same mutex lock
curseg_mutex to protect the cache. 2) the current summary block with the last
journal info will be written back to the device as a normal summary block
when flushing; however, we treat journal info as valid only in the current
summary, so most normal summary blocks contain junk journal data, which wastes
the remaining space of the summary block.
So, in order to fix the above issues, we split curseg cache into two parts:
a) current summary block, protected by the original mutex lock curseg_mutex
b) journal cache, protected by the newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; when doing a checkpoint, we combine
the data of the two caches into the current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00

static void write_current_sum_page(struct f2fs_sb_info *sbi,
						int type, block_t blk_addr)
{
	struct curseg_info *curseg = CURSEG_I(sbi, type);
	struct page *page = grab_meta_page(sbi, blk_addr);
	struct f2fs_summary_block *src = curseg->sum_blk;
	struct f2fs_summary_block *dst;

	dst = (struct f2fs_summary_block *)page_address(page);

	mutex_lock(&curseg->curseg_mutex);

	down_read(&curseg->journal_rwsem);
	memcpy(&dst->journal, curseg->journal, SUM_JOURNAL_SIZE);
	up_read(&curseg->journal_rwsem);

	memcpy(dst->entries, src->entries, SUM_ENTRY_SIZE);
	memcpy(&dst->footer, &src->footer, SUM_FOOTER_SIZE);

	mutex_unlock(&curseg->curseg_mutex);

	set_page_dirty(page);
	f2fs_put_page(page, 1);
}

static int is_next_segment_free(struct f2fs_sb_info *sbi, int type)
{
	struct curseg_info *curseg = CURSEG_I(sbi, type);
	unsigned int segno = curseg->segno + 1;
	struct free_segmap_info *free_i = FREE_I(sbi);

	if (segno < MAIN_SEGS(sbi) && segno % sbi->segs_per_sec)
		return !test_bit(segno, free_i->free_segmap);
	return 0;
}
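
The condition in is_next_segment_free combines three checks: the next segment exists, it is not the first segment of a new section, and its bit in the segment map is clear. The sketch below restates that in userspace. It assumes that a set bit means "in use", which is how the free_segmap is understood to be maintained; `sketch_test_bit_long` is a hypothetical stand-in for the kernel's test_bit(), and the parameters replace the fields read from sbi.

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for the kernel's test_bit() on an unsigned long array. */
static int sketch_test_bit_long(unsigned int nr, const unsigned long *map)
{
	return (int)((map[nr / SKETCH_BITS_PER_LONG] >>
			(nr % SKETCH_BITS_PER_LONG)) & 1UL);
}

/*
 * Returns 1 only if segment cur + 1 exists, stays inside the same section
 * (it is not the first segment of a new section), and is not marked in use.
 */
static int sketch_next_segment_free(unsigned int cur, unsigned int main_segs,
				    unsigned int segs_per_sec,
				    const unsigned long *in_use_map)
{
	unsigned int segno = cur + 1;

	assert(segs_per_sec > 0);
	if (segno < main_segs && segno % segs_per_sec)
		return !sketch_test_bit_long(segno, in_use_map);
	return 0;
}
```

Note that when segs_per_sec is 1 the modulo is always zero, so the check never reports a free neighbour and the caller has to fall back to a regular segment search.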

/*
 * Find a new segment from the free segments bitmap to right order
 * This function must succeed; a failure is treated as a BUG
 */
static void get_new_segment(struct f2fs_sb_info *sbi,
			unsigned int *newseg, bool new_sec, int dir)
{
	struct free_segmap_info *free_i = FREE_I(sbi);
	unsigned int segno, secno, zoneno;
	unsigned int total_zones = MAIN_SECS(sbi) / sbi->secs_per_zone;
	unsigned int hint = GET_SEC_FROM_SEG(sbi, *newseg);
	unsigned int old_zoneno = GET_ZONE_FROM_SEG(sbi, *newseg);
	unsigned int left_start = hint;
	bool init = true;
	int go_left = 0;
	int i;

	spin_lock(&free_i->segmap_lock);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
if (!new_sec && ((*newseg + 1) % sbi->segs_per_sec)) {
|
|
|
|
segno = find_next_zero_bit(free_i->free_segmap,
|
2017-04-08 06:08:17 +08:00
|
|
|
GET_SEG_FROM_SEC(sbi, hint + 1), *newseg + 1);
|
|
|
|
if (segno < GET_SEG_FROM_SEC(sbi, hint + 1))
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
goto got_it;
|
|
|
|
}
|
|
|
|
find_other_zone:
|
2014-09-24 02:23:01 +08:00
|
|
|
secno = find_next_zero_bit(free_i->free_secmap, MAIN_SECS(sbi), hint);
|
|
|
|
if (secno >= MAIN_SECS(sbi)) {
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
if (dir == ALLOC_RIGHT) {
|
|
|
|
secno = find_next_zero_bit(free_i->free_secmap,
|
2014-09-24 02:23:01 +08:00
|
|
|
MAIN_SECS(sbi), 0);
|
|
|
|
f2fs_bug_on(sbi, secno >= MAIN_SECS(sbi));
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
} else {
|
|
|
|
go_left = 1;
|
|
|
|
left_start = hint - 1;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
if (go_left == 0)
|
|
|
|
goto skip_left;
|
|
|
|
|
|
|
|
while (test_bit(left_start, free_i->free_secmap)) {
|
|
|
|
if (left_start > 0) {
|
|
|
|
left_start--;
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
left_start = find_next_zero_bit(free_i->free_secmap,
|
2014-09-24 02:23:01 +08:00
|
|
|
MAIN_SECS(sbi), 0);
|
|
|
|
f2fs_bug_on(sbi, left_start >= MAIN_SECS(sbi));
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
secno = left_start;
|
|
|
|
skip_left:
|
2017-04-08 06:08:17 +08:00
|
|
|
segno = GET_SEG_FROM_SEC(sbi, secno);
|
|
|
|
zoneno = GET_ZONE_FROM_SEC(sbi, secno);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
/* give up on finding another zone */
|
|
|
|
if (!init)
|
|
|
|
goto got_it;
|
|
|
|
if (sbi->secs_per_zone == 1)
|
|
|
|
goto got_it;
|
|
|
|
if (zoneno == old_zoneno)
|
|
|
|
goto got_it;
|
|
|
|
if (dir == ALLOC_LEFT) {
|
|
|
|
if (!go_left && zoneno + 1 >= total_zones)
|
|
|
|
goto got_it;
|
|
|
|
if (go_left && zoneno == 0)
|
|
|
|
goto got_it;
|
|
|
|
}
|
|
|
|
for (i = 0; i < NR_CURSEG_TYPE; i++)
|
|
|
|
if (CURSEG_I(sbi, i)->zone == zoneno)
|
|
|
|
break;
|
|
|
|
|
|
|
|
if (i < NR_CURSEG_TYPE) {
|
|
|
|
/* zone is in user, try another */
|
|
|
|
if (go_left)
|
|
|
|
hint = zoneno * sbi->secs_per_zone - 1;
|
|
|
|
else if (zoneno + 1 >= total_zones)
|
|
|
|
hint = 0;
|
|
|
|
else
|
|
|
|
hint = (zoneno + 1) * sbi->secs_per_zone;
|
|
|
|
init = false;
|
|
|
|
goto find_other_zone;
|
|
|
|
}
|
|
|
|
got_it:
|
|
|
|
/* set it as dirty segment in free segmap */
|
2014-09-03 06:52:58 +08:00
|
|
|
f2fs_bug_on(sbi, test_bit(segno, free_i->free_segmap));
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
__set_inuse(sbi, segno);
|
|
|
|
*newseg = segno;
|
2015-02-11 18:20:38 +08:00
|
|
|
spin_unlock(&free_i->segmap_lock);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
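
/*
 * Illustration only, not part of segment.c: a minimal userspace sketch of
 * the rightward (ALLOC_RIGHT) search in get_new_segment(), assuming a plain
 * LSB-first bitmap in which a set bit means "in use". Every sketch_* name
 * is hypothetical. Unlike the kernel code, which calls f2fs_bug_on() when
 * no free section exists, this sketch returns size for a full map. The zone
 * checks and the leftward scan are left out.
 */
```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Return true if bit nr is set in map (LSB-first within each word). */
static bool sketch_test_bit(const unsigned long *map, size_t nr)
{
	return (map[nr / SKETCH_BITS_PER_LONG] >>
		(nr % SKETCH_BITS_PER_LONG)) & 1UL;
}

/*
 * Return the index of the first clear bit in [start, size), or size if
 * every bit in that range is set.
 */
static size_t sketch_next_zero_bit(const unsigned long *map, size_t size,
				   size_t start)
{
	size_t nr;

	for (nr = start; nr < size; nr++)
		if (!sketch_test_bit(map, nr))
			return nr;
	return size;
}

/*
 * Search rightwards from hint; if nothing is free up to the end, wrap
 * around and search again from 0. Returns size when the map is full.
 */
static size_t sketch_find_free_right(const unsigned long *map, size_t size,
				     size_t hint)
{
	size_t nr;

	assert(hint <= size);
	nr = sketch_next_zero_bit(map, size, hint);
	if (nr >= size)
		nr = sketch_next_zero_bit(map, size, 0);
	return nr;
}
```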

static void reset_curseg(struct f2fs_sb_info *sbi, int type, int modified)
{
	struct curseg_info *curseg = CURSEG_I(sbi, type);
	struct summary_footer *sum_footer;

	curseg->segno = curseg->next_segno;
	curseg->zone = GET_ZONE_FROM_SEG(sbi, curseg->segno);
	curseg->next_blkoff = 0;
	curseg->next_segno = NULL_SEGNO;

	sum_footer = &(curseg->sum_blk->footer);
	memset(sum_footer, 0, sizeof(struct summary_footer));
	if (IS_DATASEG(type))
		SET_SUM_TYPE(sum_footer, SUM_TYPE_DATA);
	if (IS_NODESEG(type))
		SET_SUM_TYPE(sum_footer, SUM_TYPE_NODE);
	__set_sit_entry_type(sbi, type, curseg->segno, modified);
}

static unsigned int __get_next_segno(struct f2fs_sb_info *sbi, int type)
{
	/* if segs_per_sec is larger than 1, we need to keep original policy. */
	if (sbi->segs_per_sec != 1)
		return CURSEG_I(sbi, type)->segno;

	if (test_opt(sbi, NOHEAP) &&
		(type == CURSEG_HOT_DATA || IS_NODESEG(type)))
		return 0;

	if (SIT_I(sbi)->last_victim[ALLOC_NEXT])
		return SIT_I(sbi)->last_victim[ALLOC_NEXT];

	/* find segments from 0 to reuse freed segments */
	if (sbi->alloc_mode == ALLOC_MODE_REUSE)
		return 0;

	return CURSEG_I(sbi, type)->segno;
}
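
/*
 * Illustration only, not part of segment.c: the order of the checks in
 * __get_next_segno() can be read as a small decision function. The sketch
 * below assumes the mount options and SIT state are flattened into a
 * hypothetical struct sketch_policy; none of these names exist in f2fs.
 */
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, flattened view of the state the real function consults. */
struct sketch_policy {
	unsigned int segs_per_sec;
	bool noheap;              /* stands in for test_opt(sbi, NOHEAP) */
	bool reuse_mode;          /* stands in for ALLOC_MODE_REUSE */
	unsigned int next_victim; /* 0 means "no victim recorded" */
};

/*
 * Pick the segment number from which the free-segment search starts.
 * The checks run in the same order as in the function above.
 */
static unsigned int sketch_next_hint(const struct sketch_policy *p,
				     unsigned int cur_segno,
				     bool hot_data_or_node)
{
	assert(p != NULL);

	/* Multi-segment sections keep the original policy. */
	if (p->segs_per_sec != 1)
		return cur_segno;
	/* Without heap-style allocation, hot data and node logs start at 0. */
	if (p->noheap && hot_data_or_node)
		return 0;
	/* A recorded victim takes precedence over the reuse mode. */
	if (p->next_victim)
		return p->next_victim;
	/* Reuse mode restarts from 0 to pick up freed segments. */
	if (p->reuse_mode)
		return 0;
	return cur_segno;
}
```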

/*
 * Allocate a current working segment.
 * This function always allocates a free segment in LFS manner.
 */
static void new_curseg(struct f2fs_sb_info *sbi, int type, bool new_sec)
{
	struct curseg_info *curseg = CURSEG_I(sbi, type);
	unsigned int segno = curseg->segno;
	int dir = ALLOC_LEFT;

	write_sum_page(sbi, curseg->sum_blk,
				GET_SUM_BLOCK(sbi, segno));
	if (type == CURSEG_WARM_DATA || type == CURSEG_COLD_DATA)
		dir = ALLOC_RIGHT;

	if (test_opt(sbi, NOHEAP))
		dir = ALLOC_RIGHT;

	segno = __get_next_segno(sbi, type);
	get_new_segment(sbi, &segno, new_sec, dir);
	curseg->next_segno = segno;
	reset_curseg(sbi, type, 1);
	curseg->alloc_type = LFS;
}

static void __next_free_blkoff(struct f2fs_sb_info *sbi,
			struct curseg_info *seg, block_t start)
{
	struct seg_entry *se = get_seg_entry(sbi, seg->segno);
	int entries = SIT_VBLOCK_MAP_SIZE / sizeof(unsigned long);
	unsigned long *target_map = SIT_I(sbi)->tmp_map;
	unsigned long *ckpt_map = (unsigned long *)se->ckpt_valid_map;
	unsigned long *cur_map = (unsigned long *)se->cur_valid_map;
	int i, pos;

	for (i = 0; i < entries; i++)
		target_map[i] = ckpt_map[i] | cur_map[i];

	pos = __find_rev_next_zero_bit(target_map, sbi->blocks_per_seg, start);

	seg->next_blkoff = pos;
}
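
/*
 * Illustration only, not part of segment.c: __next_free_blkoff() treats a
 * block as reusable only if it is clear in both the checkpointed map and
 * the current map. The userspace sketch below assumes 512 blocks per
 * segment and an LSB-first bit order; the kernel uses the reversed bit
 * order of __find_rev_next_zero_bit(), so offsets do not map one to one.
 * All sketch_* and SKETCH_* names are hypothetical.
 */
```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SKETCH_BLOCKS_PER_SEG 512
#define SKETCH_LONG_BITS (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_MAP_LONGS \
	((SKETCH_BLOCKS_PER_SEG + SKETCH_LONG_BITS - 1) / SKETCH_LONG_BITS)

/*
 * Return the first offset >= start that is clear in both maps, or
 * SKETCH_BLOCKS_PER_SEG if the rest of the segment is occupied.
 */
static size_t sketch_next_free_blkoff(const unsigned long *ckpt_map,
				      const unsigned long *cur_map,
				      size_t start)
{
	unsigned long merged[SKETCH_MAP_LONGS];
	size_t i, pos;

	assert(start <= SKETCH_BLOCKS_PER_SEG);

	/* A block is busy if either map marks it valid. */
	for (i = 0; i < SKETCH_MAP_LONGS; i++)
		merged[i] = ckpt_map[i] | cur_map[i];

	for (pos = start; pos < SKETCH_BLOCKS_PER_SEG; pos++) {
		unsigned long word = merged[pos / SKETCH_LONG_BITS];

		if (!((word >> (pos % SKETCH_LONG_BITS)) & 1UL))
			return pos;
	}
	return SKETCH_BLOCKS_PER_SEG;
}
```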

/*
 * If a segment is written in LFS manner, the next block offset is obtained
 * simply by incrementing the current block offset. However, if a segment is
 * written in SSR manner, the next block offset is obtained by calling
 * __next_free_blkoff.
 */
static void __refresh_next_blkoff(struct f2fs_sb_info *sbi,
				struct curseg_info *seg)
{
	if (seg->alloc_type == SSR)
		__next_free_blkoff(sbi, seg, seg->next_blkoff + 1);
	else
		seg->next_blkoff++;
}
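
/*
 * Illustration only, not part of segment.c: the difference between the two
 * allocation types can be seen on a tiny segment. The sketch assumes a
 * hypothetical bool array in place of the validity bitmaps, and every
 * sketch_* and SKETCH_* name is made up for this example. LFS advances by
 * one; SSR skips forward to the next block that is not in use.
 */
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_alloc_type { SKETCH_LFS, SKETCH_SSR };

struct sketch_curseg {
	enum sketch_alloc_type alloc_type;
	size_t next_blkoff;
};

/*
 * Advance seg->next_blkoff. In SSR mode the result is nblocks when no
 * free block remains after the current offset.
 */
static void sketch_refresh_next_blkoff(struct sketch_curseg *seg,
				       const bool *in_use, size_t nblocks)
{
	size_t pos;

	assert(seg->next_blkoff < nblocks);

	if (seg->alloc_type == SKETCH_LFS) {
		/* Append-only log: the next block is simply the following one. */
		seg->next_blkoff++;
		return;
	}

	/* Slack space recycling: skip blocks that still hold valid data. */
	for (pos = seg->next_blkoff + 1; pos < nblocks; pos++)
		if (!in_use[pos])
			break;
	seg->next_blkoff = pos;
}
```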
|
|
|
|
|
2012-11-29 12:28:09 +08:00
|
|
|
/*
|
2014-08-06 22:22:50 +08:00
|
|
|
* This function always allocates a used segment(from dirty seglist) by SSR
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
* manner, so it should recover the existing segment information of valid blocks
|
|
|
|
*/
|
2017-08-30 18:04:48 +08:00
|
|
|
static void change_curseg(struct f2fs_sb_info *sbi, int type)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
|
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, type);
|
|
|
|
unsigned int new_segno = curseg->next_segno;
|
|
|
|
struct f2fs_summary_block *sum_node;
|
|
|
|
struct page *sum_page;
|
|
|
|
|
|
|
|
write_sum_page(sbi, curseg->sum_blk,
|
|
|
|
GET_SUM_BLOCK(sbi, curseg->segno));
|
|
|
|
__set_test_and_inuse(sbi, new_segno);
|
|
|
|
|
|
|
|
mutex_lock(&dirty_i->seglist_lock);
|
|
|
|
__remove_dirty_segment(sbi, new_segno, PRE);
|
|
|
|
__remove_dirty_segment(sbi, new_segno, DIRTY);
|
|
|
|
mutex_unlock(&dirty_i->seglist_lock);
|
|
|
|
|
|
|
|
reset_curseg(sbi, type, 1);
|
|
|
|
curseg->alloc_type = SSR;
|
|
|
|
__next_free_blkoff(sbi, curseg, 0);
|
|
|
|
|
2017-08-30 18:04:48 +08:00
|
|
|
sum_page = get_sum_page(sbi, new_segno);
|
|
|
|
sum_node = (struct f2fs_summary_block *)page_address(sum_page);
|
|
|
|
memcpy(curseg->sum_blk, sum_node, SUM_ENTRY_SIZE);
|
|
|
|
f2fs_put_page(sum_page, 1);
|
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
2013-02-04 14:11:17 +08:00
|
|
|
static int get_ssr_segment(struct f2fs_sb_info *sbi, int type)
|
|
|
|
{
|
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, type);
|
|
|
|
const struct victim_selection *v_ops = DIRTY_I(sbi)->v_ops;
|
2017-04-14 06:17:00 +08:00
|
|
|
unsigned segno = NULL_SEGNO;
|
2017-02-24 18:46:00 +08:00
|
|
|
int i, cnt;
|
|
|
|
bool reversed = false;
|
2017-02-23 09:10:18 +08:00
|
|
|
|
|
|
|
/* need_SSR() already forces to do this */
|
2017-04-14 06:17:00 +08:00
|
|
|
if (v_ops->get_victim(sbi, &segno, BG_GC, type, SSR)) {
|
|
|
|
curseg->next_segno = segno;
|
2017-02-23 09:10:18 +08:00
|
|
|
return 1;
|
2017-04-14 06:17:00 +08:00
|
|
|
}
|
2013-02-04 14:11:17 +08:00
|
|
|
|
2017-02-23 09:02:32 +08:00
|
|
|
/* For node segments, let's do SSR more intensively */
|
|
|
|
if (IS_NODESEG(type)) {
|
2017-02-24 18:46:00 +08:00
|
|
|
if (type >= CURSEG_WARM_NODE) {
|
|
|
|
reversed = true;
|
|
|
|
i = CURSEG_COLD_NODE;
|
|
|
|
} else {
|
|
|
|
i = CURSEG_HOT_NODE;
|
|
|
|
}
|
|
|
|
cnt = NR_CURSEG_NODE_TYPE;
|
2017-02-23 09:02:32 +08:00
|
|
|
} else {
|
2017-02-24 18:46:00 +08:00
|
|
|
if (type >= CURSEG_WARM_DATA) {
|
|
|
|
reversed = true;
|
|
|
|
i = CURSEG_COLD_DATA;
|
|
|
|
} else {
|
|
|
|
i = CURSEG_HOT_DATA;
|
|
|
|
}
|
|
|
|
cnt = NR_CURSEG_DATA_TYPE;
|
2017-02-23 09:02:32 +08:00
|
|
|
}
|
2013-02-04 14:11:17 +08:00
|
|
|
|
2017-02-24 18:46:00 +08:00
|
|
|
for (; cnt-- > 0; reversed ? i-- : i++) {
|
2017-02-23 09:10:18 +08:00
|
|
|
if (i == type)
|
|
|
|
continue;
|
2017-04-14 06:17:00 +08:00
|
|
|
if (v_ops->get_victim(sbi, &segno, BG_GC, i, SSR)) {
|
|
|
|
curseg->next_segno = segno;
|
2013-02-04 14:11:17 +08:00
|
|
|
return 1;
|
2017-04-14 06:17:00 +08:00
|
|
|
}
|
2017-02-23 09:10:18 +08:00
|
|
|
}
|
2013-02-04 14:11:17 +08:00
|
|
|
return 0;
|
|
|
|
}
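/*
 * Illustrative sketch only, not part of segment.c: the fallback scan in
 * get_ssr_segment() can be modelled in userspace to see which other logs of
 * the same class (data or node) are probed, and in what order, after the
 * log's own type has failed. The demo_* names are hypothetical stand-ins for
 * the kernel's CURSEG_* values, assumed here to be ordered hot, warm, cold
 * for data and then for node.
 */

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-ins for the kernel's CURSEG_* log types, redefined
 * here so the sketch builds in userspace.
 */
enum demo_curseg {
	DEMO_HOT_DATA, DEMO_WARM_DATA, DEMO_COLD_DATA,
	DEMO_HOT_NODE, DEMO_WARM_NODE, DEMO_COLD_NODE,
};

#define DEMO_NR_TEMPS 3	/* hot, warm, cold within one class */

/*
 * Write the fallback log types for 'type' into out[] in the order the
 * kernel loop would probe them, and return how many were written.
 */
static size_t demo_ssr_fallback_order(int type, int out[DEMO_NR_TEMPS])
{
	int is_node = type >= DEMO_HOT_NODE;
	int hot = is_node ? DEMO_HOT_NODE : DEMO_HOT_DATA;
	int reversed = type >= hot + 1;		/* warm or cold log */
	int i = reversed ? hot + 2 : hot;	/* start at cold, or at hot */
	int cnt = DEMO_NR_TEMPS;
	size_t n = 0;

	assert(type >= DEMO_HOT_DATA && type <= DEMO_COLD_NODE);

	for (; cnt-- > 0; reversed ? i-- : i++) {
		if (i == type)
			continue;	/* own type was already tried first */
		out[n++] = i;
	}
	return n;
}
```

/*
 * Under these assumptions a warm or cold log scans from cold towards hot,
 * while a hot log scans from hot towards cold, so the nearest temperature
 * is not always probed first.
 */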
|
|
|
|
|
2012-11-02 16:09:16 +08:00
|
|
|
/*
|
|
|
|
* Flush out the current segment and replace it with a new segment.
|
|
|
|
* This function must always succeed; a failure is treated as a BUG.
|
|
|
|
*/
|
|
|
|
static void allocate_segment_by_default(struct f2fs_sb_info *sbi,
|
|
|
|
int type, bool force)
|
|
|
|
{
|
2017-04-21 04:51:57 +08:00
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, type);
|
|
|
|
|
2013-08-19 09:41:15 +08:00
|
|
|
if (force)
|
2012-11-02 16:09:16 +08:00
|
|
|
new_curseg(sbi, type, true);
|
2017-02-15 11:32:51 +08:00
|
|
|
else if (!is_set_ckpt_flags(sbi, CP_CRC_RECOVERY_FLAG) &&
|
|
|
|
type == CURSEG_WARM_NODE)
|
2012-11-02 16:09:16 +08:00
|
|
|
new_curseg(sbi, type, false);
|
2017-04-21 04:51:57 +08:00
|
|
|
else if (curseg->alloc_type == LFS && is_next_segment_free(sbi, type))
|
|
|
|
new_curseg(sbi, type, false);
|
2012-11-02 16:09:16 +08:00
|
|
|
else if (need_SSR(sbi) && get_ssr_segment(sbi, type))
|
2017-08-30 18:04:48 +08:00
|
|
|
change_curseg(sbi, type);
|
2012-11-02 16:09:16 +08:00
|
|
|
else
|
|
|
|
new_curseg(sbi, type, false);
|
2013-10-22 19:56:10 +08:00
|
|
|
|
2017-04-21 04:51:57 +08:00
|
|
|
stat_inc_seg_type(sbi, curseg);
|
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
void allocate_new_segments(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
2016-11-12 04:31:40 +08:00
|
|
|
struct curseg_info *curseg;
|
|
|
|
unsigned int old_segno;
|
2012-11-02 16:09:16 +08:00
|
|
|
int i;
|
|
|
|
|
2017-10-30 17:49:53 +08:00
|
|
|
down_write(&SIT_I(sbi)->sentry_lock);
|
|
|
|
|
2016-11-12 04:31:40 +08:00
|
|
|
for (i = CURSEG_HOT_DATA; i <= CURSEG_COLD_DATA; i++) {
|
|
|
|
curseg = CURSEG_I(sbi, i);
|
|
|
|
old_segno = curseg->segno;
|
|
|
|
SIT_I(sbi)->s_ops->allocate_segment(sbi, i, true);
|
|
|
|
locate_dirty_segment(sbi, old_segno);
|
|
|
|
}
|
2017-10-30 17:49:53 +08:00
|
|
|
|
|
|
|
up_write(&SIT_I(sbi)->sentry_lock);
|
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static const struct segment_allocation default_salloc_ops = {
|
|
|
|
.allocate_segment = allocate_segment_by_default,
|
|
|
|
};
|
|
|
|
|
2016-12-30 14:06:15 +08:00
|
|
|
bool exist_trim_candidates(struct f2fs_sb_info *sbi, struct cp_control *cpc)
|
|
|
|
{
|
|
|
|
__u64 trim_start = cpc->trim_start;
|
|
|
|
bool has_candidate = false;
|
|
|
|
|
2017-10-30 17:49:53 +08:00
|
|
|
down_write(&SIT_I(sbi)->sentry_lock);
|
2016-12-30 14:06:15 +08:00
|
|
|
for (; cpc->trim_start <= cpc->trim_end; cpc->trim_start++) {
|
|
|
|
if (add_discard_addrs(sbi, cpc, true)) {
|
|
|
|
has_candidate = true;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
2017-10-30 17:49:53 +08:00
|
|
|
up_write(&SIT_I(sbi)->sentry_lock);
|
2016-12-30 14:06:15 +08:00
|
|
|
|
|
|
|
cpc->trim_start = trim_start;
|
|
|
|
return has_candidate;
|
|
|
|
}
|
|
|
|
|
2014-09-21 13:06:39 +08:00
|
|
|
int f2fs_trim_fs(struct f2fs_sb_info *sbi, struct fstrim_range *range)
|
|
|
|
{
|
2015-02-10 04:02:44 +08:00
|
|
|
__u64 start = F2FS_BYTES_TO_BLK(range->start);
|
|
|
|
__u64 end = start + F2FS_BYTES_TO_BLK(range->len) - 1;
|
2017-10-04 09:08:32 +08:00
|
|
|
unsigned int start_segno, end_segno, cur_segno;
|
|
|
|
block_t start_block, end_block;
|
2014-09-21 13:06:39 +08:00
|
|
|
struct cp_control cpc;
|
2017-10-04 09:08:34 +08:00
|
|
|
struct discard_policy dpolicy;
|
2017-10-28 16:52:32 +08:00
|
|
|
unsigned long long trimmed = 0;
|
2015-12-23 17:50:30 +08:00
|
|
|
int err = 0;
|
2014-09-21 13:06:39 +08:00
|
|
|
|
2015-05-01 13:50:06 +08:00
|
|
|
if (start >= MAX_BLKADDR(sbi) || range->len < sbi->blocksize)
|
2014-09-21 13:06:39 +08:00
|
|
|
return -EINVAL;
|
|
|
|
|
2014-09-24 02:23:01 +08:00
|
|
|
if (end <= MAIN_BLKADDR(sbi))
|
2014-09-21 13:06:39 +08:00
|
|
|
goto out;
|
|
|
|
|
2016-09-01 10:14:39 +08:00
|
|
|
if (is_sbi_flag_set(sbi, SBI_NEED_FSCK)) {
|
|
|
|
f2fs_msg(sbi->sb, KERN_WARNING,
|
|
|
|
"Found FS corruption, run fsck to fix.");
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
2014-09-21 13:06:39 +08:00
|
|
|
/* start/end segment number in main_area */
|
2014-09-24 02:23:01 +08:00
|
|
|
start_segno = (start <= MAIN_BLKADDR(sbi)) ? 0 : GET_SEGNO(sbi, start);
|
|
|
|
end_segno = (end >= MAX_BLKADDR(sbi)) ? MAIN_SEGS(sbi) - 1 :
|
|
|
|
GET_SEGNO(sbi, end);
|
2017-10-04 09:08:32 +08:00
|
|
|
|
2014-09-21 13:06:39 +08:00
|
|
|
cpc.reason = CP_DISCARD;
|
2015-05-01 13:50:06 +08:00
|
|
|
cpc.trim_minlen = max_t(__u64, 1, F2FS_BYTES_TO_BLK(range->minlen));
|
2014-09-21 13:06:39 +08:00
|
|
|
|
|
|
|
/* do checkpoint to issue discard commands safely */
|
2017-10-04 09:08:32 +08:00
|
|
|
for (cur_segno = start_segno; cur_segno <= end_segno;
|
|
|
|
cur_segno = cpc.trim_end + 1) {
|
|
|
|
cpc.trim_start = cur_segno;
|
2015-05-01 13:37:50 +08:00
|
|
|
|
|
|
|
if (sbi->discard_blks == 0)
|
|
|
|
break;
|
|
|
|
else if (sbi->discard_blks < BATCHED_TRIM_BLOCKS(sbi))
|
|
|
|
cpc.trim_end = end_segno;
|
|
|
|
else
|
|
|
|
cpc.trim_end = min_t(unsigned int,
|
2017-10-04 09:08:32 +08:00
|
|
|
rounddown(cur_segno +
|
2015-01-27 09:41:23 +08:00
|
|
|
BATCHED_TRIM_SEGMENTS(sbi),
|
|
|
|
sbi->segs_per_sec) - 1, end_segno);
|
|
|
|
|
|
|
|
mutex_lock(&sbi->gc_mutex);
|
2015-12-23 17:50:30 +08:00
|
|
|
err = write_checkpoint(sbi, &cpc);
|
2015-01-27 09:41:23 +08:00
|
|
|
mutex_unlock(&sbi->gc_mutex);
|
2016-08-21 23:21:29 +08:00
|
|
|
if (err)
|
|
|
|
break;
|
2016-08-21 23:21:30 +08:00
|
|
|
|
|
|
|
schedule();
|
2015-01-27 09:41:23 +08:00
|
|
|
}
|
2017-10-04 09:08:32 +08:00
|
|
|
|
|
|
|
start_block = START_BLOCK(sbi, start_segno);
|
|
|
|
end_block = START_BLOCK(sbi, min(cur_segno, end_segno) + 1);
|
|
|
|
|
2017-10-04 09:08:34 +08:00
|
|
|
init_discard_policy(&dpolicy, DPOLICY_FSTRIM, cpc.trim_minlen);
|
|
|
|
__issue_discard_cmd_range(sbi, &dpolicy, start_block, end_block);
|
2017-10-28 16:52:32 +08:00
|
|
|
trimmed = __wait_discard_cmd_range(sbi, &dpolicy,
|
|
|
|
start_block, end_block);
|
2014-09-21 13:06:39 +08:00
|
|
|
out:
|
2017-10-28 16:52:32 +08:00
|
|
|
range->len = F2FS_BLK_TO_BYTES(trimmed);
|
2015-12-23 17:50:30 +08:00
|
|
|
return err;
|
2014-09-21 13:06:39 +08:00
|
|
|
}
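/*
 * Illustrative sketch only, not part of segment.c: the batching arithmetic
 * in the checkpoint loop of f2fs_trim_fs() can be checked in userspace. It
 * models only the last branch, where trim_end is the section-aligned end of
 * the next batch, clamped to end_segno. The demo_* names and the 'batch'
 * parameter (standing in for BATCHED_TRIM_SEGMENTS(sbi)) are assumptions,
 * as are the preconditions batch >= segs_per_sec and end < UINT_MAX.
 */

```c
#include <assert.h>

/*
 * Last segment of the batch that starts at 'cur': round cur + batch down
 * to a section boundary, step back one segment, and clamp to 'end'.
 */
static unsigned int demo_trim_end(unsigned int cur, unsigned int end,
				  unsigned int batch,
				  unsigned int segs_per_sec)
{
	unsigned int limit = cur + batch;
	unsigned int last;

	/* With batch >= segs_per_sec the batch always covers 'cur' itself. */
	assert(segs_per_sec > 0 && batch >= segs_per_sec);

	limit -= limit % segs_per_sec;	/* rounddown(cur + batch, segs_per_sec) */
	last = limit - 1;
	return last < end ? last : end;
}

/* Number of batches (one checkpoint each) needed to cover [start, end]. */
static unsigned int demo_count_batches(unsigned int start, unsigned int end,
				       unsigned int batch,
				       unsigned int segs_per_sec)
{
	unsigned int cur, n = 0;

	for (cur = start; cur <= end;
	     cur = demo_trim_end(cur, end, batch, segs_per_sec) + 1)
		n++;
	return n;
}
```

/*
 * Rounding down to a section boundary means that a batch starting in the
 * middle of a section is shorter than 'batch' segments, and every later
 * batch then starts on a section boundary.
 */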
|
|
|
|
|
2012-11-02 16:09:16 +08:00
|
|
|
static bool __has_curseg_space(struct f2fs_sb_info *sbi, int type)
|
|
|
|
{
|
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, type);
|
|
|
|
if (curseg->next_blkoff < sbi->blocks_per_seg)
|
|
|
|
return true;
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
2017-11-09 13:51:27 +08:00
|
|
|
int rw_hint_to_seg_type(enum rw_hint hint)
|
|
|
|
{
|
|
|
|
switch (hint) {
|
|
|
|
case WRITE_LIFE_SHORT:
|
|
|
|
return CURSEG_HOT_DATA;
|
|
|
|
case WRITE_LIFE_EXTREME:
|
|
|
|
return CURSEG_COLD_DATA;
|
|
|
|
default:
|
|
|
|
return CURSEG_WARM_DATA;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-01-31 10:36:57 +08:00
|
|
|
/* This returns write hints for each segment type. These hints will be
|
|
|
|
* passed down to the block layer. There are mapping tables which depend on
|
|
|
|
* the mount option 'whint_mode'.
|
|
|
|
*
|
|
|
|
* 1) whint_mode=off. F2FS only passes down WRITE_LIFE_NOT_SET.
|
|
|
|
*
|
|
|
|
* 2) whint_mode=user-based. F2FS tries to pass down hints given by users.
|
|
|
|
*
|
|
|
|
* User F2FS Block
|
|
|
|
* ---- ---- -----
|
|
|
|
* META WRITE_LIFE_NOT_SET
|
|
|
|
* HOT_NODE "
|
|
|
|
* WARM_NODE "
|
|
|
|
* COLD_NODE "
|
|
|
|
* ioctl(COLD) COLD_DATA WRITE_LIFE_EXTREME
|
|
|
|
* extension list " "
|
|
|
|
*
|
|
|
|
* -- buffered io
|
|
|
|
* WRITE_LIFE_EXTREME COLD_DATA WRITE_LIFE_EXTREME
|
|
|
|
* WRITE_LIFE_SHORT HOT_DATA WRITE_LIFE_SHORT
|
|
|
|
* WRITE_LIFE_NOT_SET WARM_DATA WRITE_LIFE_NOT_SET
|
|
|
|
* WRITE_LIFE_NONE " "
|
|
|
|
* WRITE_LIFE_MEDIUM " "
|
|
|
|
* WRITE_LIFE_LONG " "
|
|
|
|
*
|
|
|
|
* -- direct io
|
|
|
|
* WRITE_LIFE_EXTREME COLD_DATA WRITE_LIFE_EXTREME
|
|
|
|
* WRITE_LIFE_SHORT HOT_DATA WRITE_LIFE_SHORT
|
|
|
|
* WRITE_LIFE_NOT_SET WARM_DATA WRITE_LIFE_NOT_SET
|
|
|
|
* WRITE_LIFE_NONE " WRITE_LIFE_NONE
|
|
|
|
* WRITE_LIFE_MEDIUM " WRITE_LIFE_MEDIUM
|
|
|
|
* WRITE_LIFE_LONG " WRITE_LIFE_LONG
|
|
|
|
*
|
2018-01-31 10:36:58 +08:00
|
|
|
* 3) whint_mode=fs-based. F2FS passes down hints with its policy.
|
|
|
|
*
|
|
|
|
* User F2FS Block
|
|
|
|
* ---- ---- -----
|
|
|
|
* META WRITE_LIFE_MEDIUM
|
|
|
|
* HOT_NODE WRITE_LIFE_NOT_SET
|
|
|
|
* WARM_NODE "
|
|
|
|
* COLD_NODE WRITE_LIFE_NONE
|
|
|
|
* ioctl(COLD) COLD_DATA WRITE_LIFE_EXTREME
|
|
|
|
* extension list " "
|
|
|
|
*
|
|
|
|
* -- buffered io
|
|
|
|
* WRITE_LIFE_EXTREME COLD_DATA WRITE_LIFE_EXTREME
|
|
|
|
* WRITE_LIFE_SHORT HOT_DATA WRITE_LIFE_SHORT
|
|
|
|
* WRITE_LIFE_NOT_SET WARM_DATA WRITE_LIFE_LONG
|
|
|
|
* WRITE_LIFE_NONE " "
|
|
|
|
* WRITE_LIFE_MEDIUM " "
|
|
|
|
* WRITE_LIFE_LONG " "
|
|
|
|
*
|
|
|
|
* -- direct io
|
|
|
|
* WRITE_LIFE_EXTREME COLD_DATA WRITE_LIFE_EXTREME
|
|
|
|
* WRITE_LIFE_SHORT HOT_DATA WRITE_LIFE_SHORT
|
|
|
|
* WRITE_LIFE_NOT_SET WARM_DATA WRITE_LIFE_NOT_SET
|
|
|
|
* WRITE_LIFE_NONE " WRITE_LIFE_NONE
|
|
|
|
* WRITE_LIFE_MEDIUM " WRITE_LIFE_MEDIUM
|
|
|
|
* WRITE_LIFE_LONG " WRITE_LIFE_LONG
|
2018-01-31 10:36:57 +08:00
|
|
|
*/
|
|
|
|
|
|
|
|
enum rw_hint io_type_to_rw_hint(struct f2fs_sb_info *sbi,
|
|
|
|
enum page_type type, enum temp_type temp)
|
|
|
|
{
|
|
|
|
if (sbi->whint_mode == WHINT_MODE_USER) {
|
|
|
|
if (type == DATA) {
|
2018-01-31 10:36:58 +08:00
|
|
|
if (temp == WARM)
|
2018-01-31 10:36:57 +08:00
|
|
|
return WRITE_LIFE_NOT_SET;
|
2018-01-31 10:36:58 +08:00
|
|
|
else if (temp == HOT)
|
|
|
|
return WRITE_LIFE_SHORT;
|
|
|
|
else if (temp == COLD)
|
|
|
|
return WRITE_LIFE_EXTREME;
|
2018-01-31 10:36:57 +08:00
|
|
|
} else {
|
|
|
|
return WRITE_LIFE_NOT_SET;
|
|
|
|
}
|
2018-01-31 10:36:58 +08:00
|
|
|
} else if (sbi->whint_mode == WHINT_MODE_FS) {
|
|
|
|
if (type == DATA) {
|
|
|
|
if (temp == WARM)
|
|
|
|
return WRITE_LIFE_LONG;
|
|
|
|
else if (temp == HOT)
|
|
|
|
return WRITE_LIFE_SHORT;
|
|
|
|
else if (temp == COLD)
|
|
|
|
return WRITE_LIFE_EXTREME;
|
|
|
|
} else if (type == NODE) {
|
|
|
|
if (temp == WARM || temp == HOT)
|
|
|
|
return WRITE_LIFE_NOT_SET;
|
|
|
|
else if (temp == COLD)
|
|
|
|
return WRITE_LIFE_NONE;
|
|
|
|
} else if (type == META) {
|
|
|
|
return WRITE_LIFE_MEDIUM;
|
|
|
|
}
|
2018-01-31 10:36:57 +08:00
|
|
|
}
|
2018-01-31 10:36:58 +08:00
|
|
|
return WRITE_LIFE_NOT_SET;
|
2018-01-31 10:36:57 +08:00
|
|
|
}
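/*
 * Illustrative sketch only, not part of segment.c: the if/else chain in
 * io_type_to_rw_hint() can also be read as a three-dimensional lookup table
 * indexed by mount mode, page type and temperature. The demo_* enums are
 * hypothetical userspace stand-ins for whint_mode, enum page_type,
 * enum temp_type and enum rw_hint. The table only mirrors what the function
 * returns; it does not cover the direct io rows of the comment above.
 */

```c
#include <assert.h>

enum demo_mode { DEMO_WHINT_OFF, DEMO_WHINT_USER, DEMO_WHINT_FS };
enum demo_page { DEMO_DATA, DEMO_NODE, DEMO_META };
enum demo_temp { DEMO_HOT, DEMO_WARM, DEMO_COLD };
enum demo_hint {
	DEMO_LIFE_NOT_SET,	/* value 0, so unlisted entries default to it */
	DEMO_LIFE_NONE,
	DEMO_LIFE_SHORT,
	DEMO_LIFE_MEDIUM,
	DEMO_LIFE_LONG,
	DEMO_LIFE_EXTREME,
};

static const enum demo_hint demo_hint_table[3][3][3] = {
	/* DEMO_WHINT_OFF: every entry stays DEMO_LIFE_NOT_SET. */
	[DEMO_WHINT_USER] = {
		[DEMO_DATA] = {
			[DEMO_HOT]  = DEMO_LIFE_SHORT,
			[DEMO_WARM] = DEMO_LIFE_NOT_SET,
			[DEMO_COLD] = DEMO_LIFE_EXTREME,
		},
		/* NODE and META stay DEMO_LIFE_NOT_SET in user mode. */
	},
	[DEMO_WHINT_FS] = {
		[DEMO_DATA] = {
			[DEMO_HOT]  = DEMO_LIFE_SHORT,
			[DEMO_WARM] = DEMO_LIFE_LONG,
			[DEMO_COLD] = DEMO_LIFE_EXTREME,
		},
		[DEMO_NODE] = {
			[DEMO_HOT]  = DEMO_LIFE_NOT_SET,
			[DEMO_WARM] = DEMO_LIFE_NOT_SET,
			[DEMO_COLD] = DEMO_LIFE_NONE,
		},
		[DEMO_META] = {
			[DEMO_HOT]  = DEMO_LIFE_MEDIUM,
			[DEMO_WARM] = DEMO_LIFE_MEDIUM,
			[DEMO_COLD] = DEMO_LIFE_MEDIUM,
		},
	},
};

/* Table-driven equivalent of the mode/type/temperature checks. */
static enum demo_hint demo_io_type_to_hint(enum demo_mode mode,
					   enum demo_page page,
					   enum demo_temp temp)
{
	return demo_hint_table[mode][page][temp];
}
```

/*
 * A table keeps the whole mapping visible in one place, at the cost of
 * relying on DEMO_LIFE_NOT_SET being zero for the entries left unlisted.
 */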
|
|
|
|
|
2017-05-11 05:19:54 +08:00
|
|
|
static int __get_segment_type_2(struct f2fs_io_info *fio)
|
2012-11-02 16:09:16 +08:00
|
|
|
{
|
2017-05-11 05:19:54 +08:00
|
|
|
if (fio->type == DATA)
|
2012-11-02 16:09:16 +08:00
|
|
|
return CURSEG_HOT_DATA;
|
|
|
|
else
|
|
|
|
return CURSEG_HOT_NODE;
|
|
|
|
}
|
|
|
|
|
2017-05-11 05:19:54 +08:00
|
|
|
static int __get_segment_type_4(struct f2fs_io_info *fio)
|
2012-11-02 16:09:16 +08:00
|
|
|
{
|
2017-05-11 05:19:54 +08:00
|
|
|
if (fio->type == DATA) {
|
|
|
|
struct inode *inode = fio->page->mapping->host;
|
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
if (S_ISDIR(inode->i_mode))
|
|
|
|
return CURSEG_HOT_DATA;
|
|
|
|
else
|
|
|
|
return CURSEG_COLD_DATA;
|
|
|
|
} else {
|
2017-05-11 05:19:54 +08:00
|
|
|
if (IS_DNODE(fio->page) && is_cold_node(fio->page))
|
2014-11-06 12:05:53 +08:00
|
|
|
return CURSEG_WARM_NODE;
|
2012-11-02 16:09:16 +08:00
|
|
|
else
|
|
|
|
return CURSEG_COLD_NODE;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2017-05-11 05:19:54 +08:00
|
|
|
static int __get_segment_type_6(struct f2fs_io_info *fio)
|
2012-11-02 16:09:16 +08:00
|
|
|
{
|
2017-05-11 05:19:54 +08:00
|
|
|
if (fio->type == DATA) {
|
|
|
|
struct inode *inode = fio->page->mapping->host;
|
2012-11-02 16:09:16 +08:00
|
|
|
|
2017-05-11 05:19:54 +08:00
|
|
|
if (is_cold_data(fio->page) || file_is_cold(inode))
|
2012-11-02 16:09:16 +08:00
|
|
|
return CURSEG_COLD_DATA;
|
2018-02-28 17:07:27 +08:00
|
|
|
if (file_is_hot(inode) ||
|
|
|
|
is_inode_flag_set(inode, FI_HOT_DATA))
|
2017-03-25 08:05:13 +08:00
|
|
|
return CURSEG_HOT_DATA;
|
2017-11-09 13:51:27 +08:00
|
|
|
return rw_hint_to_seg_type(inode->i_write_hint);
|
2012-11-02 16:09:16 +08:00
|
|
|
} else {
|
2017-05-11 05:19:54 +08:00
|
|
|
if (IS_DNODE(fio->page))
|
|
|
|
return is_cold_node(fio->page) ? CURSEG_WARM_NODE :
|
2012-11-02 16:09:16 +08:00
|
|
|
CURSEG_HOT_NODE;
|
2017-03-25 08:05:13 +08:00
|
|
|
return CURSEG_COLD_NODE;
|
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2017-05-11 05:19:54 +08:00
|
|
|
static int __get_segment_type(struct f2fs_io_info *fio)
|
2012-11-02 16:09:16 +08:00
|
|
|
{
|
2017-05-11 02:18:25 +08:00
|
|
|
int type = 0;
|
|
|
|
|
2017-05-11 05:19:54 +08:00
|
|
|
switch (fio->sbi->active_logs) {
|
2012-11-02 16:09:16 +08:00
|
|
|
case 2:
|
2017-05-11 02:18:25 +08:00
|
|
|
type = __get_segment_type_2(fio);
|
|
|
|
break;
|
2012-11-02 16:09:16 +08:00
|
|
|
case 4:
|
2017-05-11 02:18:25 +08:00
|
|
|
type = __get_segment_type_4(fio);
|
|
|
|
break;
|
|
|
|
case 6:
|
|
|
|
type = __get_segment_type_6(fio);
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
f2fs_bug_on(fio->sbi, true);
|
2012-11-02 16:09:16 +08:00
|
|
|
}
|
2017-05-11 05:19:54 +08:00
|
|
|
|
2017-05-11 02:18:25 +08:00
|
|
|
if (IS_HOT(type))
|
|
|
|
fio->temp = HOT;
|
|
|
|
else if (IS_WARM(type))
|
|
|
|
fio->temp = WARM;
|
|
|
|
else
|
|
|
|
fio->temp = COLD;
|
|
|
|
return type;
|
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
2013-12-16 18:04:05 +08:00
|
|
|
void allocate_data_block(struct f2fs_sb_info *sbi, struct page *page,
|
|
|
|
block_t old_blkaddr, block_t *new_blkaddr,
|
2017-05-19 23:37:01 +08:00
|
|
|
struct f2fs_summary *sum, int type,
|
|
|
|
struct f2fs_io_info *fio, bool add_list)
|
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct sit_info *sit_i = SIT_I(sbi);
|
2016-11-12 04:31:40 +08:00
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, type);
|
2012-11-02 16:09:16 +08:00
|
|
|
|
f2fs: fix summary info corruption
Sometimes, after running generic/270 of fstest, fsck reports summary
info and actual position of block address in direct node becoming
inconsistent.
The root cause is race in between __f2fs_replace_block and change_curseg
as below:
Thread A                                Thread B
                                        - __clone_blkaddrs
                                         - f2fs_replace_block
                                          - __f2fs_replace_block
                                           - segnoA = GET_SEGNO(sbi, blkaddrA);
                                           - type = se->type:=CURSEG_HOT_DATA
                                           - if (!IS_CURSEG(sbi, segnoA))
                                              type = CURSEG_WARM_DATA
- allocate_data_block
 - allocate_segment
  - get_ssr_segment
  - change_curseg(segnoA, CURSEG_HOT_DATA)
                                           - change_curseg(segnoA, CURSEG_WARM_DATA)
                                            - reset_curseg
                                             - __set_sit_entry_type
                                              - change se->type from CURSEG_HOT_DATA to CURSEG_WARM_DATA
So finally, the hot curseg is located in segnoA, but the type of segnoA becomes
CURSEG_WARM_DATA.
Then if we invoke __f2fs_replace_block(blkaddrB, blkaddrA, true, false),
since blkaddrA is located in segnoA, we will move the warm type curseg to segnoA,
then change its summary cache and write it back to the summary block.
But segnoA is used by the hot type curseg too; once it moves or persists, it
will overwrite the summary block content with its old in-memory summary cache,
resulting in an inconsistent state.
This patch tries to fix this issue by introducing a global curseg lock to avoid
the race between __f2fs_replace_block and change_curseg.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-11-02 20:41:03 +08:00
|
|
|
down_read(&SM_I(sbi)->curseg_lock);
|
|
|
|
|
2012-11-02 16:09:16 +08:00
|
|
|
mutex_lock(&curseg->curseg_mutex);
|
2017-10-30 17:49:53 +08:00
|
|
|
down_write(&sit_i->sentry_lock);
|
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
*new_blkaddr = NEXT_FREE_BLKADDR(sbi, curseg);
|
|
|
|
|
2016-12-30 06:07:53 +08:00
|
|
|
f2fs_wait_discard_bio(sbi, *new_blkaddr);
|
|
|
|
|
2012-11-02 16:09:16 +08:00
|
|
|
/*
|
|
|
|
* __add_sum_entry must be called under the curseg_mutex
|
|
|
|
* because this function updates a summary entry in the
|
|
|
|
* current summary block.
|
|
|
|
*/
|
2013-06-13 16:59:27 +08:00
|
|
|
__add_sum_entry(sbi, type, sum);
|
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
__refresh_next_blkoff(sbi, curseg);
|
2013-10-22 19:56:10 +08:00
|
|
|
|
|
|
|
stat_inc_block_count(sbi, curseg);
|
2012-11-02 16:09:16 +08:00
|
|
|
|
2017-10-30 09:33:41 +08:00
|
|
|
/*
|
|
|
|
* SIT information should be updated before segment allocation,
|
|
|
|
* since SSR needs the latest valid block information.
|
|
|
|
*/
|
|
|
|
update_sit_entry(sbi, *new_blkaddr, 1);
|
|
|
|
if (GET_SEGNO(sbi, old_blkaddr) != NULL_SEGNO)
|
|
|
|
update_sit_entry(sbi, old_blkaddr, -1);
|
|
|
|
|
2017-04-05 07:45:30 +08:00
|
|
|
if (!__has_curseg_space(sbi, type))
|
|
|
|
sit_i->s_ops->allocate_segment(sbi, type, false);
|
2017-10-30 09:33:41 +08:00
|
|
|
|
2012-11-02 16:09:16 +08:00
|
|
|
/*
|
2017-10-30 09:33:41 +08:00
|
|
|
* segment dirty status should be updated after segment allocation,
|
|
|
|
* so the status only needs to be updated once, after the previous
|
|
|
|
* segment is closed.
|
2012-11-02 16:09:16 +08:00
|
|
|
*/
|
2017-10-30 09:33:41 +08:00
|
|
|
locate_dirty_segment(sbi, GET_SEGNO(sbi, old_blkaddr));
|
|
|
|
locate_dirty_segment(sbi, GET_SEGNO(sbi, *new_blkaddr));
|
2014-01-28 11:22:14 +08:00
|
|
|
|
2017-10-30 17:49:53 +08:00
|
|
|
up_write(&sit_i->sentry_lock);
|
2012-11-02 16:09:16 +08:00
|
|
|
|
2017-07-31 20:19:09 +08:00
|
|
|
if (page && IS_NODESEG(type)) {
|
2012-11-02 16:09:16 +08:00
|
|
|
fill_node_footer_blkaddr(page, NEXT_FREE_BLKADDR(sbi, curseg));
|
|
|
|
|
2017-07-31 20:19:09 +08:00
|
|
|
f2fs_inode_chksum_set(sbi, page);
|
|
|
|
}
|
|
|
|
|
2017-05-19 23:37:01 +08:00
|
|
|
if (add_list) {
|
|
|
|
struct f2fs_bio_info *io;
|
|
|
|
|
|
|
|
INIT_LIST_HEAD(&fio->list);
|
|
|
|
fio->in_list = true;
|
|
|
|
io = sbi->write_io[fio->type] + fio->temp;
|
|
|
|
spin_lock(&io->io_lock);
|
|
|
|
list_add_tail(&fio->list, &io->io_list);
|
|
|
|
spin_unlock(&io->io_lock);
|
|
|
|
}
|
|
|
|
|
2013-12-16 18:04:05 +08:00
|
|
|
mutex_unlock(&curseg->curseg_mutex);
|
f2fs: fix summary info corruption
Sometimes, after running generic/270 of fstest, fsck reports summary
info and actual position of block address in direct node becoming
inconsistent.
The root cause is race in between __f2fs_replace_block and change_curseg
as below:
Thread A                                        Thread B
- __clone_blkaddrs
 - f2fs_replace_block
  - __f2fs_replace_block
   - segnoA = GET_SEGNO(sbi, blkaddrA);
   - type = se->type:=CURSEG_HOT_DATA
   - if (!IS_CURSEG(sbi, segnoA))
      type = CURSEG_WARM_DATA
                                                - allocate_data_block
                                                 - allocate_segment
                                                  - get_ssr_segment
                                                  - change_curseg(segnoA, CURSEG_HOT_DATA)
   - change_curseg(segnoA, CURSEG_WARM_DATA)
    - reset_curseg
     - __set_sit_entry_type
      - change se->type from CURSEG_HOT_DATA to CURSEG_WARM_DATA
So finally, hot curseg locates in segnoA, but type of segnoA becomes
CURSEG_WARM_DATA.
Then if we invoke __f2fs_replace_block(blkaddrB, blkaddrA, true, false),
as blkaddrA locates in segnoA, so we will move warm type curseg to segnoA,
then change its summary cache and writeback it to summary block.
But segnoA is used by hot type curseg too, once it moves or persist, it
will cover summary block content with inner old summary cache, result in
inconsistent status.
This patch tries to fix this issue by introduce global curseg lock to avoid
race in between __f2fs_replace_block and change_curseg.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-11-02 20:41:03 +08:00
|
|
|
|
|
|
|
up_read(&SM_I(sbi)->curseg_lock);
|
2013-12-16 18:04:05 +08:00
|
|
|
}
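
The allocation path above calls update_sit_entry() with +1 for the new block address and -1 for the old one. The following is a minimal user-space sketch of that bookkeeping, assuming an entry holds only a valid-block count and a validity bitmap. The names demo_seg_entry, demo_update_entry and DEMO_BLKS_PER_SEG are hypothetical and are not f2fs symbols; the real function tracks considerably more state.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_BLKS_PER_SEG 512	/* assumed segment size, in blocks */

/* Hypothetical, simplified stand-in for an in-memory SIT entry. */
struct demo_seg_entry {
	uint16_t valid_blocks;				/* number of valid blocks */
	uint8_t valid_map[DEMO_BLKS_PER_SEG / 8];	/* one bit per block */
};

/*
 * Apply del (+1 to validate, -1 to invalidate) to the block at offset
 * off inside the segment. Returns false when the request is out of
 * range or would contradict the bitmap (double set or double clear).
 */
static bool demo_update_entry(struct demo_seg_entry *se, unsigned int off,
			      int del)
{
	uint8_t mask;
	bool was_set;

	if (off >= DEMO_BLKS_PER_SEG || (del != 1 && del != -1))
		return false;

	mask = (uint8_t)(1u << (off % 8));
	was_set = (se->valid_map[off / 8] & mask) != 0;

	if (del > 0) {
		if (was_set)
			return false;	/* block is already valid */
		se->valid_map[off / 8] |= mask;
		se->valid_blocks++;
	} else {
		if (!was_set)
			return false;	/* block is already invalid */
		se->valid_map[off / 8] &= (uint8_t)~mask;
		se->valid_blocks--;
	}
	return true;
}
```

Keeping the count and the bitmap in step inside one helper means a caller cannot change one without the other, which is presumably why the kernel routes both the +1 and the -1 case through a single function.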
|
|
|
|
|
2017-09-29 13:59:38 +08:00
|
|
|
static void update_device_state(struct f2fs_io_info *fio)
|
|
|
|
{
|
|
|
|
struct f2fs_sb_info *sbi = fio->sbi;
|
|
|
|
unsigned int devidx;
|
|
|
|
|
|
|
|
if (!sbi->s_ndevs)
|
|
|
|
return;
|
|
|
|
|
|
|
|
devidx = f2fs_target_device_index(sbi, fio->new_blkaddr);
|
|
|
|
|
|
|
|
/* update device state for fsync */
|
|
|
|
set_dirty_device(sbi, fio->ino, devidx, FLUSH_INO);
|
2017-09-29 13:59:39 +08:00
|
|
|
|
|
|
|
/* update device state for checkpoint */
|
|
|
|
if (!f2fs_test_bit(devidx, (char *)&sbi->dirty_device)) {
|
|
|
|
spin_lock(&sbi->dev_lock);
|
|
|
|
f2fs_set_bit(devidx, (char *)&sbi->dirty_device);
|
|
|
|
spin_unlock(&sbi->dev_lock);
|
|
|
|
}
|
2017-09-29 13:59:38 +08:00
|
|
|
}
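
update_device_state() tests the dirty-device bit before taking dev_lock, so the common case of an already-dirty device costs no locking. The sketch below shows the same test-then-lock-then-set shape in portable C11. Everything prefixed with demo_ is a hypothetical name, the spinlock is a bare atomic_flag rather than a kernel spinlock, and the bit order is plain LSB-first, which is not necessarily what f2fs_test_bit() uses.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_MAX_DEVS 8	/* hypothetical limit, one bit per device */

struct demo_sb {
	atomic_flag dev_lock;		/* tiny spinlock standing in for dev_lock */
	atomic_uchar dirty_device;	/* bit n set => device n has dirty data */
};

static void demo_sb_init(struct demo_sb *sb)
{
	atomic_flag_clear(&sb->dev_lock);
	atomic_init(&sb->dirty_device, 0);
}

static bool demo_device_is_dirty(struct demo_sb *sb, unsigned int devidx)
{
	return devidx < DEMO_MAX_DEVS &&
	       ((atomic_load(&sb->dirty_device) >> devidx) & 1u);
}

/* Returns true only when this call had to take the lock and set the bit. */
static bool demo_mark_device_dirty(struct demo_sb *sb, unsigned int devidx)
{
	if (devidx >= DEMO_MAX_DEVS)
		return false;

	/* Fast path: the bit is already set, so skip the lock entirely. */
	if (demo_device_is_dirty(sb, devidx))
		return false;

	while (atomic_flag_test_and_set(&sb->dev_lock))
		;	/* spin until the lock is free */
	atomic_fetch_or(&sb->dirty_device, (unsigned char)(1u << devidx));
	atomic_flag_clear(&sb->dev_lock);
	return true;
}
```

Two writers that both miss the fast path will each take the lock and set the same bit, which is harmless because setting a bit is idempotent.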
|
|
|
|
|
2015-04-24 05:38:15 +08:00
|
|
|
static void do_write_page(struct f2fs_summary *sum, struct f2fs_io_info *fio)
|
2013-12-16 18:04:05 +08:00
|
|
|
{
|
2017-05-11 05:19:54 +08:00
|
|
|
int type = __get_segment_type(fio);
|
2016-12-15 02:12:56 +08:00
|
|
|
int err;
|
2013-12-16 18:04:05 +08:00
|
|
|
|
2016-12-15 02:12:56 +08:00
|
|
|
reallocate:
|
f2fs: trace old block address for CoWed page
This patch enables to trace old block address of CoWed page for better
debugging.
f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f0, oldaddr = 0xfe8ab, newaddr = 0xfee90 rw = WRITE_SYNC, type = NODE
f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f8, oldaddr = 0xfe8b0, newaddr = 0xfee91 rw = WRITE_SYNC, type = NODE
f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4fa, oldaddr = 0xfe8ae, newaddr = 0xfee92 rw = WRITE_SYNC, type = NODE
f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x96, oldaddr = 0xf049b, newaddr = 0x2bbe rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x97, oldaddr = 0xf049c, newaddr = 0x2bbf rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x98, oldaddr = 0xf049d, newaddr = 0x2bc0 rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x47, oldaddr = 0xffffffff, newaddr = 0xf2631 rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x48, oldaddr = 0xffffffff, newaddr = 0xf2632 rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x49, oldaddr = 0xffffffff, newaddr = 0xf2633 rw = WRITE, type = DATA
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 18:36:38 +08:00
|
|
|
allocate_data_block(fio->sbi, fio->page, fio->old_blkaddr,
|
2017-05-19 23:37:01 +08:00
|
|
|
&fio->new_blkaddr, sum, type, fio, true);
|
2013-12-16 18:04:05 +08:00
|
|
|
|
2012-11-02 16:09:16 +08:00
|
|
|
/* write out the dirty page to the block device */
|
2017-05-11 02:28:38 +08:00
|
|
|
err = f2fs_submit_page_write(fio);
|
2016-12-15 02:12:56 +08:00
|
|
|
if (err == -EAGAIN) {
|
|
|
|
fio->old_blkaddr = fio->new_blkaddr;
|
|
|
|
goto reallocate;
|
2017-09-29 13:59:38 +08:00
|
|
|
} else if (!err) {
|
|
|
|
update_device_state(fio);
|
2016-12-15 02:12:56 +08:00
|
|
|
}
|
2012-11-02 16:09:16 +08:00
|
|
|
}
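
do_write_page() retries when the submission returns -EAGAIN: the block that was just allocated becomes the old address, and a fresh block is allocated for the next attempt. The sketch below illustrates that control flow with a loop instead of a goto. The demo_ names, the counter-based allocator and the fail_budget field are all hypothetical and exist only to make the retry observable.

```c
#include <errno.h>

/* Hypothetical stand-in for the I/O descriptor; not an f2fs structure. */
struct demo_io {
	unsigned int old_blkaddr;
	unsigned int new_blkaddr;
	int fail_budget;	/* how many submissions should report -EAGAIN */
	int attempts;		/* how many allocations were performed */
};

static unsigned int demo_next_free = 100;

/* Hand out the next free block, as an append-only log allocator would. */
static void demo_allocate(struct demo_io *io)
{
	io->new_blkaddr = demo_next_free++;
	io->attempts++;
}

/* Pretend to submit the write; fail with -EAGAIN while the budget lasts. */
static int demo_submit(struct demo_io *io)
{
	if (io->fail_budget > 0) {
		io->fail_budget--;
		return -EAGAIN;
	}
	return 0;
}

/* Allocate and submit, reallocating for as long as submission says -EAGAIN. */
static int demo_write(struct demo_io *io)
{
	int err;

	for (;;) {
		demo_allocate(io);
		err = demo_submit(io);
		if (err != -EAGAIN)
			return err;
		/* The block just allocated is now the one to invalidate. */
		io->old_blkaddr = io->new_blkaddr;
	}
}
```

Carrying the failed address forward as old_blkaddr lets the next allocation invalidate it, so an abandoned block is not left marked valid.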
|
|
|
|
|
2017-08-02 23:21:48 +08:00
|
|
|
void write_meta_page(struct f2fs_sb_info *sbi, struct page *page,
|
|
|
|
enum iostat_type io_type)
|
2012-11-02 16:09:16 +08:00
|
|
|
{
|
2013-12-11 12:54:01 +08:00
|
|
|
struct f2fs_io_info fio = {
|
2015-04-24 05:38:15 +08:00
|
|
|
.sbi = sbi,
|
2013-12-11 12:54:01 +08:00
|
|
|
.type = META,
|
2018-01-31 10:36:57 +08:00
|
|
|
.temp = HOT,
|
2016-06-06 03:31:55 +08:00
|
|
|
.op = REQ_OP_WRITE,
|
2016-11-01 21:40:10 +08:00
|
|
|
.op_flags = REQ_SYNC | REQ_META | REQ_PRIO,
|
2016-02-22 18:36:38 +08:00
|
|
|
.old_blkaddr = page->index,
|
|
|
|
.new_blkaddr = page->index,
|
2015-04-24 05:38:15 +08:00
|
|
|
.page = page,
|
2015-04-24 03:04:33 +08:00
|
|
|
.encrypted_page = NULL,
|
2017-05-19 23:37:01 +08:00
|
|
|
.in_list = false,
|
2013-12-11 12:54:01 +08:00
|
|
|
};
|
|
|
|
|
2015-10-12 17:04:21 +08:00
|
|
|
if (unlikely(page->index >= MAIN_BLKADDR(sbi)))
|
2016-06-06 03:31:55 +08:00
|
|
|
fio.op_flags &= ~REQ_META;
|
2015-10-12 17:04:21 +08:00
|
|
|
|
2012-11-02 16:09:16 +08:00
|
|
|
set_page_writeback(page);
|
2017-05-11 02:28:38 +08:00
|
|
|
f2fs_submit_page_write(&fio);
|
2017-08-02 23:21:48 +08:00
|
|
|
|
|
|
|
f2fs_update_iostat(sbi, io_type, F2FS_BLKSIZE);
|
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
2015-04-24 05:38:15 +08:00
|
|
|
void write_node_page(unsigned int nid, struct f2fs_io_info *fio)
|
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct f2fs_summary sum;
|
2015-04-24 05:38:15 +08:00
|
|
|
|
2012-11-02 16:09:16 +08:00
|
|
|
set_summary(&sum, nid, 0, 0);
|
2015-04-24 05:38:15 +08:00
|
|
|
do_write_page(&sum, fio);
|
2017-08-02 23:21:48 +08:00
|
|
|
|
|
|
|
f2fs_update_iostat(fio->sbi, fio->io_type, F2FS_BLKSIZE);
|
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
2015-04-24 05:38:15 +08:00
|
|
|
void write_data_page(struct dnode_of_data *dn, struct f2fs_io_info *fio)
|
2012-11-02 16:09:16 +08:00
|
|
|
{
|
2015-04-24 05:38:15 +08:00
|
|
|
struct f2fs_sb_info *sbi = fio->sbi;
|
2012-11-02 16:09:16 +08:00
|
|
|
struct f2fs_summary sum;
|
|
|
|
struct node_info ni;
|
|
|
|
|
2014-09-03 06:52:58 +08:00
|
|
|
f2fs_bug_on(sbi, dn->data_blkaddr == NULL_ADDR);
|
2012-11-02 16:09:16 +08:00
|
|
|
get_node_info(sbi, dn->nid, &ni);
|
|
|
|
set_summary(&sum, dn->nid, dn->ofs_in_node, ni.version);
|
2015-04-24 05:38:15 +08:00
|
|
|
do_write_page(&sum, fio);
|
2016-02-24 17:16:47 +08:00
|
|
|
f2fs_update_data_blkaddr(dn, fio->new_blkaddr);
|
2017-08-02 23:21:48 +08:00
|
|
|
|
|
|
|
f2fs_update_iostat(sbi, fio->io_type, F2FS_BLKSIZE);
|
2012-11-02 16:09:16 +08:00
|
|
|
}
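The summary handling in the function above can be illustrated outside the kernel. What follows is a minimal userspace sketch, assuming a simplified summary record that holds only the owning node ID, the node version, and the block's offset within that node. `toy_summary` and `toy_set_summary` are hypothetical names, and the range check is an addition of the sketch; the kernel's `struct f2fs_summary` uses packed little-endian fields that are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical, simplified stand-in for a data-block summary entry:
 * it records which node owns the block and where the block sits in it.
 */
struct toy_summary {
	uint32_t nid;         /* ID of the parent node */
	uint8_t  version;     /* node version, taken from the node info */
	uint16_t ofs_in_node; /* index of the block address inside the node */
};

/* Fill a summary entry, mirroring the argument order of set_summary(). */
static void toy_set_summary(struct toy_summary *sum, uint32_t nid,
			    unsigned int ofs_in_node, uint8_t version)
{
	/* Sketch-only guard: the offset must fit the 16-bit field. */
	assert(ofs_in_node <= UINT16_MAX);

	sum->nid = nid;
	sum->ofs_in_node = (uint16_t)ofs_in_node;
	sum->version = version;
}
```

A caller would build the entry from the dnode's node ID and offset before handing it to the write path, in the same order as the `set_summary()` and `do_write_page()` calls above.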
|
|
|
|
|
2017-03-31 12:02:46 +08:00
|
|
|
int rewrite_data_page(struct f2fs_io_info *fio)
|
2012-11-02 16:09:16 +08:00
|
|
|
{
|
2017-08-02 23:21:48 +08:00
|
|
|
int err;
|
|
|
|
|
f2fs: trace old block address for CoWed page
This patch enables tracing the old block address of a CoWed page for better
debugging.
f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f0, oldaddr = 0xfe8ab, newaddr = 0xfee90 rw = WRITE_SYNC, type = NODE
f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f8, oldaddr = 0xfe8b0, newaddr = 0xfee91 rw = WRITE_SYNC, type = NODE
f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4fa, oldaddr = 0xfe8ae, newaddr = 0xfee92 rw = WRITE_SYNC, type = NODE
f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x96, oldaddr = 0xf049b, newaddr = 0x2bbe rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x97, oldaddr = 0xf049c, newaddr = 0x2bbf rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x98, oldaddr = 0xf049d, newaddr = 0x2bc0 rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x47, oldaddr = 0xffffffff, newaddr = 0xf2631 rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x48, oldaddr = 0xffffffff, newaddr = 0xf2632 rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x49, oldaddr = 0xffffffff, newaddr = 0xf2633 rw = WRITE, type = DATA
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 18:36:38 +08:00
|
|
|
fio->new_blkaddr = fio->old_blkaddr;
|
2018-01-31 10:36:57 +08:00
|
|
|
/* i/o temperature is needed for passing down write hints */
|
|
|
|
__get_segment_type(fio);
|
2015-04-24 05:38:15 +08:00
|
|
|
stat_inc_inplace_blocks(fio->sbi);
|
2017-08-02 23:21:48 +08:00
|
|
|
|
|
|
|
err = f2fs_submit_page_bio(fio);
|
2017-09-29 13:59:38 +08:00
|
|
|
if (!err)
|
|
|
|
update_device_state(fio);
|
2017-08-02 23:21:48 +08:00
|
|
|
|
|
|
|
f2fs_update_iostat(fio->sbi, fio->io_type, F2FS_BLKSIZE);
|
|
|
|
|
|
|
|
return err;
|
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
f2fs: fix summary info corruption
Sometimes, after running generic/270 of fstests, fsck reports that the summary
info and the actual position of the block address in the direct node have become
inconsistent.
The root cause is a race between __f2fs_replace_block and change_curseg,
as shown below:
Thread A                                Thread B
- __clone_blkaddrs
 - f2fs_replace_block
  - __f2fs_replace_block
   - segnoA = GET_SEGNO(sbi, blkaddrA);
   - type = se->type:=CURSEG_HOT_DATA
   - if (!IS_CURSEG(sbi, segnoA))
      type = CURSEG_WARM_DATA
                                        - allocate_data_block
                                         - allocate_segment
                                          - get_ssr_segment
                                          - change_curseg(segnoA, CURSEG_HOT_DATA)
   - change_curseg(segnoA, CURSEG_WARM_DATA)
    - reset_curseg
     - __set_sit_entry_type
      - change se->type from CURSEG_HOT_DATA to CURSEG_WARM_DATA
So finally, the hot curseg is located in segnoA, but the type of segnoA becomes
CURSEG_WARM_DATA.
Then if we invoke __f2fs_replace_block(blkaddrB, blkaddrA, true, false),
since blkaddrA is located in segnoA, we will move the warm type curseg to segnoA,
then change its summary cache and write it back to the summary block.
But segnoA is used by the hot type curseg too; once it moves or is persisted, it
will overwrite the summary block content with its own old summary cache,
resulting in an inconsistent status.
This patch tries to fix this issue by introducing a global curseg lock to avoid
the race between __f2fs_replace_block and change_curseg.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-11-02 20:41:03 +08:00
|
|
|
static inline int __f2fs_get_curseg(struct f2fs_sb_info *sbi,
|
|
|
|
unsigned int segno)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
|
|
|
|
for (i = CURSEG_HOT_DATA; i < NO_CHECK_TYPE; i++) {
|
|
|
|
if (CURSEG_I(sbi, i)->segno == segno)
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
return i;
|
|
|
|
}
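The lookup above can be tried in isolation. The following is a minimal userspace sketch, assuming six active logs that are each pinned to one segment number; `toy_sb`, `toy_get_curseg`, `toy_resolve_type`, and the `TOY_*` constants are hypothetical names, not kernel identifiers. It shows the same idea: because the type stored in the segment entry can be changed by SSR allocation, the caller finds the owning log by scanning the active logs for the segment number, and treats "not found" as a bug.

```c
#include <assert.h>

/* Hypothetical model: six active logs plus a "not found" sentinel. */
enum {
	TOY_HOT_DATA, TOY_WARM_DATA, TOY_COLD_DATA,
	TOY_HOT_NODE, TOY_WARM_NODE, TOY_COLD_NODE,
	TOY_NO_TYPE
};

struct toy_sb {
	unsigned int curseg_segno[TOY_NO_TYPE]; /* segment used by each log */
};

/* Return the log whose current segment is segno, or TOY_NO_TYPE if none. */
static int toy_get_curseg(const struct toy_sb *sb, unsigned int segno)
{
	int i;

	for (i = TOY_HOT_DATA; i < TOY_NO_TYPE; i++) {
		if (sb->curseg_segno[i] == segno)
			break;
	}
	return i;
}

/*
 * Caller-side use: the segment is already known to be an active one,
 * so failing to find its log is treated as a bug.
 */
static int toy_resolve_type(const struct toy_sb *sb, unsigned int segno)
{
	int type = toy_get_curseg(sb, segno);

	assert(type != TOY_NO_TYPE);
	return type;
}
```

The scan is linear, which seems acceptable here since the number of active logs is a small constant.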
|
|
|
|
|
2016-02-23 17:52:43 +08:00
|
|
|
void __f2fs_replace_block(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
|
2015-05-06 13:08:06 +08:00
|
|
|
block_t old_blkaddr, block_t new_blkaddr,
|
f2fs: support revoking atomic written pages
f2fs supports atomic write with the following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file
With this flow we can avoid the file becoming corrupted on an abnormal power
cut, because we hold the data of the transaction in referenced pages linked in
the inmem_pages list of the inode, but without setting them dirty, so this data
won't be persisted unless we commit it in step 4.
But we should still hold the journal db file in memory by using volatile
write, because our semantics of 'atomic write support' is incomplete: in
step 4, we could fail to submit all dirty data of the transaction; once
partial dirty data has been committed to storage, then after a checkpoint &
abnormal power cut, the db file will be corrupted forever.
So this patch tries to improve the atomic write flow by adding a revoking flow:
once an inner error occurs during committing, this gives another chance to try
to revoke the partially submitted data of the current transaction, which makes
the committing operation more like an atomic one.
If we're not lucky and the revoking operation fails, EAGAIN will be
reported to the user to suggest doing the recovery with the held journal file,
or retrying the current transaction.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-06 14:40:34 +08:00
|
|
|
bool recover_curseg, bool recover_newaddr)
|
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct sit_info *sit_i = SIT_I(sbi);
|
|
|
|
struct curseg_info *curseg;
|
|
|
|
unsigned int segno, old_cursegno;
|
|
|
|
struct seg_entry *se;
|
|
|
|
int type;
|
2015-05-06 13:08:06 +08:00
|
|
|
unsigned short old_blkoff;
|
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
segno = GET_SEGNO(sbi, new_blkaddr);
|
|
|
|
se = get_seg_entry(sbi, segno);
|
|
|
|
type = se->type;
|
|
|
|
|
2017-11-02 20:41:03 +08:00
|
|
|
down_write(&SM_I(sbi)->curseg_lock);
|
|
|
|
|
2015-05-06 13:08:06 +08:00
|
|
|
if (!recover_curseg) {
|
|
|
|
/* for recovery flow */
|
|
|
|
if (se->valid_blocks == 0 && !IS_CURSEG(sbi, segno)) {
|
|
|
|
if (old_blkaddr == NULL_ADDR)
|
|
|
|
type = CURSEG_COLD_DATA;
|
|
|
|
else
|
|
|
|
type = CURSEG_WARM_DATA;
|
|
|
|
}
|
|
|
|
} else {
|
2017-11-02 20:41:03 +08:00
|
|
|
if (IS_CURSEG(sbi, segno)) {
|
|
|
|
/* se->type is volatile because SSR allocation may change it */
|
|
|
|
type = __f2fs_get_curseg(sbi, segno);
|
|
|
|
f2fs_bug_on(sbi, type == NO_CHECK_TYPE);
|
|
|
|
} else {
|
2012-11-02 16:09:16 +08:00
|
|
|
type = CURSEG_WARM_DATA;
|
2017-11-02 20:41:03 +08:00
|
|
|
}
|
2012-11-02 16:09:16 +08:00
|
|
|
}
|
2015-05-06 13:08:06 +08:00
|
|
|
|
f2fs: check segment type in __f2fs_replace_block
In some cases, a node block has a wrong blkaddr whose segment type is
NODE, e.g., a recovered inode is missing the xattr flag and the blkaddr is in
the xattr range. Since fsck.f2fs does not check the recovery nodes, this
will cause __f2fs_replace_block to change the curseg of node and do the
update_sit_entry(sbi, new_blkaddr, 1) with no next_blkoff refresh. As a
result, when the recovery process writes the checkpoint and syncs nodes, the
next_blkoff of curseg is used in the segment bitmap, which then
causes f2fs_bug_on. So let's check the segment type in __f2fs_replace_block.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-04 15:02:02 +08:00
|
|
|
f2fs_bug_on(sbi, !IS_DATASEG(type));
|
2012-11-02 16:09:16 +08:00
|
|
|
curseg = CURSEG_I(sbi, type);
|
|
|
|
|
|
|
|
mutex_lock(&curseg->curseg_mutex);
|
2017-10-30 17:49:53 +08:00
|
|
|
down_write(&sit_i->sentry_lock);
|
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
old_cursegno = curseg->segno;
|
2015-05-06 13:08:06 +08:00
|
|
|
old_blkoff = curseg->next_blkoff;
|
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
/* change the current segment */
|
|
|
|
if (segno != curseg->segno) {
|
|
|
|
curseg->next_segno = segno;
|
2017-08-30 18:04:48 +08:00
|
|
|
change_curseg(sbi, type);
|
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
2014-02-04 12:01:10 +08:00
|
|
|
curseg->next_blkoff = GET_BLKOFF_FROM_SEG0(sbi, new_blkaddr);
|
2013-06-13 16:59:27 +08:00
|
|
|
__add_sum_entry(sbi, type, sum);
|
2012-11-02 16:09:16 +08:00
|
|
|
|
f2fs: support revoking atomic written pages
f2fs supports atomic writes with the following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file
With this flow we can avoid corrupting the file on an abnormal power
cut, because the transaction's data is held in referenced pages linked in
the inode's inmem_pages list without being marked dirty, so the data
is not persisted unless we commit it in step 4.
But we should still hold the journal db file in memory by using volatile
writes, because our 'atomic write support' semantics are incomplete: in
step 4 we could fail to submit all of the transaction's dirty data, and once
part of that data has been committed to storage, a checkpoint followed by
an abnormal power cut leaves the db file corrupted forever.
So this patch tries to improve the atomic write flow by adding a revoking flow:
once an internal error occurs during commit, we get another chance to
revoke the partially submitted data of the current transaction, which makes
the commit operation behave more like an atomic one.
If we are unlucky and the revoking operation also fails, EAGAIN is
reported to the user, suggesting either recovery from the held journal file
or a retry of the current transaction.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-06 14:40:34 +08:00
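The five-step flow in the commit message above might look roughly like this from user space. This is a minimal sketch, not code from the patch: `atomic_db_update` is a hypothetical helper, and the ioctl numbers are restated locally on the assumption that they match the kernel's f2fs definitions (magic 0xf5, commands 1 and 2). On a file system other than f2fs the start ioctl is expected to fail, so the helper returns -1.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/ioctl.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed to mirror the kernel's f2fs ioctl definitions. */
#ifndef F2FS_IOCTL_MAGIC
#define F2FS_IOCTL_MAGIC 0xf5
#endif
#ifndef F2FS_IOC_START_ATOMIC_WRITE
#define F2FS_IOC_START_ATOMIC_WRITE  _IO(F2FS_IOCTL_MAGIC, 1)
#endif
#ifndef F2FS_IOC_COMMIT_ATOMIC_WRITE
#define F2FS_IOC_COMMIT_ATOMIC_WRITE _IO(F2FS_IOCTL_MAGIC, 2)
#endif

/*
 * Hypothetical helper: write len bytes at offset 0 of path as one
 * atomic transaction. Returns 0 on success and -1 on any failure,
 * with errno preserved from the call that failed.
 */
int atomic_db_update(const char *path, const void *buf, size_t len)
{
	int saved;
	int fd = open(path, O_RDWR);                       /* step 1 */

	if (fd < 0)
		return -1;

	if (ioctl(fd, F2FS_IOC_START_ATOMIC_WRITE) < 0)    /* step 2 */
		goto fail;

	if (pwrite(fd, buf, len, 0) != (ssize_t)len)       /* step 3 */
		goto fail;

	/* step 4: per the commit message, EAGAIN means the revoke failed too */
	if (ioctl(fd, F2FS_IOC_COMMIT_ATOMIC_WRITE) < 0)
		goto fail;

	return close(fd);                                  /* step 5 */

fail:
	saved = errno;
	close(fd);
	errno = saved;
	return -1;
}
```

A caller that sees -1 with errno set to EAGAIN would, following the commit message, either recover from its journal file or retry the whole transaction.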
|
|
|
if (!recover_curseg || recover_newaddr)
|
2015-10-08 03:28:41 +08:00
|
|
|
update_sit_entry(sbi, new_blkaddr, 1);
|
|
|
|
if (GET_SEGNO(sbi, old_blkaddr) != NULL_SEGNO)
|
|
|
|
update_sit_entry(sbi, old_blkaddr, -1);
|
|
|
|
|
|
|
|
locate_dirty_segment(sbi, GET_SEGNO(sbi, old_blkaddr));
|
|
|
|
locate_dirty_segment(sbi, GET_SEGNO(sbi, new_blkaddr));
|
|
|
|
|
2012-11-02 16:09:16 +08:00
|
|
|
locate_dirty_segment(sbi, old_cursegno);
|
|
|
|
|
2015-05-06 13:08:06 +08:00
|
|
|
if (recover_curseg) {
|
|
|
|
if (old_cursegno != curseg->segno) {
|
|
|
|
curseg->next_segno = old_cursegno;
|
2017-08-30 18:04:48 +08:00
|
|
|
change_curseg(sbi, type);
|
2015-05-06 13:08:06 +08:00
|
|
|
}
|
|
|
|
curseg->next_blkoff = old_blkoff;
|
|
|
|
}
|
|
|
|
|
2017-10-30 17:49:53 +08:00
|
|
|
up_write(&sit_i->sentry_lock);
|
2012-11-02 16:09:16 +08:00
|
|
|
mutex_unlock(&curseg->curseg_mutex);
|
f2fs: fix summary info corruption
Sometimes, after running generic/270 of fstest, fsck reports that the summary
info and the actual position of a block address in a direct node have become
inconsistent.
The root cause is a race between __f2fs_replace_block and change_curseg,
as shown below:
Thread A                                Thread B
- __clone_blkaddrs
 - f2fs_replace_block
  - __f2fs_replace_block
   - segnoA = GET_SEGNO(sbi, blkaddrA);
   - type = se->type:=CURSEG_HOT_DATA
   - if (!IS_CURSEG(sbi, segnoA))
      type = CURSEG_WARM_DATA
                                        - allocate_data_block
                                         - allocate_segment
                                          - get_ssr_segment
                                          - change_curseg(segnoA, CURSEG_HOT_DATA)
   - change_curseg(segnoA, CURSEG_WARM_DATA)
    - reset_curseg
     - __set_sit_entry_type
      - change se->type from CURSEG_HOT_DATA to CURSEG_WARM_DATA
So in the end the hot curseg is located in segnoA, but the type of segnoA
has become CURSEG_WARM_DATA.
If we then invoke __f2fs_replace_block(blkaddrB, blkaddrA, true, false),
blkaddrA is located in segnoA, so we move the warm curseg to segnoA,
change its summary cache, and write it back to the summary block.
But segnoA is also used by the hot curseg; once that curseg moves or is
persisted, it overwrites the summary block with its own stale summary
cache, resulting in an inconsistent state.
This patch fixes the issue by introducing a global curseg lock that avoids
the race between __f2fs_replace_block and change_curseg.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-11-02 20:41:03 +08:00
|
|
|
up_write(&SM_I(sbi)->curseg_lock);
|
2012-11-02 16:09:16 +08:00
|
|
|
}
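The SIT bookkeeping at the heart of the block replacement above (one more valid block in the new segment, one fewer in the old, then a dirty-state re-check for both) can be sketched with a toy model. Everything prefixed `toy_` is hypothetical. The model assumes 512 blocks per segment and that segment 0 starts at block address 0, and it ignores the curseg, summary, and locking work that the real function also does.

```c
#include <stdint.h>

#define TOY_BLOCKS_PER_SEG 512u          /* assumed segment size in blocks */
#define TOY_NR_SEGS        8u            /* arbitrary toy volume size */
#define TOY_NULL_ADDR      UINT32_MAX    /* stands in for "no old block" */

struct toy_sit {
	unsigned short valid_blocks[TOY_NR_SEGS];  /* per-segment valid count */
	unsigned char dirty[TOY_NR_SEGS];          /* 1 if partially valid */
};

/* Map a block address to its segment number (segment 0 starts at 0). */
static unsigned int toy_segno(uint32_t blkaddr)
{
	return blkaddr / TOY_BLOCKS_PER_SEG;
}

/* Adjust the valid-block count of the segment that holds blkaddr. */
static void toy_update_sit_entry(struct toy_sit *sit, uint32_t blkaddr, int del)
{
	sit->valid_blocks[toy_segno(blkaddr)] += del;
}

/* Simplification: "dirty" means neither empty nor completely valid. */
static void toy_locate_dirty_segment(struct toy_sit *sit, unsigned int segno)
{
	unsigned short v = sit->valid_blocks[segno];

	sit->dirty[segno] = (v > 0 && v < TOY_BLOCKS_PER_SEG);
}

/* Move one valid block from old_addr to new_addr in the toy SIT. */
void toy_replace_block(struct toy_sit *sit, uint32_t old_addr, uint32_t new_addr)
{
	toy_update_sit_entry(sit, new_addr, 1);
	if (old_addr != TOY_NULL_ADDR) {
		toy_update_sit_entry(sit, old_addr, -1);
		toy_locate_dirty_segment(sit, toy_segno(old_addr));
	}
	toy_locate_dirty_segment(sit, toy_segno(new_addr));
}
```

Moving a block out of a completely valid segment leaves both the source and the destination segment partially valid, which is why the dirty state is re-checked for both.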
|
|
|
|
|
2015-05-28 19:15:35 +08:00
|
|
|
void f2fs_replace_block(struct f2fs_sb_info *sbi, struct dnode_of_data *dn,
|
|
|
|
block_t old_addr, block_t new_addr,
|
2016-02-06 14:40:34 +08:00
|
|
|
unsigned char version, bool recover_curseg,
|
|
|
|
bool recover_newaddr)
|
2015-05-28 19:15:35 +08:00
|
|
|
{
|
|
|
|
struct f2fs_summary sum;
|
|
|
|
|
|
|
|
set_summary(&sum, dn->nid, dn->ofs_in_node, version);
|
|
|
|
|
2016-02-06 14:40:34 +08:00
|
|
|
__f2fs_replace_block(sbi, &sum, old_addr, new_addr,
|
|
|
|
recover_curseg, recover_newaddr);
|
2015-05-28 19:15:35 +08:00
|
|
|
|
2016-02-24 17:16:47 +08:00
|
|
|
f2fs_update_data_blkaddr(dn, new_addr);
|
2015-05-28 19:15:35 +08:00
|
|
|
}
|
|
|
|
|
2013-11-30 11:51:14 +08:00
|
|
|
void f2fs_wait_on_page_writeback(struct page *page,
|
2016-01-20 23:43:51 +08:00
|
|
|
enum page_type type, bool ordered)
|
2013-11-30 11:51:14 +08:00
|
|
|
{
|
|
|
|
if (PageWriteback(page)) {
|
2014-09-03 06:31:18 +08:00
|
|
|
struct f2fs_sb_info *sbi = F2FS_P_SB(page);
|
|
|
|
|
2017-05-11 02:28:38 +08:00
|
|
|
f2fs_submit_merged_write_cond(sbi, page->mapping->host,
|
|
|
|
0, page->index, type);
|
2016-01-20 23:43:51 +08:00
|
|
|
if (ordered)
|
|
|
|
wait_on_page_writeback(page);
|
|
|
|
else
|
|
|
|
wait_for_stable_page(page);
|
2013-11-30 11:51:14 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2017-09-06 08:04:35 +08:00
|
|
|
void f2fs_wait_on_block_writeback(struct f2fs_sb_info *sbi, block_t blkaddr)
|
2015-10-08 13:27:34 +08:00
|
|
|
{
|
|
|
|
struct page *cpage;
|
|
|
|
|
2016-09-18 08:16:56 +08:00
|
|
|
if (blkaddr == NEW_ADDR || blkaddr == NULL_ADDR)
|
2015-10-08 13:27:34 +08:00
|
|
|
return;
|
|
|
|
|
|
|
|
cpage = find_lock_page(META_MAPPING(sbi), blkaddr);
|
|
|
|
if (cpage) {
|
2016-01-20 23:43:51 +08:00
|
|
|
f2fs_wait_on_page_writeback(cpage, DATA, true);
|
2015-10-08 13:27:34 +08:00
|
|
|
f2fs_put_page(cpage, 1);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2017-12-06 11:31:29 +08:00
|
|
|
static void read_compacted_summaries(struct f2fs_sb_info *sbi)
|
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
|
|
|
|
struct curseg_info *seg_i;
|
|
|
|
unsigned char *kaddr;
|
|
|
|
struct page *page;
|
|
|
|
block_t start;
|
|
|
|
int i, j, offset;
|
|
|
|
|
|
|
|
start = start_sum_block(sbi);
|
|
|
|
|
|
|
|
page = get_meta_page(sbi, start++);
|
|
|
|
kaddr = (unsigned char *)page_address(page);
|
|
|
|
|
|
|
|
/* Step 1: restore nat cache */
|
|
|
|
seg_i = CURSEG_I(sbi, CURSEG_HOT_DATA);
|
f2fs: split journal cache from curseg cache
In the curseg cache, f2fs caches two different parts:
- data of the current summary block, i.e. summary entries and footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) lock contention may be higher when we access
or update both parts of the cache, since the same mutex, curseg_mutex,
protects the whole cache. 2) the current summary block, with the latest
journal info, is written back to the device as a normal summary block
when flushing; however, journal info is treated as valid only in the current
summary block, so most normal summary blocks contain junk journal data that
wastes the remaining space of the summary block.
So, to fix the above issues, we split the curseg cache into two parts:
a) current summary block, protected by the original mutex curseg_mutex
b) journal cache, protected by the newly introduced r/w semaphore journal_rwsem
When loading the curseg cache during ->mount, we store summary info and
journal info in different caches; when doing a checkpoint, we combine
the data of the two caches into the current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
memcpy(seg_i->journal, kaddr, SUM_JOURNAL_SIZE);
|
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
/* Step 2: restore sit cache */
|
|
|
|
seg_i = CURSEG_I(sbi, CURSEG_COLD_DATA);
|
2016-02-19 18:08:46 +08:00
|
|
|
memcpy(seg_i->journal, kaddr + SUM_JOURNAL_SIZE, SUM_JOURNAL_SIZE);
|
2012-11-02 16:09:16 +08:00
|
|
|
offset = 2 * SUM_JOURNAL_SIZE;
|
|
|
|
|
|
|
|
/* Step 3: restore summary entries */
|
|
|
|
for (i = CURSEG_HOT_DATA; i <= CURSEG_COLD_DATA; i++) {
|
|
|
|
unsigned short blk_off;
|
|
|
|
unsigned int segno;
|
|
|
|
|
|
|
|
seg_i = CURSEG_I(sbi, i);
|
|
|
|
segno = le32_to_cpu(ckpt->cur_data_segno[i]);
|
|
|
|
blk_off = le16_to_cpu(ckpt->cur_data_blkoff[i]);
|
|
|
|
seg_i->next_segno = segno;
|
|
|
|
reset_curseg(sbi, i, 0);
|
|
|
|
seg_i->alloc_type = ckpt->alloc_type[i];
|
|
|
|
seg_i->next_blkoff = blk_off;
|
|
|
|
|
|
|
|
if (seg_i->alloc_type == SSR)
|
|
|
|
blk_off = sbi->blocks_per_seg;
|
|
|
|
|
|
|
|
for (j = 0; j < blk_off; j++) {
|
|
|
|
struct f2fs_summary *s;
|
|
|
|
s = (struct f2fs_summary *)(kaddr + offset);
|
|
|
|
seg_i->sum_blk->entries[j] = *s;
|
|
|
|
offset += SUMMARY_SIZE;
|
mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
ago with the promise that one day it would be possible to implement the page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized, and it is unlikely that it ever will.
We have many places where PAGE_CACHE_SIZE is assumed to be equal to
PAGE_SIZE. And it's a constant source of confusion whether
PAGE_CACHE_* or PAGE_* constants should be used in a particular case,
especially on the border between fs and mm.
Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
breakage to be doable.
Let's stop pretending that pages in the page cache are special. They are
not.
The changes are pretty straightforward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
the script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is a revert of the changes to the
PAGE_CACHE_ALIGN definition: we are going to drop it later.
There are a few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation will also
be addressed in a separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-01 20:29:47 +08:00
|
|
|
if (offset + SUMMARY_SIZE <= PAGE_SIZE -
|
2012-11-02 16:09:16 +08:00
|
|
|
SUM_FOOTER_SIZE)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
f2fs_put_page(page, 1);
|
|
|
|
page = NULL;
|
|
|
|
|
|
|
|
page = get_meta_page(sbi, start++);
|
|
|
|
kaddr = (unsigned char *)page_address(page);
|
|
|
|
offset = 0;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
f2fs_put_page(page, 1);
|
|
|
|
}
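The packing that the restore loop above walks through (two journals at the start of the first page, then summary entries back to back, with room for a footer kept free at the end of every page) can be sketched as a small position calculator. This is an illustration only: `toy_locate_summary` and the `TOY_` constants are hypothetical, and the sizes (4096-byte page, 7-byte summary, 5-byte footer, 507-byte journal) are assumptions that should be checked against f2fs_fs.h.

```c
#include <stddef.h>

/* Assumed on-disk sizes for a 4 KiB block; check them against f2fs_fs.h. */
#define TOY_PAGE_SIZE        4096u
#define TOY_SUMMARY_SIZE     7u
#define TOY_SUM_FOOTER_SIZE  5u
#define TOY_SUM_JOURNAL_SIZE 507u

struct toy_pos {
	unsigned int page;    /* which compacted summary page, 0-based */
	unsigned int offset;  /* byte offset of the entry inside that page */
};

/*
 * Walk the layout the way the restore loop does and report where the
 * n-th summary entry (0-based, counted across all data logs) lives.
 */
struct toy_pos toy_locate_summary(unsigned int n)
{
	/* The first page starts with the NAT journal and the SIT journal. */
	struct toy_pos pos = { 0, 2 * TOY_SUM_JOURNAL_SIZE };
	unsigned int i;

	for (i = 0; i < n; i++) {
		pos.offset += TOY_SUMMARY_SIZE;
		/* Another whole entry still fits before the footer: stay. */
		if (pos.offset + TOY_SUMMARY_SIZE <=
				TOY_PAGE_SIZE - TOY_SUM_FOOTER_SIZE)
			continue;
		/* Otherwise the next entry starts at the top of a new page. */
		pos.page++;
		pos.offset = 0;
	}
	return pos;
}
```

Under these assumed sizes the first page has room for 439 entries after the two journals, and each following page has room for 584.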
|
|
|
|
|
|
|
|
static int read_normal_summaries(struct f2fs_sb_info *sbi, int type)
|
|
|
|
{
|
|
|
|
struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
|
|
|
|
struct f2fs_summary_block *sum;
|
|
|
|
struct curseg_info *curseg;
|
|
|
|
struct page *new;
|
|
|
|
unsigned short blk_off;
|
|
|
|
unsigned int segno = 0;
|
|
|
|
block_t blk_addr = 0;
|
|
|
|
|
|
|
|
/* get segment number and block addr */
|
|
|
|
if (IS_DATASEG(type)) {
|
|
|
|
segno = le32_to_cpu(ckpt->cur_data_segno[type]);
|
|
|
|
blk_off = le16_to_cpu(ckpt->cur_data_blkoff[type -
|
|
|
|
CURSEG_HOT_DATA]);
|
2015-01-30 03:45:33 +08:00
|
|
|
if (__exist_node_summaries(sbi))
|
2012-11-02 16:09:16 +08:00
|
|
|
blk_addr = sum_blk_addr(sbi, NR_CURSEG_TYPE, type);
|
|
|
|
else
|
|
|
|
blk_addr = sum_blk_addr(sbi, NR_CURSEG_DATA_TYPE, type);
|
|
|
|
} else {
|
|
|
|
segno = le32_to_cpu(ckpt->cur_node_segno[type -
|
|
|
|
CURSEG_HOT_NODE]);
|
|
|
|
blk_off = le16_to_cpu(ckpt->cur_node_blkoff[type -
|
|
|
|
CURSEG_HOT_NODE]);
|
2015-01-30 03:45:33 +08:00
|
|
|
if (__exist_node_summaries(sbi))
|
2012-11-02 16:09:16 +08:00
|
|
|
blk_addr = sum_blk_addr(sbi, NR_CURSEG_NODE_TYPE,
|
|
|
|
type - CURSEG_HOT_NODE);
|
|
|
|
else
|
|
|
|
blk_addr = GET_SUM_BLOCK(sbi, segno);
|
|
|
|
}
|
|
|
|
|
|
|
|
new = get_meta_page(sbi, blk_addr);
|
|
|
|
sum = (struct f2fs_summary_block *)page_address(new);
|
|
|
|
|
|
|
|
if (IS_NODESEG(type)) {
|
2015-01-30 03:45:33 +08:00
|
|
|
if (__exist_node_summaries(sbi)) {
|
2012-11-02 16:09:16 +08:00
			struct f2fs_summary *ns = &sum->entries[0];
			int i;
			for (i = 0; i < sbi->blocks_per_seg; i++, ns++) {
				ns->version = 0;
				ns->ofs_in_node = 0;
			}
		} else {
2017-12-06 11:31:29 +08:00
			restore_node_summary(sbi, segno, sum);
2012-11-02 16:09:16 +08:00
		}
	}

	/* set uncompleted segment to curseg */
	curseg = CURSEG_I(sbi, type);
	mutex_lock(&curseg->curseg_mutex);
f2fs: split journal cache from curseg cache
In the curseg cache, f2fs caches two different parts:
- data of the current summary block, i.e. summary entries and footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both parts of the cache, since we use the same mutex lock
curseg_mutex to protect the cache. 2) the current summary block with the last
journal info will be written back to the device as a normal summary block
when flushing; however, we treat journal info as valid only in the current
summary, so most normal summary blocks contain junk journal data, which wastes
the remaining space of the summary block.
So, in order to fix the above issues, we split the curseg cache into two parts:
a) current summary block, protected by the original mutex lock curseg_mutex
b) journal cache, protected by the newly introduced r/w semaphore journal_rwsem
When loading the curseg cache during ->mount, we store summary info and
journal info in different caches; when doing a checkpoint, we combine the
data of the two caches into the current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
	/* update journal info */
	down_write(&curseg->journal_rwsem);
	memcpy(curseg->journal, &sum->journal, SUM_JOURNAL_SIZE);
	up_write(&curseg->journal_rwsem);

	memcpy(curseg->sum_blk->entries, sum->entries, SUM_ENTRY_SIZE);
	memcpy(&curseg->sum_blk->footer, &sum->footer, SUM_FOOTER_SIZE);
2012-11-02 16:09:16 +08:00
	curseg->next_segno = segno;
	reset_curseg(sbi, type, 0);
	curseg->alloc_type = ckpt->alloc_type[type];
	curseg->next_blkoff = blk_off;
	mutex_unlock(&curseg->curseg_mutex);
	f2fs_put_page(new, 1);
	return 0;
}

static int restore_curseg_summaries(struct f2fs_sb_info *sbi)
{
2017-06-02 02:18:30 +08:00
	struct f2fs_journal *sit_j = CURSEG_I(sbi, CURSEG_COLD_DATA)->journal;
	struct f2fs_journal *nat_j = CURSEG_I(sbi, CURSEG_HOT_DATA)->journal;
2012-11-02 16:09:16 +08:00
	int type = CURSEG_HOT_DATA;
2014-03-17 16:36:24 +08:00
	int err;
2012-11-02 16:09:16 +08:00
2016-09-20 11:04:18 +08:00
	if (is_set_ckpt_flags(sbi, CP_COMPACT_SUM_FLAG)) {
2014-12-09 14:21:46 +08:00
		int npages = npages_for_summary_flush(sbi, true);

		if (npages >= 2)
			ra_meta_pages(sbi, start_sum_block(sbi), npages,
2015-10-12 17:05:59 +08:00
							META_CP, true);
2014-12-09 14:21:46 +08:00
2012-11-02 16:09:16 +08:00
		/* restore for compacted data summary */
2017-12-06 11:31:29 +08:00
		read_compacted_summaries(sbi);
2012-11-02 16:09:16 +08:00
		type = CURSEG_HOT_NODE;
	}
2015-01-30 03:45:33 +08:00
	if (__exist_node_summaries(sbi))
2014-12-09 14:21:46 +08:00
		ra_meta_pages(sbi, sum_blk_addr(sbi, NR_CURSEG_TYPE, type),
2015-10-12 17:05:59 +08:00
					NR_CURSEG_TYPE - type, META_CP, true);
2014-12-09 14:21:46 +08:00
2014-03-17 16:36:24 +08:00
	for (; type <= CURSEG_COLD_NODE; type++) {
		err = read_normal_summaries(sbi, type);
		if (err)
			return err;
	}
2017-06-02 02:18:30 +08:00
	/* sanity check for summary blocks */
	if (nats_in_cursum(nat_j) > NAT_JOURNAL_ENTRIES ||
			sits_in_cursum(sit_j) > SIT_JOURNAL_ENTRIES)
		return -EINVAL;
2012-11-02 16:09:16 +08:00
	return 0;
}
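The sanity check at the end of restore_curseg_summaries() rejects journal entry counts read from disk that exceed what a journal can hold. The following is a minimal userspace sketch of that idea. The struct `demo_journal`, the function `demo_check_journals`, and the two limits are hypothetical placeholders, not the f2fs definitions.

```c
#include <errno.h>
#include <stdint.h>

#define DEMO_NAT_JOURNAL_MAX 38	/* hypothetical capacity, not taken from f2fs */
#define DEMO_SIT_JOURNAL_MAX 6	/* hypothetical capacity, not taken from f2fs */

/* Hypothetical journal header: only the entry count matters here. */
struct demo_journal {
	uint16_t n_entries;
};

/*
 * Return 0 when both counts fit their journals, -EINVAL otherwise.
 * A count above capacity means the block cannot be trusted, so the
 * caller should refuse to use it instead of indexing past the array.
 */
static int demo_check_journals(const struct demo_journal *nat,
			       const struct demo_journal *sit)
{
	if (nat->n_entries > DEMO_NAT_JOURNAL_MAX ||
			sit->n_entries > DEMO_SIT_JOURNAL_MAX)
		return -EINVAL;
	return 0;
}
```

Checking the counts once, right after loading, means later loops over the journal entries can rely on the bound.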
static void write_compacted_summaries(struct f2fs_sb_info *sbi, block_t blkaddr)
{
	struct page *page;
	unsigned char *kaddr;
	struct f2fs_summary *summary;
	struct curseg_info *seg_i;
	int written_size = 0;
	int i, j;

	page = grab_meta_page(sbi, blkaddr++);
	kaddr = (unsigned char *)page_address(page);

	/* Step 1: write nat cache */
	seg_i = CURSEG_I(sbi, CURSEG_HOT_DATA);
2016-02-19 18:08:46 +08:00
	memcpy(kaddr, seg_i->journal, SUM_JOURNAL_SIZE);
2012-11-02 16:09:16 +08:00
	written_size += SUM_JOURNAL_SIZE;

	/* Step 2: write sit cache */
	seg_i = CURSEG_I(sbi, CURSEG_COLD_DATA);
2016-02-19 18:08:46 +08:00
	memcpy(kaddr + written_size, seg_i->journal, SUM_JOURNAL_SIZE);
2012-11-02 16:09:16 +08:00
	written_size += SUM_JOURNAL_SIZE;

	/* Step 3: write summary entries */
	for (i = CURSEG_HOT_DATA; i <= CURSEG_COLD_DATA; i++) {
		unsigned short blkoff;
		seg_i = CURSEG_I(sbi, i);
		if (sbi->ckpt->alloc_type[i] == SSR)
			blkoff = sbi->blocks_per_seg;
		else
			blkoff = curseg_blkoff(sbi, i);

		for (j = 0; j < blkoff; j++) {
			if (!page) {
				page = grab_meta_page(sbi, blkaddr++);
				kaddr = (unsigned char *)page_address(page);
				written_size = 0;
			}
			summary = (struct f2fs_summary *)(kaddr + written_size);
			*summary = seg_i->sum_blk->entries[j];
			written_size += SUMMARY_SIZE;
mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
ago with the promise that one day it would be possible to implement the page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized, and it is unlikely that it ever will.
We have many places where PAGE_CACHE_SIZE is assumed to be equal to
PAGE_SIZE. And it's a constant source of confusion whether
PAGE_CACHE_* or PAGE_* constants should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is a revert of the changes to the
PAGE_CACHE_ALIGN definition: we are going to drop it later.
There are a few places in the code that coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation will
also be addressed in a separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-01 20:29:47 +08:00
			if (written_size + SUMMARY_SIZE <= PAGE_SIZE -
2012-11-02 16:09:16 +08:00
							SUM_FOOTER_SIZE)
				continue;
2013-10-24 15:08:28 +08:00
			set_page_dirty(page);
2012-11-02 16:09:16 +08:00
			f2fs_put_page(page, 1);
			page = NULL;
		}
	}
2013-10-24 15:08:28 +08:00
	if (page) {
		set_page_dirty(page);
2012-11-02 16:09:16 +08:00
		f2fs_put_page(page, 1);
2013-10-24 15:08:28 +08:00
	}
2012-11-02 16:09:16 +08:00
}
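The page-filling rule in write_compacted_summaries() can be checked with plain arithmetic. The first page starts after two journal areas, each entry adds a fixed size, and a page is closed as soon as one more entry plus the footer would no longer fit. The sketch below is a userspace model of that loop only. `demo_pages_for_entries` is a hypothetical helper, and the four sizes are assumptions chosen for illustration (a 4096-byte page, 507-byte journal, 7-byte summary entry, 5-byte footer), not values read from this file.

```c
#include <stddef.h>

#define DEMO_PAGE_SIZE    4096	/* assumed page size */
#define DEMO_JOURNAL_SIZE 507	/* assumed size of one journal area */
#define DEMO_SUMMARY_SIZE 7	/* assumed size of one summary entry */
#define DEMO_FOOTER_SIZE  5	/* assumed space reserved for the footer */

/*
 * Count the pages needed to pack n_entries summary entries.
 * The first page is always taken and begins with the two journals,
 * so the result is 1 even when n_entries is 0.
 */
static int demo_pages_for_entries(size_t n_entries)
{
	size_t written = 2 * DEMO_JOURNAL_SIZE;
	int pages = 1;
	int page_open = 1;
	size_t j;

	for (j = 0; j < n_entries; j++) {
		if (!page_open) {
			/* previous page was closed: start a fresh one */
			pages++;
			written = 0;
			page_open = 1;
		}
		written += DEMO_SUMMARY_SIZE;

		/* another entry still fits in front of the footer */
		if (written + DEMO_SUMMARY_SIZE <=
				DEMO_PAGE_SIZE - DEMO_FOOTER_SIZE)
			continue;

		/* page is full: close it */
		page_open = 0;
	}
	return pages;
}
```

Under these assumed sizes the first page holds 439 entries and each later page holds 584, so entry number 440 is the first one that forces a second page.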
static void write_normal_summaries(struct f2fs_sb_info *sbi,
					block_t blkaddr, int type)
{
	int i, end;
	if (IS_DATASEG(type))
		end = type + NR_CURSEG_DATA_TYPE;
	else
		end = type + NR_CURSEG_NODE_TYPE;
2016-02-19 18:08:46 +08:00
	for (i = type; i < end; i++)
		write_current_sum_page(sbi, i, blkaddr + (i - type));
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
void write_data_summaries(struct f2fs_sb_info *sbi, block_t start_blk)
|
|
|
|
{
|
2016-09-20 11:04:18 +08:00
|
|
|
if (is_set_ckpt_flags(sbi, CP_COMPACT_SUM_FLAG))
|
2012-11-02 16:09:16 +08:00
|
|
|
write_compacted_summaries(sbi, start_blk);
|
|
|
|
else
|
|
|
|
write_normal_summaries(sbi, start_blk, CURSEG_HOT_DATA);
|
|
|
|
}
|
|
|
|
|
|
|
|
void write_node_summaries(struct f2fs_sb_info *sbi, block_t start_blk)
|
|
|
|
{
|
2015-01-30 03:45:33 +08:00
|
|
|
write_normal_summaries(sbi, start_blk, CURSEG_HOT_NODE);
|
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
2016-02-14 18:50:40 +08:00
|
|
|
int lookup_journal_in_cursum(struct f2fs_journal *journal, int type,
|
2012-11-02 16:09:16 +08:00
|
|
|
unsigned int val, int alloc)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
|
|
|
|
if (type == NAT_JOURNAL) {
|
2016-02-14 18:50:40 +08:00
|
|
|
for (i = 0; i < nats_in_cursum(journal); i++) {
|
|
|
|
if (le32_to_cpu(nid_in_journal(journal, i)) == val)
|
2012-11-02 16:09:16 +08:00
|
|
|
return i;
|
|
|
|
}
|
2016-02-14 18:50:40 +08:00
|
|
|
if (alloc && __has_cursum_space(journal, 1, NAT_JOURNAL))
|
|
|
|
return update_nats_in_cursum(journal, 1);
|
2012-11-02 16:09:16 +08:00
|
|
|
} else if (type == SIT_JOURNAL) {
|
2016-02-14 18:50:40 +08:00
|
|
|
for (i = 0; i < sits_in_cursum(journal); i++)
|
|
|
|
if (le32_to_cpu(segno_in_journal(journal, i)) == val)
|
2012-11-02 16:09:16 +08:00
|
|
|
return i;
|
2016-02-14 18:50:40 +08:00
|
|
|
if (alloc && __has_cursum_space(journal, 1, SIT_JOURNAL))
|
|
|
|
return update_sits_in_cursum(journal, 1);
|
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
return -1;
|
|
|
|
}
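The lookup-or-allocate pattern used by lookup_journal_in_cursum() above can be sketched in isolation. This is a simplified userspace model, not the kernel implementation: `demo_journal`, `demo_lookup`, and `DEMO_CAP` are hypothetical names, endianness conversion is omitted, and the sketch assumes that reserving a slot returns the entry count from before the increment, so the caller fills in the key afterwards.

```c
#include <assert.h>

#define DEMO_CAP 4	/* hypothetical journal capacity */

/* Hypothetical fixed-capacity journal keyed by an unsigned id. */
struct demo_journal {
	unsigned int keys[DEMO_CAP];
	int n;			/* number of slots in use */
};

/*
 * Return the slot index holding val. If val is absent and alloc is
 * nonzero, reserve the next free slot and return its index; the caller
 * is expected to store the key there. Return -1 otherwise.
 */
static int demo_lookup(struct demo_journal *j, unsigned int val, int alloc)
{
	int i;

	for (i = 0; i < j->n; i++)
		if (j->keys[i] == val)
			return i;

	if (alloc && j->n < DEMO_CAP)
		return j->n++;	/* old count is the index of the new slot */

	return -1;
}
```

A caller that passes alloc as 0 gets a pure lookup, while a nonzero alloc turns a miss into a slot reservation as long as space remains.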
|
|
|
|
|
|
|
|
static struct page *get_current_sit_page(struct f2fs_sb_info *sbi,
|
|
|
|
unsigned int segno)
|
|
|
|
{
|
2014-10-20 17:45:49 +08:00
|
|
|
return get_meta_page(sbi, current_sit_addr(sbi, segno));
|
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static struct page *get_next_sit_page(struct f2fs_sb_info *sbi,
|
|
|
|
unsigned int start)
|
|
|
|
{
|
|
|
|
struct sit_info *sit_i = SIT_I(sbi);
|
f2fs: rebuild sit page from sit info in mem
This patch rebuilds the SIT page from the in-memory SIT info instead
of issuing a read I/O.
I tested this method and the results are as below:
Pre:
mmc_perf_test-12061 [001] ...1 976.819992: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [001] ...1 976.856446: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 998.976946: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 999.023269: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 1022.060772: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 1022.111034: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [002] ...1 1070.127643: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 1070.187352: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 1095.942124: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 1095.995975: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 1122.535091: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 1122.586521: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [001] ...1 1147.897487: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [001] ...1 1147.959438: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 1177.926951: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [002] ...1 1177.976823: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [002] ...1 1204.176087: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [002] ...1 1204.239046: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
Some SIT flushes take more than 50 ms.
Now:
mmc_perf_test-2187 [007] ...1 196.840684: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [007] ...1 196.841258: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [007] ...1 219.430582: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [007] ...1 219.431144: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [002] ...1 243.638678: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [000] ...1 243.638980: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [002] ...1 265.392180: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [002] ...1 265.392245: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [000] ...1 290.309051: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [000] ...1 290.309116: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [003] ...1 317.144209: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [003] ...1 317.145913: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [005] ...1 343.224954: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [005] ...1 343.225574: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [000] ...1 370.239846: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [000] ...1 370.241138: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [001] ...1 397.029043: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [001] ...1 397.030750: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [003] ...1 425.386377: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [003] ...1 425.387735: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
Most SIT flushes take no more than 1 ms.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-25 17:27:11 +08:00
|
|
|
struct page *page;
|
2012-11-02 16:09:16 +08:00
|
|
|
pgoff_t src_off, dst_off;
|
|
|
|
|
|
|
|
src_off = current_sit_addr(sbi, start);
|
|
|
|
dst_off = next_sit_addr(sbi, src_off);
|
|
|
|
|
2018-01-25 17:27:11 +08:00
|
|
|
page = grab_meta_page(sbi, dst_off);
|
|
|
|
seg_info_to_sit_page(sbi, page, start);
|
2012-11-02 16:09:16 +08:00
|
|
|
|
f2fs: rebuild sit page from sit info in mem
This patch rebuild sit page from sit info in mem instead
of issue a read io.
I test this method and the result is as below:
Pre:
mmc_perf_test-12061 [001] ...1 976.819992: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [001] ...1 976.856446: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 998.976946: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 999.023269: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 1022.060772: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 1022.111034: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [002] ...1 1070.127643: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 1070.187352: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 1095.942124: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 1095.995975: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 1122.535091: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 1122.586521: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [001] ...1 1147.897487: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [001] ...1 1147.959438: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 1177.926951: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [002] ...1 1177.976823: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [002] ...1 1204.176087: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [002] ...1 1204.239046: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
Some sit flush consume more than 50ms.
Now:
mmc_perf_test-2187 [007] ...1 196.840684: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [007] ...1 196.841258: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [007] ...1 219.430582: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [007] ...1 219.431144: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [002] ...1 243.638678: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [000] ...1 243.638980: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [002] ...1 265.392180: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [002] ...1 265.392245: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [000] ...1 290.309051: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [000] ...1 290.309116: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [003] ...1 317.144209: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [003] ...1 317.145913: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [005] ...1 343.224954: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [005] ...1 343.225574: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [000] ...1 370.239846: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [000] ...1 370.241138: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [001] ...1 397.029043: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [001] ...1 397.030750: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [003] ...1 425.386377: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [003] ...1 425.387735: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
Most SIT flushes take no more than 1 ms.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-25 17:27:11 +08:00
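The figures quoted in this commit message can be checked from the trace itself: each flush latency is the "end flush sit" timestamp minus the matching "start flush sit" timestamp. The following is only a small sketch of that arithmetic; `struct trace_ts` and `flush_latency_us` are hypothetical names that do not exist in f2fs, and the timestamps are assumed to be seconds with microsecond resolution.

```c
#include <assert.h>

/* An ftrace-style timestamp split into whole seconds and microseconds. */
struct trace_ts {
	long sec;
	long usec;
};

/* Hypothetical helper: elapsed microseconds between two trace events. */
static long flush_latency_us(struct trace_ts start, struct trace_ts end)
{
	long delta = (end.sec - start.sec) * 1000000L + (end.usec - start.usec);

	/* "end flush sit" must not precede "start flush sit". */
	assert(delta >= 0);
	return delta;
}
```

For the last "Pre" pair (1204.176087 to 1204.239046) this gives 62959 us, roughly 63 ms; for the first "Now" pair (196.840684 to 196.841258) it gives 574 us.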
|
|
|
set_page_dirty(page);
|
2012-11-02 16:09:16 +08:00
|
|
|
set_to_next_sit(sit_i, start);
|
|
|
|
|
2018-01-25 17:27:11 +08:00
|
|
|
return page;
|
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we described the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal until
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using the SIT journal area.
In this patch, we first update the SIT journal with as many dirty entries as
possible. Second, if there is no space left in the SIT journal, we remove all
entries from the journal and walk through the whole dirty entry bitmap of SIT,
accounting dirty SIT entries located in the same SIT block to one SIT entry set.
All entry sets are linked to the list sit_entry_set in sm_info, sorted in
ascending order by the count of entries in each set. Later we flush the entries
of the sets with the fewest entries into the journal, as many as we can, and
then flush the dense sets with merged entries to disk.
In this way we use the SIT journal area more effectively and also reduce SIT
updates, which improves performance and extends the lifetime of the flash
device.
In my testing environment, this patch clearly reduces SIT block updates.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
             sit page num    cp count    sit pages/cp
based        2006.50         1349.75     1.486
patched      1566.25         1463.25     1.070
The latency of the merging operation is small even when handling a great number
of dirty SIT entries in flush_sit_entries:
latency(ns)    dirty sit count
36038          2151
49168          2123
37174          2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
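The flush order described in this commit message (sparsest sets into the journal first, dense sets to disk) can be modelled in a few lines. This is a simplified sketch, not the kernel implementation: `sit_blocks_to_write` is a hypothetical name, the sets are modelled as a plain array of per-block counts that is already sorted in ascending order, and the free journal space is passed in as a parameter rather than queried from the current segment summary.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: counts[] holds the number of dirty entries per SIT
 * block, sorted in ascending order. Returns how many SIT blocks still have
 * to be written once the sparsest sets have been absorbed by a journal
 * with journal_free free slots.
 */
static size_t sit_blocks_to_write(const unsigned int *counts, size_t nsets,
				  unsigned int journal_free)
{
	size_t i;

	assert(counts != NULL || nsets == 0);

	for (i = 0; i < nsets; i++) {
		/* The first set that does not fit ends journaling for good. */
		if (counts[i] > journal_free)
			break;
		journal_free -= counts[i];
	}
	/* Every set from index i onward is flushed to its own SIT block. */
	return nsets - i;
}
```

With sets of 1, 1, 2, 5 and 9 dirty entries and six free journal slots, the first three sets fit in the journal and only two SIT blocks need to be written. Sorting by ascending count is what lets the journal absorb as many separate blocks as possible.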
|
|
|
static struct sit_entry_set *grab_sit_entry_set(void)
{
	struct sit_entry_set *ses =
|
2015-08-20 23:51:56 +08:00
|
|
|
f2fs_kmem_cache_alloc(sit_entry_set_slab, GFP_NOFS);
|
2014-09-04 18:13:01 +08:00
|
|
|
|
|
|
|
	ses->entry_cnt = 0;
	INIT_LIST_HEAD(&ses->set_list);
	return ses;
}
|
|
|
|
|
|
|
|
static void release_sit_entry_set(struct sit_entry_set *ses)
{
	list_del(&ses->set_list);
	kmem_cache_free(sit_entry_set_slab, ses);
}
|
|
|
|
|
|
|
|
static void adjust_sit_entry_set(struct sit_entry_set *ses,
						struct list_head *head)
{
	struct sit_entry_set *next = ses;

	if (list_is_last(&ses->set_list, head))
		return;

	list_for_each_entry_continue(next, head, set_list)
		if (ses->entry_cnt <= next->entry_cnt)
			break;

	list_move_tail(&ses->set_list, &next->set_list);
}
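adjust_sit_entry_set() keeps the set list in ascending order of entry_cnt after one set's count has grown: the set moves forward past every neighbour with a smaller count and stops in front of the first one that is at least as large. The same idea is sketched below on a plain array instead of a kernel list; `struct set_model` and `reposition_set` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for a SIT entry set: one SIT block and its count. */
struct set_model {
	unsigned int start_segno;
	unsigned int entry_cnt;
};

/*
 * Restore ascending order of entry_cnt after sets[idx].entry_cnt has been
 * increased. Returns the index where the set ends up.
 */
static size_t reposition_set(struct set_model *sets, size_t nsets, size_t idx)
{
	assert(idx < nsets);

	/* Swap forward while the next neighbour has a strictly smaller count. */
	while (idx + 1 < nsets &&
	       sets[idx].entry_cnt > sets[idx + 1].entry_cnt) {
		struct set_model tmp = sets[idx];

		sets[idx] = sets[idx + 1];
		sets[idx + 1] = tmp;
		idx++;
	}
	return idx;
}
```

Because counts only ever grow by one at a time in add_sit_entry(), a forward-only walk like this is enough; the set never has to move backwards.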
|
|
|
|
|
|
|
|
static void add_sit_entry(unsigned int segno, struct list_head *head)
{
	struct sit_entry_set *ses;
	unsigned int start_segno = START_SEGNO(segno);

	list_for_each_entry(ses, head, set_list) {
		if (ses->start_segno == start_segno) {
			ses->entry_cnt++;
			adjust_sit_entry_set(ses, head);
			return;
		}
	}

	ses = grab_sit_entry_set();

	ses->start_segno = start_segno;
	ses->entry_cnt++;
	list_add(&ses->set_list, head);
}
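add_sit_entry() groups segments by the SIT block that stores their entry, using START_SEGNO() as the key. The definition of START_SEGNO() is not shown in this section, so the sketch below rests on two stated assumptions: that it rounds a segment number down to the first segment covered by the same SIT block, and that one SIT block holds 55 entries (a 4096-byte block divided by a 74-byte on-disk entry). All names in the sketch are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value: 4096-byte block / 74-byte on-disk SIT entry. */
#define SIT_ENTRIES_PER_BLOCK_MODEL 55u

/* First segment number covered by the SIT block that holds segno. */
static unsigned int start_segno_model(unsigned int segno)
{
	return (segno / SIT_ENTRIES_PER_BLOCK_MODEL) *
		SIT_ENTRIES_PER_BLOCK_MODEL;
}

/*
 * Tally dirty segments per SIT block. tally[] must provide nblocks
 * zero-initialised slots, one per SIT block.
 */
static void tally_dirty_per_block(const unsigned int *segnos, size_t n,
				  unsigned int *tally, size_t nblocks)
{
	size_t i;

	for (i = 0; i < n; i++) {
		size_t blk = segnos[i] / SIT_ENTRIES_PER_BLOCK_MODEL;

		assert(blk < nblocks);
		tally[blk]++;
	}
}
```

Under these assumptions, segments 120, 121 and 164 all map to start segment 110 and therefore land in the same entry set, while segment 55 starts a set of its own.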
|
|
|
|
|
|
|
|
static void add_sits_in_set(struct f2fs_sb_info *sbi)
{
	struct f2fs_sm_info *sm_info = SM_I(sbi);
	struct list_head *set_list = &sm_info->sit_entry_set;
	unsigned long *bitmap = SIT_I(sbi)->dirty_sentries_bitmap;
	unsigned int segno;
|
|
|
|
|
2014-09-24 02:23:01 +08:00
|
|
|
for_each_set_bit(segno, bitmap, MAIN_SEGS(sbi))
|
2014-09-04 18:13:01 +08:00
|
|
|
		add_sit_entry(segno, set_list);
}

static void remove_sits_in_journal(struct f2fs_sb_info *sbi)
|
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_COLD_DATA);
|
f2fs: split journal cache from curseg cache
In the curseg cache, f2fs caches two different parts:
- data of the current summary block, i.e. summary entries and footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) lock contention may be higher when we access or
update both parts of the cache, since the same mutex lock curseg_mutex
protects the whole cache. 2) the current summary block, together with the
last journal info, is written back to the device as a normal summary block
when flushing; however, journal info is treated as valid only in the current
summary, so most normal summary blocks contain junk journal data, which wastes
the remaining space of the summary block.
So, in order to fix the above issues, we split the curseg cache into two parts:
a) current summary block, protected by the original mutex lock curseg_mutex
b) journal cache, protected by the newly introduced r/w semaphore journal_rwsem
When loading the curseg cache during ->mount, we store summary info and
journal info in different caches; when doing a checkpoint, we combine the
data of the two caches into the current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
struct f2fs_journal *journal = curseg->journal;
|
2012-11-02 16:09:16 +08:00
|
|
|
int i;
|
|
|
|
|
2016-02-19 18:08:46 +08:00
|
|
|
down_write(&curseg->journal_rwsem);
|
2016-02-14 18:50:40 +08:00
|
|
|
for (i = 0; i < sits_in_cursum(journal); i++) {
|
2014-09-04 18:13:01 +08:00
|
|
|
		unsigned int segno;
		bool dirtied;
|
|
|
|
|
2016-02-14 18:50:40 +08:00
|
|
|
segno = le32_to_cpu(segno_in_journal(journal, i));
|
2014-09-04 18:13:01 +08:00
|
|
|
		dirtied = __mark_sit_entry_dirty(sbi, segno);

		if (!dirtied)
			add_sit_entry(segno, &SM_I(sbi)->sit_entry_set);
|
2012-11-02 16:09:16 +08:00
|
|
|
}
|
2016-02-14 18:50:40 +08:00
|
|
|
update_sits_in_cursum(journal, -i);
|
2016-02-19 18:08:46 +08:00
|
|
|
up_write(&curseg->journal_rwsem);
|
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
2012-11-29 12:28:09 +08:00
|
|
|
/*
|
2012-11-02 16:09:16 +08:00
|
|
|
* CP calls this function, which flushes SIT entries including sit_journal,
|
|
|
|
* and moves prefree segs to free segs.
|
|
|
|
*/
|
2014-09-21 13:06:39 +08:00
|
|
|
void flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc)
|
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct sit_info *sit_i = SIT_I(sbi);
|
|
|
|
unsigned long *bitmap = sit_i->dirty_sentries_bitmap;
|
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_COLD_DATA);
|
2016-02-19 18:08:46 +08:00
|
|
|
struct f2fs_journal *journal = curseg->journal;
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we described the issue as below:
"Although building the NAT journal in cursum reduces the read/write work for NAT
blocks, the previous design leaves us with lower performance when writing
checkpoints frequently in these cases:
1. if the journal in cursum is already full, it is a bit of a waste that we flush
all nat entries to pages for persistence, but do not cache any entries.
2. if the journal in cursum is not full, we fill nat entries into the journal until
the journal is full, then flush the remaining dirty entries to disk without merging
the journaled entries, so these journaled entries may be flushed to disk at the next
checkpoint, having lost the chance to be flushed last time."
Actually, we have the same problem in using the SIT journal area.
In this patch, we first update the sit journal with as many dirty entries as
possible. Second, if there is no space in the sit journal, we remove all
entries from the journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in the same SIT block to one sit entry set.
All entry sets are linked to the list sit_entry_set in sm_info, sorted in ascending
order by the count of entries in each set. Later we flush the sets which have the
fewest entries into the journal, as many as we can, and then flush the dense sets
with merged entries to disk.
In this way we can use the sit journal area more effectively and also reduce
SIT updates, resulting in a performance gain and a longer lifetime for the flash
device.
In my testing environment, this patch clearly helps to reduce SIT block
updates.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
            sit page num    cp count    sit pages/cp
based       2006.50         1349.75     1.486
patched     1566.25         1463.25     1.070
The latency of the merging operation is small when handling a great number of
dirty SIT entries in flush_sit_entries:
latency(ns)     dirty sit count
36038           2151
49168           2123
37174           2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
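The flush policy described in this commit message can be sketched as a small standalone C routine. This is an illustration under stated assumptions, not the kernel code: struct demo_set and demo_plan_flush() are hypothetical names, the journal is modelled only as a count of free slots, and qsort() is swapped in for the sorted linked list (sit_entry_set) that the message says the patch maintains. The per-set decision mirrors the to_journal flag in flush_sit_entries(): once one set does not fit, every later (larger) set goes to its SIT page.

```c
#include <stdlib.h>

/* One set per SIT block that has dirty entries (hypothetical layout). */
struct demo_set {
	unsigned int start_segno; /* first segment covered by the SIT block */
	unsigned int entry_cnt;   /* number of dirty entries in that block */
	int to_journal;           /* output: 1 = journal, 0 = SIT page */
};

/* Order sets by ascending dirty-entry count. */
static int demo_cmp_cnt(const void *a, const void *b)
{
	const struct demo_set *x = a, *y = b;

	return (x->entry_cnt > y->entry_cnt) - (x->entry_cnt < y->entry_cnt);
}

/*
 * Sort the sets, send the sparsest ones to the journal while they fit,
 * and send the rest to their SIT pages.
 * Returns the number of SIT pages that would be written.
 */
unsigned int demo_plan_flush(struct demo_set *sets, size_t n,
			     unsigned int journal_free)
{
	unsigned int pages = 0;
	int to_journal = 1;
	size_t i;

	qsort(sets, n, sizeof(*sets), demo_cmp_cnt);

	for (i = 0; i < n; i++) {
		/* The switch is one-way: later sets are at least as large. */
		if (to_journal && sets[i].entry_cnt > journal_free)
			to_journal = 0;

		sets[i].to_journal = to_journal;
		if (to_journal)
			journal_free -= sets[i].entry_cnt;
		else
			pages++;
	}
	return pages;
}
```

For example, with sets holding 5, 1, 3 and 2 dirty entries and 4 free journal slots, the sets with 1 and 2 entries land in the journal and the sets with 3 and 5 entries each cost one SIT page write, so the function returns 2.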
2014-09-04 18:13:01 +08:00
|
|
|
struct sit_entry_set *ses, *tmp;
|
|
|
|
struct list_head *head = &SM_I(sbi)->sit_entry_set;
|
|
|
|
bool to_journal = true;
|
2014-09-21 13:06:39 +08:00
|
|
|
struct seg_entry *se;
|
2012-11-02 16:09:16 +08:00
|
|
|
|
2017-10-30 17:49:53 +08:00
|
|
|
down_write(&sit_i->sentry_lock);
|
2012-11-02 16:09:16 +08:00
|
|
|
|
2015-02-27 16:52:50 +08:00
|
|
|
if (!sit_i->dirty_sentries)
|
|
|
|
goto out;
|
|
|
|
|
2012-11-02 16:09:16 +08:00
|
|
|
/*
|
2014-09-04 18:13:01 +08:00
|
|
|
* add and account sit entries of dirty bitmap in sit entry
|
|
|
|
* set temporarily
|
2012-11-02 16:09:16 +08:00
|
|
|
*/
|
2014-09-04 18:13:01 +08:00
|
|
|
add_sits_in_set(sbi);
|
2012-11-02 16:09:16 +08:00
|
|
|
|
2014-09-04 18:13:01 +08:00
|
|
|
/*
|
|
|
|
* if there is not enough space in journal to store dirty sit
|
|
|
|
* entries, remove all entries from journal and add and account
|
|
|
|
* them in sit entry set.
|
|
|
|
*/
|
2016-02-14 18:50:40 +08:00
|
|
|
if (!__has_cursum_space(journal, sit_i->dirty_sentries, SIT_JOURNAL))
|
2014-09-04 18:13:01 +08:00
|
|
|
remove_sits_in_journal(sbi);
|
2013-11-12 13:49:56 +08:00
|
|
|
|
2014-09-04 18:13:01 +08:00
|
|
|
/*
|
|
|
|
* there are two steps to flush sit entries:
|
|
|
|
* #1, flush sit entries to journal in current cold data summary block.
|
|
|
|
* #2, flush sit entries to sit page.
|
|
|
|
*/
|
|
|
|
list_for_each_entry_safe(ses, tmp, head, set_list) {
|
2014-10-17 02:43:30 +08:00
|
|
|
struct page *page = NULL;
|
2014-09-04 18:13:01 +08:00
|
|
|
struct f2fs_sit_block *raw_sit = NULL;
|
|
|
|
unsigned int start_segno = ses->start_segno;
|
|
|
|
unsigned int end = min(start_segno + SIT_ENTRY_PER_BLOCK,
|
2014-09-24 02:23:01 +08:00
|
|
|
(unsigned long)MAIN_SEGS(sbi));
|
2014-09-04 18:13:01 +08:00
|
|
|
unsigned int segno = start_segno;
|
|
|
|
|
|
|
|
if (to_journal &&
|
2016-02-14 18:50:40 +08:00
|
|
|
!__has_cursum_space(journal, ses->entry_cnt, SIT_JOURNAL))
|
2014-09-04 18:13:01 +08:00
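The set-ordering policy described in the commit message above can be sketched in user space as follows. This is a simplified illustration, not the kernel implementation: `struct entry_set`, `cmp_by_count`, `plan_flush`, and the `journal_slots` parameter are hypothetical names, the kernel's linked list is replaced by an array sorted with `qsort`, and the journal is assumed to have been emptied beforehand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one SIT entry set: the dirty entries of one SIT block. */
struct entry_set {
	unsigned int start_segno;	/* first segment covered by this SIT block */
	unsigned int entry_cnt;		/* number of dirty entries in this block */
};

/* qsort comparator: ascending by entry_cnt, so sparse sets come first. */
static int cmp_by_count(const void *a, const void *b)
{
	const struct entry_set *x = a, *y = b;

	if (x->entry_cnt != y->entry_cnt)
		return x->entry_cnt < y->entry_cnt ? -1 : 1;
	return (x->start_segno > y->start_segno) -
	       (x->start_segno < y->start_segno);
}

/*
 * Sort the sets, then send whole sets to the journal while they still fit.
 * As soon as one set does not fit, it and every later (denser) set goes to
 * its SIT block on disk, like the sticky "to_journal = false" in the loop
 * shown in this file. Returns the number of SIT blocks to write and stores
 * the number of journaled entries in *journaled.
 */
static size_t plan_flush(struct entry_set *sets, size_t nr_sets,
			 unsigned int journal_slots, unsigned int *journaled)
{
	size_t i, blocks_written = 0;
	int to_journal = 1;

	*journaled = 0;
	qsort(sets, nr_sets, sizeof(*sets), cmp_by_count);

	for (i = 0; i < nr_sets; i++) {
		/* *journaled never exceeds journal_slots, so this cannot wrap. */
		if (to_journal &&
		    sets[i].entry_cnt > journal_slots - *journaled)
			to_journal = 0;

		if (to_journal)
			*journaled += sets[i].entry_cnt;
		else
			blocks_written++;
	}
	assert(*journaled <= journal_slots);
	return blocks_written;
}
```

With four sets holding 5, 1, 3 and 2 dirty entries and a journal with 4 free slots, the two sparsest sets (1 and 2 entries) are journaled and only the two dense sets cost a SIT block write each, instead of four block writes.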
|
|
|
to_journal = false;
|
|
|
|
|
f2fs: split journal cache from curseg cache
In the curseg cache, f2fs caches two different parts:
- data of the current summary block, i.e. summary entries and footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both parts of the cache, since we use the same mutex lock
curseg_mutex to protect the cache. 2) the current summary block with the last
journal info will be written back to the device as a normal summary block
when flushing; however, we treat journal info as valid only in the current
summary, so most normal summary blocks contain junk journal data, which wastes
the remaining space of the summary block.
So, in order to fix the above issues, we split the curseg cache into two parts:
a) current summary block, protected by the original mutex lock curseg_mutex
b) journal cache, protected by the newly introduced r/w semaphore journal_rwsem
When loading the curseg cache during ->mount, we store summary info and
journal info into different caches; when doing a checkpoint, we combine
the data of the two caches into the current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
if (to_journal) {
|
|
|
|
down_write(&curseg->journal_rwsem);
|
|
|
|
} else {
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
2014-09-04 18:13:01 +08:00
|
|
|
page = get_next_sit_page(sbi, start_segno);
|
|
|
|
raw_sit = page_address(page);
|
f2fs: add segment operations
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
2014-09-04 18:13:01 +08:00
|
|
|
/* flush dirty sit entries in region of current sit set */
|
|
|
|
for_each_set_bit_from(segno, bitmap, end) {
|
|
|
|
int offset, sit_offset;
|
2014-09-21 13:06:39 +08:00
|
|
|
|
|
|
|
se = get_seg_entry(sbi, segno);
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
2014-09-04 18:13:01 +08:00
|
|
|
|
|
|
|
/* add discard candidates */
|
2017-04-27 20:40:39 +08:00
|
|
|
if (!(cpc->reason & CP_DISCARD)) {
|
2014-09-21 13:06:39 +08:00
|
|
|
cpc->trim_start = segno;
|
2016-12-30 14:06:15 +08:00
|
|
|
add_discard_addrs(sbi, cpc, false);
|
2014-09-21 13:06:39 +08:00
|
|
|
}
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
2014-09-04 18:13:01 +08:00
|
|
|
|
|
|
|
if (to_journal) {
|
2016-02-14 18:50:40 +08:00
|
|
|
offset = lookup_journal_in_cursum(journal,
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
2014-09-04 18:13:01 +08:00
|
|
|
SIT_JOURNAL, segno, 1);
|
|
|
|
f2fs_bug_on(sbi, offset < 0);
|
2016-02-14 18:50:40 +08:00
|
|
|
segno_in_journal(journal, offset) =
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
2014-09-04 18:13:01 +08:00
|
|
|
cpu_to_le32(segno);
|
|
|
|
seg_info_to_raw_sit(se,
|
2016-02-14 18:50:40 +08:00
|
|
|
&sit_in_journal(journal, offset));
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
2014-09-04 18:13:01 +08:00
|
|
|
} else {
|
|
|
|
sit_offset = SIT_ENTRY_OFFSET(sit_i, segno);
|
|
|
|
seg_info_to_raw_sit(se,
|
|
|
|
&raw_sit->entries[sit_offset]);
|
|
|
|
}
|
f2fs: add segment operations
2012-11-02 16:09:16 +08:00
|
|
|
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
2014-09-04 18:13:01 +08:00
|
|
|
__clear_bit(segno, bitmap);
|
|
|
|
sit_i->dirty_sentries--;
|
|
|
|
ses->entry_cnt--;
|
f2fs: add segment operations
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
f2fs: split journal cache from curseg cache
2016-02-19 18:08:46 +08:00
|
|
|
if (to_journal)
|
|
|
|
up_write(&curseg->journal_rwsem);
|
|
|
|
else
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
2014-09-04 18:13:01 +08:00
|
|
|
f2fs_put_page(page, 1);
|
|
|
|
|
|
|
|
f2fs_bug_on(sbi, ses->entry_cnt);
|
|
|
|
release_sit_entry_set(ses);
|
f2fs: add segment operations
2012-11-02 16:09:16 +08:00
|
|
|
}
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
2014-09-04 18:13:01 +08:00
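The grouping and ordering described in the commit message above can be sketched in user space as follows. This is a minimal illustration, not the kernel implementation: `ENTRIES_PER_BLOCK`, `struct entry_set`, `build_entry_sets()` and `sets_for_journal()` are hypothetical names, a flat array stands in for the kernel's linked list, and the rule that every set after the first non-fitting one goes to disk is an assumption made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, small value for illustration; not the real on-disk constant. */
#define ENTRIES_PER_BLOCK 4

/* One set collects the dirty entries that live in the same SIT block. */
struct entry_set {
	unsigned start_segno;	/* first segment number covered by the block */
	unsigned count;		/* dirty entries accounted to this block */
};

static int cmp_by_count(const void *a, const void *b)
{
	const struct entry_set *x = a, *y = b;

	return (x->count > y->count) - (x->count < y->count);
}

/*
 * Account each dirty segment number to the set of its SIT block, then sort
 * the sets in ascending order of entry count. Returns the number of sets.
 * Entries that would need more than max_sets sets are skipped.
 */
size_t build_entry_sets(const unsigned *dirty, size_t n,
			struct entry_set *sets, size_t max_sets)
{
	size_t nsets = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned start = dirty[i] - dirty[i] % ENTRIES_PER_BLOCK;
		size_t j;

		for (j = 0; j < nsets; j++)
			if (sets[j].start_segno == start)
				break;
		if (j == nsets) {
			if (nsets == max_sets)
				continue;
			sets[nsets].start_segno = start;
			sets[nsets].count = 0;
			nsets++;
		}
		sets[j].count++;
	}
	qsort(sets, nsets, sizeof(*sets), cmp_by_count);
	return nsets;
}

/*
 * Walk the sorted sets and count how many fit into the journal. Once a set
 * does not fit, it and all later (denser) sets are left for the SIT blocks.
 */
size_t sets_for_journal(const struct entry_set *sets, size_t nsets,
			unsigned journal_space)
{
	size_t i;

	for (i = 0; i < nsets; i++) {
		if (sets[i].count > journal_space)
			break;
		journal_space -= sets[i].count;
	}
	return i;
}
```

Sorting sparse sets first means the journal absorbs the blocks that would otherwise each cost a full block write for only a few entries, while dense sets amortize one block write over many entries.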
|
|
|
|
|
|
|
f2fs_bug_on(sbi, !list_empty(head));
|
|
|
|
f2fs_bug_on(sbi, sit_i->dirty_sentries);
|
|
|
|
out:
|
2017-04-27 20:40:39 +08:00
|
|
|
if (cpc->reason & CP_DISCARD) {
|
2016-12-22 11:46:24 +08:00
|
|
|
__u64 trim_start = cpc->trim_start;
|
|
|
|
|
2014-09-21 13:06:39 +08:00
|
|
|
for (; cpc->trim_start <= cpc->trim_end; cpc->trim_start++)
|
2016-12-30 14:06:15 +08:00
|
|
|
add_discard_addrs(sbi, cpc, false);
|
2016-12-22 11:46:24 +08:00
|
|
|
|
|
|
|
cpc->trim_start = trim_start;
|
2014-09-21 13:06:39 +08:00
|
|
|
}
|
2017-10-30 17:49:53 +08:00
|
|
|
up_write(&sit_i->sentry_lock);
|
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
set_prefree_as_free_segments(sbi);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int build_sit_info(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct f2fs_super_block *raw_super = F2FS_RAW_SUPER(sbi);
|
|
|
|
struct sit_info *sit_i;
|
|
|
|
unsigned int sit_segs, start;
|
2017-01-07 18:52:34 +08:00
|
|
|
char *src_bitmap;
|
2012-11-02 16:09:16 +08:00
|
|
|
unsigned int bitmap_size;
|
|
|
|
|
|
|
|
/* allocate memory for SIT information */
|
2017-11-30 19:28:17 +08:00
|
|
|
sit_i = f2fs_kzalloc(sbi, sizeof(struct sit_info), GFP_KERNEL);
|
2012-11-02 16:09:16 +08:00
|
|
|
if (!sit_i)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
SM_I(sbi)->sit_info = sit_i;
|
|
|
|
|
2017-11-30 19:28:18 +08:00
|
|
|
sit_i->sentries = f2fs_kvzalloc(sbi, MAIN_SEGS(sbi) *
|
2015-09-23 04:50:47 +08:00
|
|
|
sizeof(struct seg_entry), GFP_KERNEL);
|
2012-11-02 16:09:16 +08:00
|
|
|
if (!sit_i->sentries)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
2014-09-24 02:23:01 +08:00
|
|
|
bitmap_size = f2fs_bitmap_size(MAIN_SEGS(sbi));
|
2017-11-30 19:28:18 +08:00
|
|
|
sit_i->dirty_sentries_bitmap = f2fs_kvzalloc(sbi, bitmap_size,
|
|
|
|
GFP_KERNEL);
|
2012-11-02 16:09:16 +08:00
|
|
|
if (!sit_i->dirty_sentries_bitmap)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
2014-09-24 02:23:01 +08:00
|
|
|
for (start = 0; start < MAIN_SEGS(sbi); start++) {
|
2012-11-02 16:09:16 +08:00
|
|
|
sit_i->sentries[start].cur_valid_map
|
2017-11-30 19:28:17 +08:00
|
|
|
= f2fs_kzalloc(sbi, SIT_VBLOCK_MAP_SIZE, GFP_KERNEL);
|
2012-11-02 16:09:16 +08:00
|
|
|
sit_i->sentries[start].ckpt_valid_map
|
2017-11-30 19:28:17 +08:00
|
|
|
= f2fs_kzalloc(sbi, SIT_VBLOCK_MAP_SIZE, GFP_KERNEL);
|
2015-05-01 13:37:50 +08:00
|
|
|
if (!sit_i->sentries[start].cur_valid_map ||
|
2016-08-03 01:56:40 +08:00
|
|
|
!sit_i->sentries[start].ckpt_valid_map)
|
2012-11-02 16:09:16 +08:00
|
|
|
return -ENOMEM;
|
2016-08-03 01:56:40 +08:00
|
|
|
|
2017-01-07 18:51:01 +08:00
|
|
|
#ifdef CONFIG_F2FS_CHECK_FS
|
|
|
|
sit_i->sentries[start].cur_valid_map_mir
|
2017-11-30 19:28:17 +08:00
|
|
|
= f2fs_kzalloc(sbi, SIT_VBLOCK_MAP_SIZE, GFP_KERNEL);
|
2017-01-07 18:51:01 +08:00
|
|
|
if (!sit_i->sentries[start].cur_valid_map_mir)
|
|
|
|
return -ENOMEM;
|
|
|
|
#endif
|
|
|
|
|
2016-08-03 01:56:40 +08:00
|
|
|
if (f2fs_discard_en(sbi)) {
|
|
|
|
sit_i->sentries[start].discard_map
|
2017-11-30 19:28:17 +08:00
|
|
|
= f2fs_kzalloc(sbi, SIT_VBLOCK_MAP_SIZE,
|
|
|
|
GFP_KERNEL);
|
2016-08-03 01:56:40 +08:00
|
|
|
if (!sit_i->sentries[start].discard_map)
|
|
|
|
return -ENOMEM;
|
|
|
|
}
|
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
2017-11-30 19:28:17 +08:00
|
|
|
sit_i->tmp_map = f2fs_kzalloc(sbi, SIT_VBLOCK_MAP_SIZE, GFP_KERNEL);
|
2015-02-11 08:44:29 +08:00
|
|
|
if (!sit_i->tmp_map)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
2012-11-02 16:09:16 +08:00
|
|
|
if (sbi->segs_per_sec > 1) {
|
2017-11-30 19:28:18 +08:00
|
|
|
sit_i->sec_entries = f2fs_kvzalloc(sbi, MAIN_SECS(sbi) *
|
2015-09-23 04:50:47 +08:00
|
|
|
sizeof(struct sec_entry), GFP_KERNEL);
|
2012-11-02 16:09:16 +08:00
|
|
|
if (!sit_i->sec_entries)
|
|
|
|
return -ENOMEM;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* get information related to SIT */
|
|
|
|
sit_segs = le32_to_cpu(raw_super->segment_count_sit) >> 1;
|
|
|
|
|
|
|
|
/* set up SIT bitmap from checkpoint pack */
|
|
|
|
bitmap_size = __bitmap_size(sbi, SIT_BITMAP);
|
|
|
|
src_bitmap = __bitmap_ptr(sbi, SIT_BITMAP);
|
|
|
|
|
2017-01-07 18:52:34 +08:00
|
|
|
sit_i->sit_bitmap = kmemdup(src_bitmap, bitmap_size, GFP_KERNEL);
|
|
|
|
if (!sit_i->sit_bitmap)
|
2012-11-02 16:09:16 +08:00
|
|
|
return -ENOMEM;
|
|
|
|
|
2017-01-07 18:52:34 +08:00
|
|
|
#ifdef CONFIG_F2FS_CHECK_FS
|
|
|
|
sit_i->sit_bitmap_mir = kmemdup(src_bitmap, bitmap_size, GFP_KERNEL);
|
|
|
|
if (!sit_i->sit_bitmap_mir)
|
|
|
|
return -ENOMEM;
|
|
|
|
#endif
|
|
|
|
|
2012-11-02 16:09:16 +08:00
|
|
|
/* init SIT information */
|
|
|
|
sit_i->s_ops = &default_salloc_ops;
|
|
|
|
|
|
|
|
sit_i->sit_base_addr = le32_to_cpu(raw_super->sit_blkaddr);
|
|
|
|
sit_i->sit_blocks = sit_segs << sbi->log_blocks_per_seg;
|
2016-11-15 10:20:10 +08:00
|
|
|
sit_i->written_valid_blocks = 0;
|
2012-11-02 16:09:16 +08:00
|
|
|
sit_i->bitmap_size = bitmap_size;
|
|
|
|
sit_i->dirty_sentries = 0;
|
|
|
|
sit_i->sents_per_block = SIT_ENTRY_PER_BLOCK;
|
|
|
|
sit_i->elapsed_time = le64_to_cpu(sbi->ckpt->elapsed_time);
|
2017-05-09 06:59:10 +08:00
|
|
|
sit_i->mounted_time = ktime_get_real_seconds();
|
2017-10-30 17:49:53 +08:00
|
|
|
init_rwsem(&sit_i->sentry_lock);
|
2012-11-02 16:09:16 +08:00
|
|
|
return 0;
|
|
|
|
}
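The size arithmetic in build_sit_info() can be illustrated with a small user-space sketch. The helper names `bitmap_bytes()` and `sit_block_count()` are hypothetical. The sketch assumes that a bitmap size is rounded up to whole `unsigned long` words, and that the `>> 1` in the source halves `segment_count_sit` because the SIT area holds two copies. Neither assumption is confirmed by the lines shown here.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Bytes needed for a bitmap of nbits bits, rounded up to whole
 * unsigned long words (assumed rounding rule).
 */
static size_t bitmap_bytes(size_t nbits)
{
	size_t bits_per_long = sizeof(unsigned long) * CHAR_BIT;

	return (nbits + bits_per_long - 1) / bits_per_long *
		sizeof(unsigned long);
}

/*
 * Number of SIT blocks in one copy of the SIT area: half of the SIT
 * segments, times 2^log_blocks_per_seg blocks per segment.
 */
static unsigned sit_block_count(unsigned segment_count_sit,
				unsigned log_blocks_per_seg)
{
	unsigned sit_segs = segment_count_sit >> 1;

	return sit_segs << log_blocks_per_seg;
}
```

For example, with 4 SIT segments and 512 blocks per segment (log_blocks_per_seg of 9), one copy spans 2 segments, or 1024 blocks.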
|
|
|
|
|
|
|
|
static int build_free_segmap(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct free_segmap_info *free_i;
|
|
|
|
unsigned int bitmap_size, sec_bitmap_size;
|
|
|
|
|
|
|
|
/* allocate memory for free segmap information */
|
2017-11-30 19:28:17 +08:00
|
|
|
free_i = f2fs_kzalloc(sbi, sizeof(struct free_segmap_info), GFP_KERNEL);
|
2012-11-02 16:09:16 +08:00
|
|
|
if (!free_i)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
SM_I(sbi)->free_info = free_i;
|
|
|
|
|
2014-09-24 02:23:01 +08:00
|
|
|
bitmap_size = f2fs_bitmap_size(MAIN_SEGS(sbi));
|
2017-11-30 19:28:18 +08:00
|
|
|
free_i->free_segmap = f2fs_kvmalloc(sbi, bitmap_size, GFP_KERNEL);
|
2012-11-02 16:09:16 +08:00
|
|
|
if (!free_i->free_segmap)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
2014-09-24 02:23:01 +08:00
|
|
|
sec_bitmap_size = f2fs_bitmap_size(MAIN_SECS(sbi));
|
2017-11-30 19:28:18 +08:00
|
|
|
free_i->free_secmap = f2fs_kvmalloc(sbi, sec_bitmap_size, GFP_KERNEL);
|
2012-11-02 16:09:16 +08:00
|
|
|
if (!free_i->free_secmap)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
/* set all segments as dirty temporarily */
|
|
|
|
memset(free_i->free_segmap, 0xff, bitmap_size);
|
|
|
|
memset(free_i->free_secmap, 0xff, sec_bitmap_size);
|
|
|
|
|
|
|
|
/* init free segmap information */
|
2014-09-24 02:23:01 +08:00
|
|
|
free_i->start_segno = GET_SEGNO_FROM_SEG0(sbi, MAIN_BLKADDR(sbi));
|
2012-11-02 16:09:16 +08:00
|
|
|
free_i->free_segments = 0;
|
|
|
|
free_i->free_sections = 0;
|
2015-02-11 18:20:38 +08:00
|
|
|
spin_lock_init(&free_i->segmap_lock);
|
2012-11-02 16:09:16 +08:00
|
|
|
return 0;
|
|
|
|
}
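The tail of build_free_segmap() above sizes both bitmaps, fills them with 0xff so that every segment and section starts out marked as in use, and only then resets the counters. A minimal userspace sketch of that allocate-then-fill step follows. The names sketch_bitmap_size, sketch_alloc_inuse_map and sketch_test_bit are hypothetical, malloc stands in for f2fs_kvmalloc, and rounding the size up to whole unsigned longs is an assumption about what f2fs_bitmap_size does, not something this excerpt shows.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for the kernel's bitmap sizing helper (assumption):
 * round the bit count up to a whole number of unsigned longs, in bytes.
 */
static size_t sketch_bitmap_size(size_t nbits)
{
	size_t bits_per_long = sizeof(unsigned long) * CHAR_BIT;

	return (nbits + bits_per_long - 1) / bits_per_long *
		sizeof(unsigned long);
}

/*
 * Allocate a bitmap with every bit set, i.e. every segment "in use".
 * Returns NULL on allocation failure, where the kernel code returns -ENOMEM.
 */
static unsigned char *sketch_alloc_inuse_map(size_t nbits)
{
	size_t size = sketch_bitmap_size(nbits);
	unsigned char *map = malloc(size);

	if (!map)
		return NULL;
	memset(map, 0xff, size);
	return map;
}

/* Test bit n; in this sketch bit n lives in byte n / 8 at position n % 8. */
static int sketch_test_bit(const unsigned char *map, size_t n)
{
	return (map[n / CHAR_BIT] >> (n % CHAR_BIT)) & 1;
}
```

Starting from "everything in use" means a later pass only has to clear the bits of segments that turn out to hold no valid blocks.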
|
|
|
|
|
|
|
|
static int build_curseg(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
2012-12-01 09:56:13 +08:00
|
|
|
struct curseg_info *array;
|
2012-11-02 16:09:16 +08:00
|
|
|
int i;
|
|
|
|
|
2017-11-30 19:28:19 +08:00
|
|
|
array = f2fs_kzalloc(sbi, sizeof(*array) * NR_CURSEG_TYPE, GFP_KERNEL);
|
2012-11-02 16:09:16 +08:00
|
|
|
if (!array)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
SM_I(sbi)->curseg_array = array;
|
|
|
|
|
|
|
|
for (i = 0; i < NR_CURSEG_TYPE; i++) {
|
|
|
|
mutex_init(&array[i].curseg_mutex);
|
2017-11-30 19:28:17 +08:00
|
|
|
array[i].sum_blk = f2fs_kzalloc(sbi, PAGE_SIZE, GFP_KERNEL);
|
2012-11-02 16:09:16 +08:00
|
|
|
if (!array[i].sum_blk)
|
|
|
|
return -ENOMEM;
|
f2fs: split journal cache from curseg cache
In the curseg cache, f2fs caches two different parts:
- data of the current summary block, i.e. summary entries and footer info.
- journal info, i.e. sparse NAT/SIT entries or I/O stat info.
This approach has two problems: 1) it may cause higher lock contention when we
access or update both parts of the cache, since the same mutex, curseg_mutex,
protects all of it; 2) the current summary block is written back to the device
together with the last journal info as a normal summary block when flushing,
but journal info is treated as valid only in the current summary, so most
normal summary blocks contain junk journal data that wastes the remaining
space of the summary block.
To fix these issues, we split the curseg cache into two parts:
a) the current summary block, protected by the original mutex curseg_mutex
b) the journal cache, protected by a newly introduced r/w semaphore, journal_rwsem
When loading the curseg cache during ->mount, we store summary info and
journal info in different caches; when doing a checkpoint, we combine the
data of the two caches into the current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
init_rwsem(&array[i].journal_rwsem);
|
2017-11-30 19:28:17 +08:00
|
|
|
array[i].journal = f2fs_kzalloc(sbi,
|
|
|
|
sizeof(struct f2fs_journal), GFP_KERNEL);
|
2016-02-19 18:08:46 +08:00
|
|
|
if (!array[i].journal)
|
|
|
|
return -ENOMEM;
|
2012-11-02 16:09:16 +08:00
|
|
|
array[i].segno = NULL_SEGNO;
|
|
|
|
array[i].next_blkoff = 0;
|
|
|
|
}
|
|
|
|
return restore_curseg_summaries(sbi);
|
|
|
|
}
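build_curseg() above allocates one zeroed array entry per active log, gives each entry its own summary block and journal buffer, and returns -ENOMEM as soon as any allocation fails. The userspace sketch below mirrors that shape. Everything prefixed with sketch_ or SKETCH_ is hypothetical: the log count of six, the 4 KiB block size and the all-ones "no segment" value are assumptions, calloc stands in for f2fs_kzalloc, and the locks are omitted. Unlike the excerpt, the sketch frees partial allocations itself, because how the kernel cleans up after a failed build is not shown here.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define SKETCH_NR_LOGS		6		/* assumption: six active logs */
#define SKETCH_BLOCK_SIZE	4096		/* assumption: 4 KiB summary block */
#define SKETCH_NULL_SEGNO	((unsigned int)~0)	/* assumption: "no segment" */

struct sketch_curseg {
	void *sum_blk;			/* per-log summary block buffer */
	void *journal;			/* per-log journal buffer */
	unsigned int segno;		/* current segment number */
	unsigned short next_blkoff;	/* next block offset to write */
};

/* Free an array that may be only partially built; free(NULL) is a no-op. */
static void sketch_destroy_curseg(struct sketch_curseg *array)
{
	int i;

	if (!array)
		return;
	for (i = 0; i < SKETCH_NR_LOGS; i++) {
		free(array[i].sum_blk);
		free(array[i].journal);
	}
	free(array);
}

/* Build the per-log array; returns 0 on success or -ENOMEM on failure. */
static int sketch_build_curseg(struct sketch_curseg **out, size_t journal_size)
{
	struct sketch_curseg *array;
	int i;

	array = calloc(SKETCH_NR_LOGS, sizeof(*array));
	if (!array)
		return -ENOMEM;

	for (i = 0; i < SKETCH_NR_LOGS; i++) {
		array[i].sum_blk = calloc(1, SKETCH_BLOCK_SIZE);
		array[i].journal = calloc(1, journal_size);
		if (!array[i].sum_blk || !array[i].journal) {
			sketch_destroy_curseg(array);
			return -ENOMEM;
		}
		array[i].segno = SKETCH_NULL_SEGNO;
		array[i].next_blkoff = 0;
	}

	*out = array;
	return 0;
}
```

Zeroing the array up front is what lets the destroy helper run safely on a half-initialized array: entries that were never reached still hold NULL pointers.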
|
|
|
|
|
2017-12-20 11:16:34 +08:00
|
|
|
static int build_sit_entries(struct f2fs_sb_info *sbi)
|
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct sit_info *sit_i = SIT_I(sbi);
|
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_COLD_DATA);
|
2016-02-19 18:08:46 +08:00
|
|
|
struct f2fs_journal *journal = curseg->journal;
|
2016-09-24 12:29:18 +08:00
|
|
|
struct seg_entry *se;
|
|
|
|
struct f2fs_sit_entry sit;
|
2013-11-22 09:09:59 +08:00
|
|
|
int sit_blk_cnt = SIT_BLK_CNT(sbi);
|
|
|
|
unsigned int i, start, end;
|
|
|
|
unsigned int readed, start_blk = 0;
|
2017-12-20 11:16:34 +08:00
|
|
|
int err = 0;
|
2012-11-02 16:09:16 +08:00
|
|
|
|
2013-11-22 09:09:59 +08:00
|
|
|
do {
|
2016-10-19 02:07:45 +08:00
|
|
|
readed = ra_meta_pages(sbi, start_blk, BIO_MAX_PAGES,
|
|
|
|
META_SIT, true);
|
2013-11-22 09:09:59 +08:00
|
|
|
|
|
|
|
start = start_blk * sit_i->sents_per_block;
|
|
|
|
end = (start_blk + readed) * sit_i->sents_per_block;
|
|
|
|
|
2014-09-24 02:23:01 +08:00
|
|
|
for (; start < end && start < MAIN_SEGS(sbi); start++) {
|
2013-11-22 09:09:59 +08:00
|
|
|
struct f2fs_sit_block *sit_blk;
|
|
|
|
struct page *page;
|
|
|
|
|
2016-09-24 12:29:18 +08:00
|
|
|
se = &sit_i->sentries[start];
|
2013-11-22 09:09:59 +08:00
|
|
|
page = get_current_sit_page(sbi, start);
|
|
|
|
sit_blk = (struct f2fs_sit_block *)page_address(page);
|
|
|
|
sit = sit_blk->entries[SIT_ENTRY_OFFSET(sit_i, start)];
|
|
|
|
f2fs_put_page(page, 1);
|
2016-08-19 23:13:47 +08:00
|
|
|
|
2017-12-20 11:16:34 +08:00
|
|
|
err = check_block_count(sbi, start, &sit);
|
|
|
|
if (err)
|
|
|
|
return err;
|
2013-11-22 09:09:59 +08:00
|
|
|
seg_info_from_raw_sit(se, &sit);
|
2015-05-01 13:37:50 +08:00
|
|
|
|
|
|
|
/* build discard map only one time */
|
2016-08-03 01:56:40 +08:00
|
|
|
if (f2fs_discard_en(sbi)) {
|
2017-04-28 13:56:08 +08:00
|
|
|
if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG)) {
|
|
|
|
memset(se->discard_map, 0xff,
|
|
|
|
SIT_VBLOCK_MAP_SIZE);
|
|
|
|
} else {
|
|
|
|
memcpy(se->discard_map,
|
|
|
|
se->cur_valid_map,
|
|
|
|
SIT_VBLOCK_MAP_SIZE);
|
|
|
|
sbi->discard_blks +=
|
|
|
|
sbi->blocks_per_seg -
|
|
|
|
se->valid_blocks;
|
|
|
|
}
|
2016-08-03 01:56:40 +08:00
|
|
|
}
|
2015-05-01 13:37:50 +08:00
|
|
|
|
2016-08-19 23:13:47 +08:00
|
|
|
if (sbi->segs_per_sec > 1)
|
|
|
|
get_sec_entry(sbi, start)->valid_blocks +=
|
|
|
|
se->valid_blocks;
|
2012-11-02 16:09:16 +08:00
|
|
|
}
|
2013-11-22 09:09:59 +08:00
|
|
|
start_blk += readed;
|
|
|
|
} while (start_blk < sit_blk_cnt);
|
2016-08-19 23:13:47 +08:00
|
|
|
|
|
|
|
down_read(&curseg->journal_rwsem);
|
|
|
|
for (i = 0; i < sits_in_cursum(journal); i++) {
|
|
|
|
unsigned int old_valid_blocks;
|
|
|
|
|
|
|
|
start = le32_to_cpu(segno_in_journal(journal, i));
|
|
|
|
se = &sit_i->sentries[start];
|
|
|
|
sit = sit_in_journal(journal, i);
|
|
|
|
|
|
|
|
old_valid_blocks = se->valid_blocks;
|
|
|
|
|
2017-12-20 11:16:34 +08:00
|
|
|
err = check_block_count(sbi, start, &sit);
|
|
|
|
if (err)
|
|
|
|
break;
|
2016-08-19 23:13:47 +08:00
|
|
|
seg_info_from_raw_sit(se, &sit);
|
|
|
|
|
|
|
|
if (f2fs_discard_en(sbi)) {
|
2017-04-28 13:56:08 +08:00
|
|
|
if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG)) {
|
|
|
|
memset(se->discard_map, 0xff,
|
|
|
|
SIT_VBLOCK_MAP_SIZE);
|
|
|
|
} else {
|
|
|
|
memcpy(se->discard_map, se->cur_valid_map,
|
|
|
|
SIT_VBLOCK_MAP_SIZE);
|
|
|
|
sbi->discard_blks += old_valid_blocks -
|
|
|
|
se->valid_blocks;
|
|
|
|
}
|
2016-08-19 23:13:47 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
if (sbi->segs_per_sec > 1)
|
|
|
|
get_sec_entry(sbi, start)->valid_blocks +=
|
|
|
|
se->valid_blocks - old_valid_blocks;
|
|
|
|
}
|
|
|
|
up_read(&curseg->journal_rwsem);
|
2017-12-20 11:16:34 +08:00
|
|
|
return err;
|
2012-11-02 16:09:16 +08:00
|
|
|
}
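build_sit_entries() above makes two passes over the segment entries. The first pass loads each entry from its on-disk SIT block and adds the full per-segment amounts to the running totals. The second pass replays newer entries from the journal and adjusts the same totals by the difference between the old and new valid-block counts. The sketch below isolates that arithmetic. The names and the 512-block segment size are hypothetical, and the conditions that guard these updates in the real code (discard enabled, the trimmed checkpoint flag, more than one segment per section) are left out.

```c
#include <assert.h>

#define SKETCH_BLOCKS_PER_SEG	512u	/* assumption: blocks per segment */

struct sketch_totals {
	unsigned int discard_blks;	/* blocks not currently valid */
	unsigned int sec_valid;		/* valid blocks summed for the section */
};

/* Pass 1: take the on-disk SIT value and add the full amounts. */
static void sketch_apply_sit(struct sketch_totals *t, unsigned int *seg_valid,
			     unsigned int sit_valid)
{
	*seg_valid = sit_valid;
	t->discard_blks += SKETCH_BLOCKS_PER_SEG - sit_valid;
	t->sec_valid += sit_valid;
}

/*
 * Pass 2: a newer journal value replaces the SIT value, so the totals move
 * by the delta only. Unsigned arithmetic wraps modulo 2^32, which keeps the
 * sums correct even when an individual delta is "negative".
 */
static void sketch_apply_journal(struct sketch_totals *t,
				 unsigned int *seg_valid,
				 unsigned int journal_valid)
{
	unsigned int old_valid = *seg_valid;

	*seg_valid = journal_valid;
	t->discard_blks += old_valid - journal_valid;
	t->sec_valid += journal_valid - old_valid;
}
```

For example, a segment recorded with 100 valid blocks on disk and 130 in the journal first contributes 412 to the discard total, and the journal pass then lowers that to 382, which equals 512 - 130.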
|
|
|
|
|
|
|
|
static void init_free_segmap(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
unsigned int start;
|
|
|
|
int type;
|
|
|
|
|
2014-09-24 02:23:01 +08:00
|
|
|
for (start = 0; start < MAIN_SEGS(sbi); start++) {
|
2012-11-02 16:09:16 +08:00
|
|
|
struct seg_entry *sentry = get_seg_entry(sbi, start);
|
|
|
|
if (!sentry->valid_blocks)
|
|
|
|
__set_free(sbi, start);
|
2016-11-15 10:20:10 +08:00
|
|
|
else
|
|
|
|
SIT_I(sbi)->written_valid_blocks +=
|
|
|
|
sentry->valid_blocks;
|
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/* mark the current segments as in use */
|
|
|
|
for (type = CURSEG_HOT_DATA; type <= CURSEG_COLD_NODE; type++) {
|
|
|
|
struct curseg_info *curseg_t = CURSEG_I(sbi, type);
|
|
|
|
__set_test_and_inuse(sbi, curseg_t->segno);
|
|
|
|
}
|
|
|
|
}
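init_free_segmap() above works in two steps: every segment with no valid blocks is marked free while the valid blocks of all other segments are summed, and then the segments backing the active logs are marked in use again even if they are still empty. A rough userspace sketch of that flow follows. All names are hypothetical, one byte per segment replaces the real bitmap for readability, and the sketch assumes that the in-use helper lowers the free counter only when the segment was actually free before, which this excerpt does not show.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_free_state {
	unsigned char *inuse;			/* one byte per segment, nonzero = in use */
	unsigned int free_segments;		/* number of segments marked free */
	unsigned long written_valid_blocks;	/* sum over non-empty segments */
};

static void sketch_init_free(struct sketch_free_state *st,
			     const unsigned short *valid_blocks, size_t nsegs,
			     const unsigned int *cur_segnos, size_t ncur)
{
	size_t i;

	/* Step 1: empty segments become free, the rest add to the total. */
	for (i = 0; i < nsegs; i++) {
		if (!valid_blocks[i]) {
			st->inuse[i] = 0;
			st->free_segments++;
		} else {
			st->written_valid_blocks += valid_blocks[i];
		}
	}

	/* Step 2: segments of the active logs are in use even when empty. */
	for (i = 0; i < ncur; i++) {
		unsigned int segno = cur_segnos[i];

		if (segno >= nsegs)
			continue;	/* ignore out-of-range entries */
		if (!st->inuse[segno]) {
			st->inuse[segno] = 1;
			st->free_segments--;
		}
	}
}
```

The second step matters because a freshly opened log segment has no valid blocks yet, so the first step alone would hand it out as free.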
|
|
|
|
|
|
|
|
static void init_dirty_segmap(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
|
|
|
|
struct free_segmap_info *free_i = FREE_I(sbi);
|
2014-09-24 02:23:01 +08:00
|
|
|
unsigned int segno = 0, offset = 0;
|
2012-11-02 16:09:16 +08:00
|
|
|
unsigned short valid_blocks;
|
|
|
|
|
2013-06-16 08:49:11 +08:00
|
|
|
while (1) {
|
2012-11-02 16:09:16 +08:00
|
|
|
/* find dirty segment based on free segmap */
|
2014-09-24 02:23:01 +08:00
|
|
|
segno = find_next_inuse(free_i, MAIN_SEGS(sbi), offset);
|
|
|
|
if (segno >= MAIN_SEGS(sbi))
|
2012-11-02 16:09:16 +08:00
|
|
|
break;
|
|
|
|
offset = segno + 1;
|
2017-04-08 05:33:22 +08:00
|
|
|
valid_blocks = get_valid_blocks(sbi, segno, false);
|
2014-09-03 07:24:11 +08:00
|
|
|
if (valid_blocks == sbi->blocks_per_seg || !valid_blocks)
|
2012-11-02 16:09:16 +08:00
|
|
|
continue;
|
2014-09-03 07:24:11 +08:00
|
|
|
if (valid_blocks > sbi->blocks_per_seg) {
|
|
|
|
f2fs_bug_on(sbi, 1);
|
|
|
|
continue;
|
|
|
|
}
|
2012-11-02 16:09:16 +08:00
|
|
|
mutex_lock(&dirty_i->seglist_lock);
|
|
|
|
__locate_dirty_segment(sbi, segno, DIRTY);
|
|
|
|
mutex_unlock(&dirty_i->seglist_lock);
|
|
|
|
}
|
|
|
|
}
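The scan above can be sketched in user space as follows. This is a minimal illustration, not the kernel code: `mark_dirty_segments` and its array parameters are hypothetical stand-ins for the free segmap, `get_valid_blocks()` and `__locate_dirty_segment()`, and locking is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Mark every in-use segment that is only partially valid as dirty.
 * Empty and full segments are skipped; an over-full count (a bug in
 * the kernel version) is skipped here as well. Returns the number of
 * segments marked dirty.
 */
static size_t mark_dirty_segments(const bool *in_use,
				  const unsigned int *valid_blocks,
				  size_t nr_segs,
				  unsigned int blocks_per_seg,
				  bool *dirty)
{
	size_t segno, nr_dirty = 0;

	for (segno = 0; segno < nr_segs; segno++) {
		dirty[segno] = false;

		if (!in_use[segno])
			continue;
		if (valid_blocks[segno] == 0 ||
		    valid_blocks[segno] >= blocks_per_seg)
			continue;

		dirty[segno] = true;
		nr_dirty++;
	}
	return nr_dirty;
}
```

With four segments holding 0, 512, 100 and 7 valid blocks, 512 blocks per segment, and the last segment not in use, only the third segment ends up dirty.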
|
|
|
|
|
2013-03-31 12:26:03 +08:00
|
|
|
static int init_victim_secmap(struct f2fs_sb_info *sbi)
|
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
|
2014-09-24 02:23:01 +08:00
|
|
|
unsigned int bitmap_size = f2fs_bitmap_size(MAIN_SECS(sbi));
|
2012-11-02 16:09:16 +08:00
|
|
|
|
2017-11-30 19:28:18 +08:00
|
|
|
dirty_i->victim_secmap = f2fs_kvzalloc(sbi, bitmap_size, GFP_KERNEL);
|
2013-03-31 12:26:03 +08:00
|
|
|
if (!dirty_i->victim_secmap)
|
2012-11-02 16:09:16 +08:00
|
|
|
return -ENOMEM;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int build_dirty_segmap(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct dirty_seglist_info *dirty_i;
|
|
|
|
unsigned int bitmap_size, i;
|
|
|
|
|
|
|
|
/* allocate memory for dirty segments list information */
|
2017-11-30 19:28:17 +08:00
|
|
|
dirty_i = f2fs_kzalloc(sbi, sizeof(struct dirty_seglist_info),
|
|
|
|
GFP_KERNEL);
|
2012-11-02 16:09:16 +08:00
|
|
|
if (!dirty_i)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
SM_I(sbi)->dirty_info = dirty_i;
|
|
|
|
mutex_init(&dirty_i->seglist_lock);
|
|
|
|
|
2014-09-24 02:23:01 +08:00
|
|
|
bitmap_size = f2fs_bitmap_size(MAIN_SEGS(sbi));
|
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
for (i = 0; i < NR_DIRTY_TYPE; i++) {
|
2017-11-30 19:28:18 +08:00
|
|
|
dirty_i->dirty_segmap[i] = f2fs_kvzalloc(sbi, bitmap_size,
|
|
|
|
GFP_KERNEL);
|
2012-11-02 16:09:16 +08:00
|
|
|
if (!dirty_i->dirty_segmap[i])
|
|
|
|
return -ENOMEM;
|
|
|
|
}
|
|
|
|
|
|
|
|
init_dirty_segmap(sbi);
|
2013-03-31 12:26:03 +08:00
|
|
|
return init_victim_secmap(sbi);
|
2012-11-02 16:09:16 +08:00
|
|
|
}
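The per-type bitmap allocation in build_dirty_segmap() can be sketched in user space as below. This is an illustration under assumptions: `NR_MAPS`, `bitmap_bytes` and `alloc_segmaps` are hypothetical names, `calloc` stands in for `f2fs_kvzalloc`, and the rounding to whole `unsigned long` words is assumed to match what `f2fs_bitmap_size()` does. Unlike the function above, which returns -ENOMEM directly, this sketch unwinds its own partial allocations.

```c
#include <limits.h>
#include <stdlib.h>

#define NR_MAPS 3	/* hypothetical; stands in for NR_DIRTY_TYPE */

/* Bytes needed for nr_bits bits, rounded up to whole unsigned longs. */
static size_t bitmap_bytes(size_t nr_bits)
{
	size_t bits_per_long = sizeof(unsigned long) * CHAR_BIT;

	return (nr_bits + bits_per_long - 1) / bits_per_long *
						sizeof(unsigned long);
}

/* Allocate one zeroed bitmap per map type; free everything on failure. */
static int alloc_segmaps(unsigned long *maps[NR_MAPS], size_t nr_segs)
{
	int i;

	for (i = 0; i < NR_MAPS; i++) {
		maps[i] = calloc(1, bitmap_bytes(nr_segs));
		if (!maps[i]) {
			/* release the bitmaps allocated so far */
			while (i--) {
				free(maps[i]);
				maps[i] = NULL;
			}
			return -1;
		}
	}
	return 0;
}
```

Freeing locally keeps the sketch self-contained; kernel code often leaves teardown to a single destroy path in the caller instead.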
|
|
|
|
|
2012-11-29 12:28:09 +08:00
|
|
|
/*
|
2012-11-02 16:09:16 +08:00
|
|
|
* Update min, max modified time for cost-benefit GC algorithm
|
|
|
|
*/
|
|
|
|
static void init_min_max_mtime(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct sit_info *sit_i = SIT_I(sbi);
|
|
|
|
unsigned int segno;
|
|
|
|
|
2017-10-30 17:49:53 +08:00
|
|
|
down_write(&sit_i->sentry_lock);
|
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
sit_i->min_mtime = LLONG_MAX;
|
|
|
|
|
2014-09-24 02:23:01 +08:00
|
|
|
for (segno = 0; segno < MAIN_SEGS(sbi); segno += sbi->segs_per_sec) {
|
2012-11-02 16:09:16 +08:00
|
|
|
unsigned int i;
|
|
|
|
unsigned long long mtime = 0;
|
|
|
|
|
|
|
|
for (i = 0; i < sbi->segs_per_sec; i++)
|
|
|
|
mtime += get_seg_entry(sbi, segno + i)->mtime;
|
|
|
|
|
|
|
|
mtime = div_u64(mtime, sbi->segs_per_sec);
|
|
|
|
|
|
|
|
if (sit_i->min_mtime > mtime)
|
|
|
|
sit_i->min_mtime = mtime;
|
|
|
|
}
|
|
|
|
sit_i->max_mtime = get_mtime(sbi);
|
2017-10-30 17:49:53 +08:00
|
|
|
up_write(&sit_i->sentry_lock);
|
2012-11-02 16:09:16 +08:00
|
|
|
}
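The minimum computed by init_min_max_mtime() is the smallest per-section average of segment mtimes. A minimal user-space sketch, assuming a plain array of per-segment mtimes in place of `get_seg_entry()` and a nonzero `segs_per_sec`; `min_section_mtime` is a hypothetical name and the locking is omitted:

```c
#include <limits.h>
#include <stddef.h>

/*
 * Return the smallest per-section average mtime. A section is a run of
 * segs_per_sec consecutive segments; segs_per_sec must be nonzero.
 * The initial value mirrors the LLONG_MAX used in the function above.
 */
static unsigned long long min_section_mtime(const unsigned long long *seg_mtime,
					    size_t nr_segs,
					    unsigned int segs_per_sec)
{
	unsigned long long min_mtime = LLONG_MAX;
	size_t segno;

	for (segno = 0; segno + segs_per_sec <= nr_segs;
					segno += segs_per_sec) {
		unsigned long long mtime = 0;
		unsigned int i;

		for (i = 0; i < segs_per_sec; i++)
			mtime += seg_mtime[segno + i];

		/* integer average, as div_u64() would produce */
		mtime /= segs_per_sec;

		if (min_mtime > mtime)
			min_mtime = mtime;
	}
	return min_mtime;
}
```

For segment mtimes 10, 20, 4 and 8 with two segments per section, the section averages are 15 and 6, so the minimum is 6.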
|
|
|
|
|
|
|
|
int build_segment_manager(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct f2fs_super_block *raw_super = F2FS_RAW_SUPER(sbi);
|
|
|
|
struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
|
2012-12-01 09:56:13 +08:00
|
|
|
struct f2fs_sm_info *sm_info;
|
2012-11-02 16:09:16 +08:00
|
|
|
int err;
|
|
|
|
|
2017-11-30 19:28:17 +08:00
|
|
|
sm_info = f2fs_kzalloc(sbi, sizeof(struct f2fs_sm_info), GFP_KERNEL);
|
2012-11-02 16:09:16 +08:00
|
|
|
if (!sm_info)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
/* init sm info */
|
|
|
|
sbi->sm_info = sm_info;
|
|
|
|
sm_info->seg0_blkaddr = le32_to_cpu(raw_super->segment0_blkaddr);
|
|
|
|
sm_info->main_blkaddr = le32_to_cpu(raw_super->main_blkaddr);
|
|
|
|
sm_info->segment_count = le32_to_cpu(raw_super->segment_count);
|
|
|
|
sm_info->reserved_segments = le32_to_cpu(ckpt->rsvd_segment_count);
|
|
|
|
sm_info->ovp_segments = le32_to_cpu(ckpt->overprov_segment_count);
|
|
|
|
sm_info->main_segments = le32_to_cpu(raw_super->segment_count_main);
|
|
|
|
sm_info->ssa_blkaddr = le32_to_cpu(raw_super->ssa_blkaddr);
|
2014-03-19 13:17:21 +08:00
|
|
|
sm_info->rec_prefree_segments = sm_info->main_segments *
|
|
|
|
DEF_RECLAIM_PREFREE_SEGMENTS / 100;
|
2016-07-14 09:23:35 +08:00
|
|
|
if (sm_info->rec_prefree_segments > DEF_MAX_RECLAIM_PREFREE_SEGMENTS)
|
|
|
|
sm_info->rec_prefree_segments = DEF_MAX_RECLAIM_PREFREE_SEGMENTS;
|
|
|
|
|
2016-06-14 00:47:48 +08:00
|
|
|
if (!test_opt(sbi, LFS))
|
|
|
|
sm_info->ipu_policy = 1 << F2FS_IPU_FSYNC;
|
2013-11-07 12:13:42 +08:00
|
|
|
sm_info->min_ipu_util = DEF_MIN_IPU_UTIL;
|
2014-09-11 07:53:02 +08:00
|
|
|
sm_info->min_fsync_blocks = DEF_MIN_FSYNC_BLOCKS;
|
2017-03-25 08:05:13 +08:00
|
|
|
sm_info->min_hot_blocks = DEF_MIN_HOT_BLOCKS;
|
2017-10-28 16:52:33 +08:00
|
|
|
sm_info->min_ssr_sections = reserved_sections(sbi);
|
2012-11-02 16:09:16 +08:00
|
|
|
|
2015-01-27 09:41:23 +08:00
|
|
|
sm_info->trim_sections = DEF_BATCHED_TRIM_SECTIONS;
|
|
|
|
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we described the issue as below:
"Although building the NAT journal in cursum reduces the read/write work for NAT
blocks, the previous design leaves us with lower performance when checkpoints
are written frequently in these cases:
1. if the journal in cursum is already full, it is a bit of a waste that we
flush all nat entries to pages for persistence, but do not cache any entries.
2. if the journal in cursum is not full, we fill nat entries into the journal
until the journal is full, then flush the remaining dirty entries to disk
without merging the journaled entries, so these journaled entries may be flushed
to disk at the next checkpoint but lost the chance to be flushed last time."
Actually, we have the same problem in using the SIT journal area.
In this patch, we first update the SIT journal with as many dirty entries as
possible. Second, if there is no space left in the SIT journal, we remove all
entries from the journal and walk through the whole dirty entry bitmap of SIT,
accounting dirty SIT entries located in the same SIT block to one SIT entry set.
All entry sets are linked into the list sit_entry_set in sm_info, sorted in
ascending order by the count of entries in each set. Later we flush the sets
that have the fewest entries into the journal, as many as we can, and then flush
the dense sets with merged entries to disk.
In this way we can use the SIT journal area more effectively and reduce SIT
updates, which improves performance and extends the lifetime of the flash
device.
In my testing environment, this patch clearly reduces SIT block updates.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
           sit page num    cp count    sit pages/cp
based      2006.50         1349.75     1.486
patched    1566.25         1463.25     1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns)    dirty sit count
36038          2151
49168          2123
37174          2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
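The flush ordering described in this commit message can be sketched as follows. This is a simplified user-space illustration, not the kernel implementation: `struct entry_set`, `plan_sit_flush` and the `journal_space` parameter are hypothetical, an array sorted with `qsort` replaces the sorted list, and it assumes a set is journaled only when it fits entirely.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the dirty SIT entries sharing one SIT block. */
struct entry_set {
	unsigned int start_segno;	/* first segment covered by the block */
	unsigned int entry_cnt;		/* dirty entries in this block */
	int to_journal;			/* 1: journaled, 0: written to disk */
};

static int cmp_entry_cnt(const void *a, const void *b)
{
	const struct entry_set *x = a, *y = b;

	return (x->entry_cnt > y->entry_cnt) - (x->entry_cnt < y->entry_cnt);
}

/*
 * Sort the sets by ascending entry count, journal whole sets while they
 * fit, and send every remaining set to disk. Returns the number of SIT
 * block writes that are still needed.
 */
static unsigned int plan_sit_flush(struct entry_set *sets, size_t nr,
				   unsigned int journal_space)
{
	unsigned int block_writes = 0;
	int to_journal = 1;
	size_t i;

	qsort(sets, nr, sizeof(*sets), cmp_entry_cnt);

	for (i = 0; i < nr; i++) {
		/* later sets are at least as large, so stop journaling */
		if (to_journal && sets[i].entry_cnt > journal_space)
			to_journal = 0;

		sets[i].to_journal = to_journal;
		if (to_journal)
			journal_space -= sets[i].entry_cnt;
		else
			block_writes++;
	}
	return block_writes;
}
```

With sets of 5, 1 and 3 dirty entries and room for 4 journal entries, the two sparse sets are journaled and only the dense set costs a SIT block write.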
|
|
|
INIT_LIST_HEAD(&sm_info->sit_entry_set);
|
|
|
|
|
f2fs: fix summary info corruption
Sometimes, after running generic/270 of fstest, fsck reports that the summary
info and the actual position of a block address in a direct node have become
inconsistent.
The root cause is a race between __f2fs_replace_block and change_curseg,
as below:
Thread A                                        Thread B
- __clone_blkaddrs
 - f2fs_replace_block
  - __f2fs_replace_block
   - segnoA = GET_SEGNO(sbi, blkaddrA);
   - type = se->type:=CURSEG_HOT_DATA
   - if (!IS_CURSEG(sbi, segnoA))
      type = CURSEG_WARM_DATA
                                                - allocate_data_block
                                                 - allocate_segment
                                                  - get_ssr_segment
                                                  - change_curseg(segnoA, CURSEG_HOT_DATA)
   - change_curseg(segnoA, CURSEG_WARM_DATA)
    - reset_curseg
     - __set_sit_entry_type
      - change se->type from CURSEG_HOT_DATA to CURSEG_WARM_DATA
So finally, the hot curseg is located in segnoA, but the type of segnoA becomes
CURSEG_WARM_DATA.
Then if we invoke __f2fs_replace_block(blkaddrB, blkaddrA, true, false),
since blkaddrA is located in segnoA, we will move the warm type curseg to
segnoA, then change its summary cache and write it back to the summary block.
But segnoA is used by the hot type curseg too; once it moves or persists, it
will overwrite the summary block content with its own stale summary cache,
resulting in an inconsistent state.
This patch fixes the issue by introducing a global curseg lock to avoid the
race between __f2fs_replace_block and change_curseg.
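The interleaving above can be replayed deterministically in a small userspace model. This is a toy sketch, not the kernel code: `struct toy_sm`, `SEG_A`, `replay` and the other helpers are hypothetical names, no real threads or locks are used, and the `b_runs_in_the_middle` flag only chooses whether Thread B's step lands between Thread A's type decision and its change_curseg call (the racy case) or completes beforehand (the ordering a global lock would enforce).

```c
#include <assert.h>
#include <stdbool.h>

enum seg_type { CURSEG_HOT_DATA, CURSEG_WARM_DATA, NR_TYPES };

#define NR_SEGS 8
#define SEG_A 5 /* plays the role of segnoA */

/* Toy model: a type tag per segment plus the segment each active log uses. */
struct toy_sm {
	enum seg_type se_type[NR_SEGS];
	unsigned int curseg[NR_TYPES];
};

static void init(struct toy_sm *sm)
{
	for (int i = 0; i < NR_SEGS; i++)
		sm->se_type[i] = CURSEG_HOT_DATA;
	sm->curseg[CURSEG_HOT_DATA] = 0;
	sm->curseg[CURSEG_WARM_DATA] = 1;
	sm->se_type[1] = CURSEG_WARM_DATA;
}

static bool is_curseg(const struct toy_sm *sm, unsigned int segno)
{
	for (int t = 0; t < NR_TYPES; t++)
		if (sm->curseg[t] == segno)
			return true;
	return false;
}

/* Move the log of `type` onto `segno` and retag the segment with that type. */
static void change_curseg(struct toy_sm *sm, unsigned int segno,
			  enum seg_type type)
{
	sm->curseg[type] = segno;
	sm->se_type[segno] = type;
}

/* Thread A, first half: decide which log type to move onto SEG_A. */
static enum seg_type thread_a_pick_type(const struct toy_sm *sm)
{
	enum seg_type type = sm->se_type[SEG_A];

	if (!is_curseg(sm, SEG_A))
		type = CURSEG_WARM_DATA;
	return type;
}

/* Thread B: SSR allocation places the hot log on SEG_A. */
static void thread_b(struct toy_sm *sm)
{
	change_curseg(sm, SEG_A, CURSEG_HOT_DATA);
}

/* Returns true when the hot log sits on SEG_A but SEG_A is no longer tagged hot. */
static bool replay(bool b_runs_in_the_middle)
{
	struct toy_sm sm;
	enum seg_type type;

	init(&sm);
	if (!b_runs_in_the_middle)
		thread_b(&sm); /* serialized: B finishes before A starts */
	type = thread_a_pick_type(&sm);
	if (b_runs_in_the_middle)
		thread_b(&sm); /* racy: B slips in after A's decision */
	change_curseg(&sm, SEG_A, type); /* Thread A, second half */

	return sm.curseg[CURSEG_HOT_DATA] == SEG_A &&
	       sm.se_type[SEG_A] != CURSEG_HOT_DATA;
}
```

In this model `replay(true)` ends in the inconsistent state, while `replay(false)` does not, because Thread A then sees that the segment is already a current segment and keeps its existing type.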
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-11-02 20:41:03 +08:00
|
|
|
init_rwsem(&sm_info->curseg_lock);
|
|
|
|
|
2017-06-01 16:43:51 +08:00
|
|
|
if (!f2fs_readonly(sbi->sb)) {
|
2014-04-27 14:21:33 +08:00
|
|
|
err = create_flush_cmd_control(sbi);
|
|
|
|
if (err)
|
2014-04-27 14:21:21 +08:00
|
|
|
return err;
|
2014-04-02 14:34:36 +08:00
|
|
|
}
|
|
|
|
|
2017-01-12 06:40:24 +08:00
|
|
|
err = create_discard_cmd_control(sbi);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
|
2012-11-02 16:09:16 +08:00
|
|
|
err = build_sit_info(sbi);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
err = build_free_segmap(sbi);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
err = build_curseg(sbi);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
|
|
|
|
/* reinit free segmap based on SIT */
|
2017-12-20 11:16:34 +08:00
|
|
|
err = build_sit_entries(sbi);
|
|
|
|
if (err)
|
|
|
|
return err;
|
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
init_free_segmap(sbi);
|
|
|
|
err = build_dirty_segmap(sbi);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
|
|
|
|
init_min_max_mtime(sbi);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void discard_dirty_segmap(struct f2fs_sb_info *sbi,
|
|
|
|
enum dirty_type dirty_type)
|
|
|
|
{
|
|
|
|
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
|
|
|
|
|
|
|
|
mutex_lock(&dirty_i->seglist_lock);
|
2015-09-23 04:50:47 +08:00
|
|
|
kvfree(dirty_i->dirty_segmap[dirty_type]);
|
2012-11-02 16:09:16 +08:00
|
|
|
dirty_i->nr_dirty[dirty_type] = 0;
|
|
|
|
mutex_unlock(&dirty_i->seglist_lock);
|
|
|
|
}
|
|
|
|
|
2013-03-31 12:26:03 +08:00
|
|
|
static void destroy_victim_secmap(struct f2fs_sb_info *sbi)
|
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
|
2015-09-23 04:50:47 +08:00
|
|
|
kvfree(dirty_i->victim_secmap);
|
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void destroy_dirty_segmap(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
|
|
|
|
int i;
|
|
|
|
|
|
|
|
if (!dirty_i)
|
|
|
|
return;
|
|
|
|
|
|
|
|
/* discard pre-free/dirty segments list */
|
|
|
|
for (i = 0; i < NR_DIRTY_TYPE; i++)
|
|
|
|
discard_dirty_segmap(sbi, i);
|
|
|
|
|
2013-03-31 12:26:03 +08:00
|
|
|
destroy_victim_secmap(sbi);
|
2012-11-02 16:09:16 +08:00
|
|
|
SM_I(sbi)->dirty_info = NULL;
|
|
|
|
kfree(dirty_i);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void destroy_curseg(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct curseg_info *array = SM_I(sbi)->curseg_array;
|
|
|
|
int i;
|
|
|
|
|
|
|
|
if (!array)
|
|
|
|
return;
|
|
|
|
SM_I(sbi)->curseg_array = NULL;
|
f2fs: split journal cache from curseg cache
In the curseg cache, f2fs caches two different parts:
- data of the current summary block, i.e. summary entries and footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both parts of the cache, since we use the same mutex lock
curseg_mutex to protect the cache. 2) the current summary block with the last
journal info will be written back to the device as a normal summary block
when flushing; however, we treat journal info as valid only in the current
summary, so most normal summary blocks contain junk journal data, which wastes
the remaining space of the summary block.
So, in order to fix the above issues, we split the curseg cache into two parts:
a) current summary block, protected by the original mutex lock curseg_mutex
b) journal cache, protected by the newly introduced r/w semaphore journal_rwsem
When loading the curseg cache during ->mount, we store summary info and
journal info into different caches; when doing a checkpoint, we combine the
data of the two caches into the current summary block for persisting.
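The load-time split and checkpoint-time combine described above can be sketched as follows. This is a simplified userspace illustration under assumptions: `struct toy_sum_block`, `struct toy_curseg`, `load_curseg`, `write_current_sum_block` and both size constants are hypothetical, the footer is left out, and the two locks named in the commit message appear only in comments rather than as real synchronization.

```c
#include <assert.h>
#include <string.h>

#define ENTRIES_SIZE 64 /* assumed size of the summary entry area */
#define JOURNAL_SIZE 32 /* assumed size of the journal area */

/* Toy on-disk summary block: summary entries followed by the journal. */
struct toy_sum_block {
	unsigned char entries[ENTRIES_SIZE];
	unsigned char journal[JOURNAL_SIZE];
};

/* Toy in-memory curseg cache, split into two separately owned parts. */
struct toy_curseg {
	unsigned char entries[ENTRIES_SIZE]; /* would be under curseg_mutex */
	unsigned char journal[JOURNAL_SIZE]; /* would be under journal_rwsem */
};

/* Mount time: copy each part of the on-disk block into its own cache. */
static void load_curseg(struct toy_curseg *cs,
			const struct toy_sum_block *disk)
{
	assert(cs && disk);
	memcpy(cs->entries, disk->entries, ENTRIES_SIZE);
	memcpy(cs->journal, disk->journal, JOURNAL_SIZE);
}

/* Checkpoint time: combine both caches into one block to persist. */
static void write_current_sum_block(const struct toy_curseg *cs,
				    struct toy_sum_block *out)
{
	assert(cs && out);
	memcpy(out->entries, cs->entries, ENTRIES_SIZE);
	memcpy(out->journal, cs->journal, JOURNAL_SIZE);
}
```

Because the two parts live in separate buffers, a journal update in this model touches only `journal`, which is the property that lets the real code guard it with a different lock from the summary entries.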
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
for (i = 0; i < NR_CURSEG_TYPE; i++) {
|
2012-11-02 16:09:16 +08:00
|
|
|
kfree(array[i].sum_blk);
|
2016-02-19 18:08:46 +08:00
|
|
|
kfree(array[i].journal);
|
|
|
|
}
|
2012-11-02 16:09:16 +08:00
|
|
|
kfree(array);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void destroy_free_segmap(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct free_segmap_info *free_i = SM_I(sbi)->free_info;
|
|
|
|
if (!free_i)
|
|
|
|
return;
|
|
|
|
SM_I(sbi)->free_info = NULL;
|
2015-09-23 04:50:47 +08:00
|
|
|
kvfree(free_i->free_segmap);
|
|
|
|
kvfree(free_i->free_secmap);
|
2012-11-02 16:09:16 +08:00
|
|
|
kfree(free_i);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void destroy_sit_info(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct sit_info *sit_i = SIT_I(sbi);
|
|
|
|
unsigned int start;
|
|
|
|
|
|
|
|
if (!sit_i)
|
|
|
|
return;
|
|
|
|
|
|
|
|
if (sit_i->sentries) {
|
2014-09-24 02:23:01 +08:00
|
|
|
for (start = 0; start < MAIN_SEGS(sbi); start++) {
|
2012-11-02 16:09:16 +08:00
|
|
|
kfree(sit_i->sentries[start].cur_valid_map);
|
2017-01-07 18:51:01 +08:00
|
|
|
#ifdef CONFIG_F2FS_CHECK_FS
|
|
|
|
kfree(sit_i->sentries[start].cur_valid_map_mir);
|
|
|
|
#endif
|
2012-11-02 16:09:16 +08:00
|
|
|
kfree(sit_i->sentries[start].ckpt_valid_map);
|
2015-05-01 13:37:50 +08:00
|
|
|
kfree(sit_i->sentries[start].discard_map);
|
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
}
|
2015-02-11 08:44:29 +08:00
|
|
|
kfree(sit_i->tmp_map);
|
|
|
|
|
2015-09-23 04:50:47 +08:00
|
|
|
kvfree(sit_i->sentries);
|
|
|
|
kvfree(sit_i->sec_entries);
|
|
|
|
kvfree(sit_i->dirty_sentries_bitmap);
|
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
SM_I(sbi)->sit_info = NULL;
|
|
|
|
kfree(sit_i->sit_bitmap);
|
2017-01-07 18:52:34 +08:00
|
|
|
#ifdef CONFIG_F2FS_CHECK_FS
|
|
|
|
kfree(sit_i->sit_bitmap_mir);
|
|
|
|
#endif
|
2012-11-02 16:09:16 +08:00
|
|
|
kfree(sit_i);
|
|
|
|
}
|
|
|
|
|
|
|
|
void destroy_segment_manager(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct f2fs_sm_info *sm_info = SM_I(sbi);
|
2014-04-27 14:21:21 +08:00
|
|
|
|
2013-11-06 09:12:04 +08:00
|
|
|
if (!sm_info)
|
|
|
|
return;
|
2016-12-08 08:23:32 +08:00
|
|
|
destroy_flush_cmd_control(sbi, true);
|
2017-03-27 18:14:04 +08:00
|
|
|
destroy_discard_cmd_control(sbi);
|
2012-11-02 16:09:16 +08:00
|
|
|
destroy_dirty_segmap(sbi);
|
|
|
|
destroy_curseg(sbi);
|
|
|
|
destroy_free_segmap(sbi);
|
|
|
|
destroy_sit_info(sbi);
|
|
|
|
sbi->sm_info = NULL;
|
|
|
|
kfree(sm_info);
|
|
|
|
}
|
2013-11-15 12:55:58 +08:00
|
|
|
|
|
|
|
int __init create_segment_manager_caches(void)
|
|
|
|
{
|
|
|
|
discard_entry_slab = f2fs_kmem_cache_create("discard_entry",
|
2014-03-07 18:43:28 +08:00
|
|
|
sizeof(struct discard_entry));
|
2013-11-15 12:55:58 +08:00
|
|
|
if (!discard_entry_slab)
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, we first update the SIT journal with as many dirty entries as
possible. Second, if there is no space left in the SIT journal, we remove all
entries from the journal and walk through the whole dirty entry bitmap of SIT,
accounting dirty SIT entries located in the same SIT block to one SIT entry set.
All entry sets are linked to the list sit_entry_set in sm_info, sorted in
ascending order by the count of entries in each set. Later we flush the sets
with the fewest entries into the journal, as many as we can, and then flush the
dense sets with merged entries to disk.
In this way we use the SIT journal area more effectively and reduce SIT
updates, which improves performance and extends the lifetime of the flash
device.
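The set-based flush described above can be sketched roughly as follows. This is a minimal user-space illustration, not the code added by the patch: the names `entry_set`, `build_entry_sets`, and `flush_entry_sets`, as well as the two constants, are assumptions made for the example, and the journal is taken to be empty at the start because the message says its entries are removed first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Illustrative constants for this sketch, not taken from the patch. */
#define ENTRIES_PER_SIT_BLOCK 55u
#define SIT_JOURNAL_SLOTS 6u

/* Hypothetical type: one set per SIT block holding at least one dirty entry. */
struct entry_set {
	unsigned int sit_block;   /* index of the SIT block */
	unsigned int entry_cnt;   /* dirty entries that fall in that block */
};

/* qsort comparator: ascending by entry count, ties broken by block index. */
static int cmp_entry_cnt(const void *a, const void *b)
{
	const struct entry_set *x = a, *y = b;

	if (x->entry_cnt != y->entry_cnt)
		return x->entry_cnt < y->entry_cnt ? -1 : 1;
	if (x->sit_block != y->sit_block)
		return x->sit_block < y->sit_block ? -1 : 1;
	return 0;
}

/*
 * Group dirty segment numbers (strictly ascending, as a bitmap walk yields
 * them) into per-block sets. Returns the number of sets written to 'sets',
 * which must have room for up to 'n' elements.
 */
static size_t build_entry_sets(const unsigned int *segnos, size_t n,
			       struct entry_set *sets)
{
	size_t nsets = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned int blk = segnos[i] / ENTRIES_PER_SIT_BLOCK;

		assert(i == 0 || segnos[i - 1] < segnos[i]);
		if (nsets && sets[nsets - 1].sit_block == blk) {
			sets[nsets - 1].entry_cnt++;
		} else {
			sets[nsets].sit_block = blk;
			sets[nsets].entry_cnt = 1;
			nsets++;
		}
	}
	return nsets;
}

/*
 * Sort the sets by entry count, keep the sparsest ones in the journal while
 * they fit, and count one SIT block write for every remaining set.
 * Returns the number of SIT blocks written; '*journaled' receives the
 * number of entries kept in the journal.
 */
static size_t flush_entry_sets(struct entry_set *sets, size_t nsets,
			       unsigned int journal_slots,
			       unsigned int *journaled)
{
	size_t blocks_written = 0;
	int to_journal = 1;

	qsort(sets, nsets, sizeof(*sets), cmp_entry_cnt);
	*journaled = 0;
	for (size_t i = 0; i < nsets; i++) {
		/* Once a set no longer fits, the later sets cannot fit either. */
		if (to_journal && sets[i].entry_cnt > journal_slots - *journaled)
			to_journal = 0;
		if (to_journal)
			*journaled += sets[i].entry_cnt;
		else
			blocks_written++;
	}
	return blocks_written;
}
```

With dirty segments 3, 4, 60 to 64, and 120, the sketch forms three sets (2, 5, and 1 entries), journals the two sparse sets, and writes only the block holding the five-entry set.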
In my testing environment, this patch clearly reduces SIT block updates.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
         sit page num    cp count    sit pages/cp
based    2006.50         1349.75     1.486
patched  1566.25         1463.25     1.070
The latency of the merging operation is small even when handling a large number
of dirty SIT entries in flush_sit_entries:
latency(ns)    dirty sit count
36038          2151
49168          2123
37174          2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
goto fail;
|
|
|
|
|
2017-01-10 06:13:03 +08:00
|
|
|
discard_cmd_slab = f2fs_kmem_cache_create("discard_cmd",
|
|
|
|
sizeof(struct discard_cmd));
|
|
|
|
if (!discard_cmd_slab)
|
2016-09-05 12:28:26 +08:00
|
|
|
goto destroy_discard_entry;
|
2016-08-29 23:58:34 +08:00
|
|
|
|
2014-09-04 18:13:01 +08:00
|
|
|
sit_entry_set_slab = f2fs_kmem_cache_create("sit_entry_set",
|
2014-11-21 14:42:07 +08:00
|
|
|
sizeof(struct sit_entry_set));
|
2014-09-04 18:13:01 +08:00
|
|
|
if (!sit_entry_set_slab)
|
2017-01-10 06:13:03 +08:00
|
|
|
goto destroy_discard_cmd;
|
2014-10-07 08:39:50 +08:00
|
|
|
|
|
|
|
inmem_entry_slab = f2fs_kmem_cache_create("inmem_page_entry",
|
|
|
|
sizeof(struct inmem_pages));
|
|
|
|
if (!inmem_entry_slab)
|
|
|
|
goto destroy_sit_entry_set;
|
2013-11-15 12:55:58 +08:00
|
|
|
return 0;
|
2014-09-04 18:13:01 +08:00
|
|
|
|
2014-10-07 08:39:50 +08:00
|
|
|
destroy_sit_entry_set:
|
|
|
|
kmem_cache_destroy(sit_entry_set_slab);
|
2017-01-10 06:13:03 +08:00
|
|
|
destroy_discard_cmd:
|
|
|
|
kmem_cache_destroy(discard_cmd_slab);
|
2016-09-05 12:28:26 +08:00
|
|
|
destroy_discard_entry:
|
2014-09-04 18:13:01 +08:00
|
|
|
kmem_cache_destroy(discard_entry_slab);
|
|
|
|
fail:
|
|
|
|
return -ENOMEM;
|
2013-11-15 12:55:58 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
void destroy_segment_manager_caches(void)
|
|
|
|
{
|
2014-09-04 18:13:01 +08:00
|
|
|
kmem_cache_destroy(sit_entry_set_slab);
|
2017-01-10 06:13:03 +08:00
|
|
|
kmem_cache_destroy(discard_cmd_slab);
|
2013-11-15 12:55:58 +08:00
|
|
|
kmem_cache_destroy(discard_entry_slab);
|
2014-10-07 08:39:50 +08:00
|
|
|
kmem_cache_destroy(inmem_entry_slab);
|
2013-11-15 12:55:58 +08:00
|
|
|
}
|