/* SPDX-License-Identifier: GPL-2.0 */
/*
f2fs: add superblock and major in-memory structure
This adds the following major in-memory structures in f2fs.
- f2fs_sb_info:
contains f2fs-specific information, two special inode pointers for node and
meta address spaces, and orphan inode management.
- f2fs_inode_info:
contains vfs_inode and other fs-specific information.
- f2fs_nm_info:
contains node manager information such as NAT entry cache, free nid list,
and NAT page management.
- f2fs_node_info:
represents a node as node id, inode number, block address, and its version.
- f2fs_sm_info:
contains segment manager information such as SIT entry cache, free segment
map, current active logs, dirty segment management, and segment utilization.
The specific structures are sit_info, free_segmap_info, dirty_seglist_info,
curseg_info.
In addition, add F2FS_SUPER_MAGIC in magic.h.
Signed-off-by: Chul Lee <chur.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-28 12:37:31 +08:00
 * fs/f2fs/f2fs.h
 *
 * Copyright (c) 2012 Samsung Electronics Co., Ltd.
 *             http://www.samsung.com/
 */
#ifndef _LINUX_F2FS_H
#define _LINUX_F2FS_H

#include <linux/uio.h>
#include <linux/types.h>
#include <linux/page-flags.h>
#include <linux/buffer_head.h>
#include <linux/slab.h>
#include <linux/crc32.h>
#include <linux/magic.h>
#include <linux/kobject.h>
#include <linux/sched.h>
#include <linux/cred.h>
#include <linux/vmalloc.h>
#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/quotaops.h>
#include <linux/part_stat.h>
#include <crypto/hash.h>

#include <linux/fscrypt.h>
f2fs: add fs-verity support
Add fs-verity support to f2fs. fs-verity is a filesystem feature that
enables transparent integrity protection and authentication of read-only
files. It uses a dm-verity like mechanism at the file level: a Merkle
tree is used to verify any block in the file in log(filesize) time. It
is implemented mainly by helper functions in fs/verity/. See
Documentation/filesystems/fsverity.rst for the full documentation.
The f2fs support for fs-verity consists of:
- Adding a filesystem feature flag and an inode flag for fs-verity.
- Implementing the fsverity_operations to support enabling verity on an
inode and reading/writing the verity metadata.
- Updating ->readpages() to verify data as it's read from verity files
and to support reading verity metadata pages.
- Updating ->write_begin(), ->write_end(), and ->writepages() to support
writing verity metadata pages.
- Calling the fs-verity hooks for ->open(), ->setattr(), and ->ioctl().
Like ext4, f2fs stores the verity metadata (Merkle tree and
fsverity_descriptor) past the end of the file, starting at the first 64K
boundary beyond i_size. This approach works because (a) verity files
are readonly, and (b) pages fully beyond i_size aren't visible to
userspace but can be read/written internally by f2fs with only some
relatively small changes to f2fs. Extended attributes cannot be used
because (a) f2fs limits the total size of an inode's xattr entries to
4096 bytes, which wouldn't be enough for even a single Merkle tree
block, and (b) f2fs encryption doesn't encrypt xattrs, yet the verity
metadata *must* be encrypted when the file is because it contains hashes
of the plaintext data.
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-07-23 00:26:24 +08:00
#include <linux/fsverity.h>
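The fs-verity commit message above says the verity metadata starts at the first 64K boundary beyond i_size. A minimal userspace sketch of that offset calculation follows; the helper name and the constant are hypothetical, and it assumes that a size which is already 64K-aligned maps to itself.

```c
#include <stdint.h>

/* 64K alignment named in the commit message; the macro name is hypothetical. */
#define DEMO_VERITY_ALIGN 65536ULL

/*
 * Hypothetical helper: round i_size up to the next 64K boundary.
 * Assumption: a size that is already 64K-aligned is returned unchanged.
 */
static uint64_t demo_verity_metadata_pos(uint64_t i_size)
{
	return (i_size + DEMO_VERITY_ALIGN - 1) & ~(DEMO_VERITY_ALIGN - 1);
}
```

Because the alignment is a power of two, the round-up can be done with a mask instead of a division.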

struct pagevec;

#ifdef CONFIG_F2FS_CHECK_FS
#define f2fs_bug_on(sbi, condition)	BUG_ON(condition)
#else
#define f2fs_bug_on(sbi, condition)					\
	do {								\
		if (WARN_ON(condition))					\
			set_sbi_flag(sbi, SBI_NEED_FSCK);		\
	} while (0)
#endif
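The non-fatal variant of f2fs_bug_on() warns and then flags the filesystem for fsck instead of halting. The following userspace sketch imitates that pattern only; demo_sbi, DEMO_NEED_FSCK and demo_warn_on() are hypothetical stand-ins, not kernel APIs.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the superblock info; only the flag word matters. */
struct demo_sbi {
	unsigned long flags;
};

/* Hypothetical flag bit, not the kernel's SBI_NEED_FSCK value. */
#define DEMO_NEED_FSCK 0x1UL

/* Userspace stand-in for WARN_ON(): report the expression, return its truth. */
static bool demo_warn_on(bool cond, const char *expr)
{
	if (cond)
		fprintf(stderr, "WARNING: %s\n", expr);
	return cond;
}

/* Non-fatal variant: warn, then mark the filesystem as needing fsck. */
#define demo_bug_on(sbi, condition)					\
	do {								\
		if (demo_warn_on((condition), #condition))		\
			(sbi)->flags |= DEMO_NEED_FSCK;			\
	} while (0)

/* Returns true once a consistency check has tripped on this sbi. */
static bool demo_check_blocks(struct demo_sbi *sbi, int valid_blocks)
{
	demo_bug_on(sbi, valid_blocks < 0);
	return (sbi->flags & DEMO_NEED_FSCK) != 0;
}
```

The do { ... } while (0) wrapper lets the macro be used as a single statement, for example in an unbraced if/else.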

enum {
	FAULT_KMALLOC,
	FAULT_KVMALLOC,
	FAULT_PAGE_ALLOC,
	FAULT_PAGE_GET,
	FAULT_ALLOC_BIO,	/* obsolete, since bio_alloc() will never fail */
	FAULT_ALLOC_NID,
	FAULT_ORPHAN,
	FAULT_BLOCK,
	FAULT_DIR_DEPTH,
	FAULT_EVICT_INODE,
	FAULT_TRUNCATE,
	FAULT_READ_IO,
	FAULT_CHECKPOINT,
	FAULT_DISCARD,
	FAULT_WRITE_IO,
	FAULT_SLAB_ALLOC,
	FAULT_DQUOT_INIT,
	FAULT_LOCK_OP,
	FAULT_MAX,
};

#ifdef CONFIG_F2FS_FAULT_INJECTION
#define F2FS_ALL_FAULT_TYPE		((1 << FAULT_MAX) - 1)

struct f2fs_fault_info {
	atomic_t inject_ops;
	unsigned int inject_rate;
	unsigned int inject_type;
};

extern const char *f2fs_fault_name[FAULT_MAX];
#define IS_FAULT_SET(fi, type) ((fi)->inject_type & (1 << (type)))
#endif
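The fault-injection definitions above treat inject_type as a bitmask with one bit per fault enum value. The sketch below shows one way such a mask and rate counter could be consumed; the three-entry enum, the plain (non-atomic) counter and demo_time_to_inject() are assumptions made for illustration, not the header's code.

```c
#include <stdbool.h>

/* Shortened, hypothetical fault list; the real enum has many more entries. */
enum demo_fault {
	DEMO_FAULT_KMALLOC,
	DEMO_FAULT_PAGE_ALLOC,
	DEMO_FAULT_CHECKPOINT,
	DEMO_FAULT_MAX,
};

#define DEMO_ALL_FAULT_TYPE		((1U << DEMO_FAULT_MAX) - 1)
#define DEMO_IS_FAULT_SET(fi, type)	((fi)->inject_type & (1U << (type)))

struct demo_fault_info {
	unsigned int inject_ops;	/* plain counter here; atomic_t in the header */
	unsigned int inject_rate;
	unsigned int inject_type;
};

/*
 * Assumed policy: for an enabled fault type, every inject_rate-th
 * eligible call reports that a fault should be injected.
 */
static bool demo_time_to_inject(struct demo_fault_info *fi, int type)
{
	if (!fi->inject_rate)
		return false;
	if (!DEMO_IS_FAULT_SET(fi, type))
		return false;
	if (++fi->inject_ops >= fi->inject_rate) {
		fi->inject_ops = 0;
		return true;
	}
	return false;
}
```

With inject_rate set to 3 and only the checkpoint bit enabled, the third checkpoint-type call returns true and resets the counter, while calls for other fault types never count.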
/*
 * For mount options
 */
#define F2FS_MOUNT_DISABLE_ROLL_FORWARD	0x00000002
#define F2FS_MOUNT_DISCARD		0x00000004
#define F2FS_MOUNT_NOHEAP		0x00000008
#define F2FS_MOUNT_XATTR_USER		0x00000010
#define F2FS_MOUNT_POSIX_ACL		0x00000020
#define F2FS_MOUNT_DISABLE_EXT_IDENTIFY	0x00000040
#define F2FS_MOUNT_INLINE_XATTR		0x00000080
#define F2FS_MOUNT_INLINE_DATA		0x00000100
#define F2FS_MOUNT_INLINE_DENTRY	0x00000200
#define F2FS_MOUNT_FLUSH_MERGE		0x00000400
#define F2FS_MOUNT_NOBARRIER		0x00000800
#define F2FS_MOUNT_FASTBOOT		0x00001000
#define F2FS_MOUNT_EXTENT_CACHE		0x00002000
#define F2FS_MOUNT_DATA_FLUSH		0x00008000
#define F2FS_MOUNT_FAULT_INJECTION	0x00010000
#define F2FS_MOUNT_USRQUOTA		0x00080000
#define F2FS_MOUNT_GRPQUOTA		0x00100000
#define F2FS_MOUNT_PRJQUOTA		0x00200000
#define F2FS_MOUNT_QUOTA		0x00400000
f2fs: support flexible inline xattr size
More and more features built on file encryption are being introduced in
products, and their demand for xattr space keeps growing. However, inline
xattr has a fixed size of 200 bytes; once that space is full, additional
xattr data occupies a separate xattr block, which costs more space and may
cause a performance regression when persisting.
To resolve this, it is better to let the inline xattr size expand flexibly
according to the user's requirements.
So this patch introduces a new filesystem feature, 'flexible inline xattr',
and a new mount option, 'inline_xattr_size=%u'. Once mkfs enables the
feature, the option makes f2fs support a flexible inline xattr size.
To support this feature, we add an extra attribute, i_inline_xattr_size, to
the inode layout. It indicates how much space inline xattr borrows from the
block address mapping space in the inode layout, so that we can easily
locate and store flexible-sized inline xattr data in the inode.
Inode disk layout:
  +----------------------+
  | .i_mode              |
  | ...                  |
  | .i_ext               |
  +----------------------+
  | .i_extra_isize       |
  | .i_inline_xattr_size |-----------+
  | ...                  |           |
  +----------------------+           |
  | .i_addr              |           |
  |  - block address or  |           |
  |  - inline data       |           |
  +----------------------+<---+      v
  |    inline xattr      |    +---inline xattr range
  +----------------------+<---+
  | .i_nid               |
  +----------------------+
  |   node_footer        |
  | (nid, ino, offset)   |
  +----------------------+
Note that we have to consider backward compatibility, as reported by Sheng Yong:
previously, inline data or an inline directory always reserved 200 bytes in the
inode layout, even if inline_xattr was disabled. In order to keep inline_dentry's
structure for backward compatibility, we get the space back only from inline_data.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reported-by: Sheng Yong <shengyong1@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-06 21:59:50 +08:00
#define F2FS_MOUNT_INLINE_XATTR_SIZE	0x00800000
#define F2FS_MOUNT_RESERVE_ROOT		0x01000000
#define F2FS_MOUNT_DISABLE_CHECKPOINT	0x02000000
#define F2FS_MOUNT_NORECOVERY		0x04000000
f2fs: support age threshold based garbage collection
There are several issues in the current background GC algorithm:
- The number of valid blocks is one of the key factors in the cost
  calculation, so if a segment has few valid blocks, the CB algorithm will
  still choose it as the victim even if it is young or is a hot segment,
  which is not appropriate.
- GCed data/node blocks go to the existing logs regardless of whether the
  data already there has the same update frequency, so hot and cold data
  may get mixed again.
- The GC allocator mainly uses LFS-type segments, which consumes free
  segments more quickly.
This patch introduces a new algorithm, age-threshold-based garbage
collection, to solve the above issues. There are three main steps:
1. select a source victim:
   - set an age threshold and select candidates based on it, e.g. with
     0 as the youngest and 100 as the oldest, an age threshold of 80
     selects dirty segments whose age is in the range [80, 100] as
     candidates;
   - set a candidate_ratio threshold and select candidates based on the
     ratio, so that we can shrink the candidates to the oldest segments;
   - select the target segment with the fewest valid blocks in order to
     migrate blocks with minimum cost;
2. select a target victim:
   - select candidates based on the age threshold;
   - set a candidate_radius threshold and search for candidates whose age
     is around the source victim's; the search radius should be less than
     the radius threshold;
   - select the target segment with the most valid blocks in order to avoid
     migrating the current target segment.
3. merge valid blocks from the source victim into the target victim with
   the SSR allocator.
Test steps:
- create 160 dirty segments:
  * half of them have 128 valid blocks per segment
  * the rest have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease noticeably:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
#define F2FS_MOUNT_ATGC			0x08000000
f2fs: introduce checkpoint_merge mount option
We've added new mount options, "checkpoint_merge" and "nocheckpoint_merge".
"checkpoint_merge" creates a kernel daemon that merges concurrent checkpoint
requests as much as possible to eliminate redundant checkpoints. Plus,
we can eliminate the sluggishness caused by a slow checkpoint operation
when the checkpoint is done in a process context in a cgroup having
a low i/o budget and cpu shares. To make this work better, we set the
default i/o priority of the kernel daemon to "3", giving it one higher
priority than other kernel threads. The verification result below
explains this.
The basic idea has come from https://opensource.samsung.com.
[Verification]
Android Pixel Device (ARM64, 7GB RAM, 256GB UFS)
Create two I/O cgroups (fg w/ weight 100, bg w/ weight 20)
Set "strict_guarantees" to "1" in BFQ tunables
In "fg" cgroup,
- thread A => trigger 1000 checkpoint operations
"for i in `seq 1 1000`; do touch test_dir1/file; fsync test_dir1;
done"
- thread B => generating async. I/O
"fio --rw=write --numjobs=1 --bs=128k --runtime=3600 --time_based=1
--filename=test_img --name=test"
In "bg" cgroup,
- thread C => trigger repeated checkpoint operations
"echo $$ > /dev/blkio/bg/tasks; while true; do touch test_dir2/file;
fsync test_dir2; done"
We've measured thread A's execution time.
[ w/o patch ]
Elapsed Time: Avg. 68 seconds
[ w/ patch ]
Elapsed Time: Avg. 48 seconds
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
[Jaegeuk Kim: fix the return value in f2fs_start_ckpt_thread, reported by Dan]
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-01-19 08:00:42 +08:00
#define F2FS_MOUNT_MERGE_CHECKPOINT	0x10000000
#define F2FS_MOUNT_GC_MERGE		0x20000000
#define F2FS_MOUNT_COMPRESS_CACHE	0x40000000
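The commit message attached to F2FS_MOUNT_ATGC above describes source-victim selection as filtering segments by an age threshold and then taking the one with the fewest valid blocks. A deliberately simplified sketch of that step follows; the struct, the function name and the omission of the candidate_ratio step are all assumptions made for illustration, not the f2fs implementation.

```c
#include <stddef.h>

/* Hypothetical per-segment summary: age in [0, 100] and valid block count. */
struct atgc_seg {
	unsigned int age;
	unsigned int valid_blocks;
};

/*
 * Pick a source victim: among segments at least as old as age_threshold,
 * return the index of the one with the fewest valid blocks, or -1 if no
 * segment qualifies. The candidate_ratio narrowing is left out here.
 */
static int atgc_pick_source(const struct atgc_seg *segs, size_t n,
			    unsigned int age_threshold)
{
	int best = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		if (segs[i].age < age_threshold)
			continue;
		if (best < 0 || segs[i].valid_blocks < segs[best].valid_blocks)
			best = (int)i;
	}
	return best;
}
```

Note how a young segment with very few valid blocks is skipped: that is the case the commit message says the cost-benefit algorithm handled poorly.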
#define F2FS_OPTION(sbi)	((sbi)->mount_opt)
#define clear_opt(sbi, option)	(F2FS_OPTION(sbi).opt &= ~F2FS_MOUNT_##option)
#define set_opt(sbi, option)	(F2FS_OPTION(sbi).opt |= F2FS_MOUNT_##option)
#define test_opt(sbi, option)	(F2FS_OPTION(sbi).opt & F2FS_MOUNT_##option)
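The option macros above use token pasting (##) so callers write the short option name and the macro builds the full flag identifier. A self-contained userspace sketch of the same pattern follows; every OPT_ and opt_ name is a hypothetical stand-in for the F2FS_ equivalents.

```c
/* Hypothetical stand-ins for f2fs_mount_info and f2fs_sb_info. */
struct opt_mount_info {
	unsigned int opt;
};

struct opt_sb_info {
	struct opt_mount_info mount_opt;
};

#define OPT_MOUNT_DISCARD	0x00000004
#define OPT_MOUNT_INLINE_DATA	0x00000100

#define OPT_OPTION(sbi)		((sbi)->mount_opt)
#define opt_clear(sbi, option)	(OPT_OPTION(sbi).opt &= ~OPT_MOUNT_##option)
#define opt_set(sbi, option)	(OPT_OPTION(sbi).opt |= OPT_MOUNT_##option)
#define opt_test(sbi, option)	(OPT_OPTION(sbi).opt & OPT_MOUNT_##option)

/* Set two options, clear one again, and return the resulting mask. */
static unsigned int opt_demo(void)
{
	struct opt_sb_info sbi = { { 0 } };

	opt_set(&sbi, DISCARD);		/* expands to OPT_MOUNT_DISCARD */
	opt_set(&sbi, INLINE_DATA);
	opt_clear(&sbi, DISCARD);
	return sbi.mount_opt.opt;
}

/* The test macro yields the masked bit, not 0/1, so normalize it. */
static int opt_has_inline_data(struct opt_sb_info *sbi)
{
	return opt_test(sbi, INLINE_DATA) != 0;
}
```

Because the test macro returns the raw masked value, callers that need a strict boolean should compare against zero as shown.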
#define ver_after(a, b)	(typecheck(unsigned long long, a) &&		\
		typecheck(unsigned long long, b) &&			\
		((long long)((a) - (b)) > 0))
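ver_after() compares two version counters by subtracting them as unsigned values and inspecting the sign of the difference, which keeps the comparison meaningful when the counter wraps around. A minimal userspace sketch without the kernel's typecheck() follows; the function name is hypothetical, and the cast assumes the usual two's-complement conversion.

```c
#include <stdbool.h>

/*
 * Returns true if version a is "after" version b. The unsigned subtraction
 * wraps modulo 2^64, and the signed cast turns a small forward distance
 * into a positive number (two's-complement conversion assumed).
 */
static bool demo_ver_after(unsigned long long a, unsigned long long b)
{
	return (long long)(a - b) > 0;
}
```

A plain `a > b` would report the wrong order right after a wraparound, for example when a is 1 and b is the largest 64-bit value.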

typedef u32 block_t;	/*
			 * should not change u32, since it is the on-disk block
			 * address format, __le32.
			 */
typedef u32 nid_t;

f2fs: support data compression
This patch tries to support compression in f2fs.
- A new term, cluster, is defined as the basic unit of compression; a file can
  be divided into multiple clusters logically. One cluster includes 4 << n
  (n >= 0) logical pages, the compression size is also the cluster size, and
  each cluster can be compressed or not.
- In the cluster metadata layout, one special flag indicates whether a cluster
  is a compressed one or a normal one. For a compressed cluster, the following
  metadata maps the cluster to [1, 4 << n - 1] physical blocks, where f2fs
  stores data including the compress header and compressed data.
- In order to eliminate write amplification during overwrite, F2FS only
  supports compression on write-once files; data can be compressed only when
  all logical blocks in the file are valid and the cluster compress ratio is
  lower than the specified threshold.
- To enable compression on regular inode, there are three ways:
* chattr +c file
* chattr +c dir; touch dir/file
* mount w/ -o compress_extension=ext; touch file.ext
Compress metadata layout:
                             [Dnode Structure]
             +-----------------------------------------------+
             | cluster 1 | cluster 2 | ......... | cluster N |
             +-----------------------------------------------+
             .           .                       .           .
       .                       .                .                      .
  .         Compressed Cluster       .        .        Normal Cluster            .
+----------+---------+---------+---------+  +---------+---------+---------+---------+
|compr flag| block 1 | block 2 | block 3 |  | block 1 | block 2 | block 3 | block 4 |
+----------+---------+---------+---------+  +---------+---------+---------+---------+
           .                             .
         .                                           .
       .                                                           .
      +-------------+-------------+----------+----------------------------+
      | data length | data chksum | reserved |      compressed data       |
      +-------------+-------------+----------+----------------------------+
Changelog:
20190326:
- fix error handling of read_end_io().
- remove unneeded comments in f2fs_encrypt_one_page().
20190327:
- fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
- don't jump into loop directly to avoid uninitialized variables.
- add TODO tag in error path of f2fs_write_cache_pages().
20190328:
- fix wrong merge condition in f2fs_read_multi_pages().
- check compressed file in f2fs_post_read_required().
20190401
- allow overwrite on non-compressed cluster.
- check cluster meta before writing compressed data.
20190402
- don't preallocate blocks for compressed file.
- add lz4 compress algorithm
- process multiple post read works in one workqueue
Now f2fs supports processing post read work in multiple workqueue,
it shows low performance due to schedule overhead of multiple
workqueue executing orderly.
20190921
- compress: support buffered overwrite
  C: compress cluster flag
  V: valid block address
  N: NEW_ADDR
  One cluster contains 4 blocks
    before overwrite    after overwrite
  - VVVV            ->  CVNN
  - CVNN            ->  VVVV
  - CVNN            ->  CVNN
  - CVNN            ->  CVVV
  - CVVV            ->  CVNN
  - CVVV            ->  CVVV
20191029
- add kconfig F2FS_FS_COMPRESSION to isolate compression related
codes, add kconfig F2FS_FS_{LZO,LZ4} to cover backend algorithm.
note that: will remove lzo backend if Jaegeuk agreed that too.
- update codes according to Eric's comments.
20191101
- apply fixes from Jaegeuk
20191113
- apply fixes from Jaegeuk
- split workqueue for fsverity
20191216
- apply fixes from Jaegeuk
20200117
- fix to avoid NULL pointer dereference
[Jaegeuk Kim]
- add tracepoint for f2fs_{,de}compress_pages()
- fix many bugs and add some compression stats
- fix overwrite/mmap bugs
- address 32bit build error, reported by Geert.
- bug fixes when handling errors and i_compressed_blocks
Reported-by: <noreply@ellerman.id.au>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-11-01 18:07:14 +08:00
#define COMPRESS_EXT_NUM 16
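The compression commit message above defines a cluster as 4 << n logical pages (n >= 0). Under that definition, mapping a page index to its cluster and to its position inside the cluster is simple integer arithmetic. The sketch below shows it with hypothetical names; it is not taken from the f2fs sources.

```c
#include <stdint.h>

/* Hypothetical result type: which cluster, and which page inside it. */
struct demo_cluster_pos {
	uint64_t cluster_idx;
	unsigned int page_in_cluster;
};

/* Cluster size in pages for a given n, following "4 << n (n >= 0)". */
static unsigned int demo_cluster_pages(unsigned int n)
{
	return 4U << n;
}

/* Locate a logical page index within the cluster layout. */
static struct demo_cluster_pos demo_locate_page(uint64_t page_index,
						unsigned int n)
{
	unsigned int pages = demo_cluster_pages(n);
	struct demo_cluster_pos pos;

	pos.cluster_idx = page_index / pages;
	pos.page_in_cluster = (unsigned int)(page_index % pages);
	return pos;
}
```

Since the cluster size is always a power of two, the division and modulo could equally be written as a shift and a mask.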
struct f2fs_mount_info {
	unsigned int opt;
	int write_io_size_bits;		/* Write IO size bits */
	block_t root_reserved_blocks;	/* root reserved blocks */
	kuid_t s_resuid;		/* reserved blocks for uid */
	kgid_t s_resgid;		/* reserved blocks for gid */
	int active_logs;		/* # of active logs */
	int inline_xattr_size;		/* inline xattr size */
#ifdef CONFIG_F2FS_FAULT_INJECTION
	struct f2fs_fault_info fault_info;	/* For fault injection */
#endif
#ifdef CONFIG_QUOTA
	/* Names of quota files with journalled quota */
	char *s_qf_names[MAXQUOTAS];
	int s_jquota_fmt;			/* Format of quota to use */
#endif
	/* For which write hints are passed down to block layer */
	int whint_mode;
	int alloc_mode;			/* segment allocation policy */
	int fsync_mode;			/* fsync policy */
	int fs_mode;			/* fs mode: LFS or ADAPTIVE */
	int bggc_mode;			/* bggc mode: off, on or sync */
f2fs: introduce discard_unit mount option
As James Z reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=213877
[1.] One-line summary of the problem:
Mount multiple SMR block devices exceed certain number cause system non-response
[2.] Full description of the problem/report:
Created some F2FS on SMR devices (mkfs.f2fs -m), then mounted in sequence. Each device is the same Model: HGST HSH721414AL (Size 14TB).
Empirically, found that when the amount of SMR device * 1.5Gb > System RAM, the system ran out of memory and hung. No dmesg output. For example, 24 SMR Disk need 24*1.5GB = 36GB. A system with 32G RAM can only mount 21 devices, the 22nd device will be a reproducible cause of system hang.
The number of SMR devices with other FS mounted on this system does not interfere with the result above.
[3.] Keywords (i.e., modules, networking, kernel):
F2FS, SMR, Memory
[4.] Kernel information
[4.1.] Kernel version (uname -a):
Linux 5.13.4-200.fc34.x86_64 #1 SMP Tue Jul 20 20:27:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
[4.2.] Kernel .config file:
Default Fedora 34 with f2fs-tools-1.14.0-2.fc34.x86_64
[5.] Most recent kernel version which did not have the bug:
None
[6.] Output of Oops.. message (if applicable) with symbolic information
resolved (see Documentation/admin-guide/oops-tracing.rst)
None
[7.] A small shell script or example program which triggers the
problem (if possible)
mount /dev/sdX /mnt/0X
[8.] Memory consumption
With 24 * 14T SMR Block device with F2FS
free -g
              total        used        free      shared  buff/cache   available
Mem:             46          36           0           0          10          10
Swap:             0           0           0
With 3 * 14T SMR Block device with F2FS
free -g
              total        used        free      shared  buff/cache   available
Mem:              7           5           0           0           1           1
Swap:             7           0           7
The root cause is that there are three bitmaps:
- cur_valid_map
- ckpt_valid_map
- discard_map
and each of them costs ~500MB of memory. {cur, ckpt}_valid_map are
necessary, but discard_map is optional, since this bitmap is only
useful on a mountpoint where small discard is enabled.
For a blkzoned device such as an SMR or ZNS device, f2fs will only issue
discard for a section (zone) when all blocks of that section are invalid,
so for such devices we don't need small discard functionality at all.
This patch introduces a new mount option,
"discard_unit=block|segment|section", to support issuing discard with a
different basic unit, aligned to block, segment or section, so that the
user can specify "discard_unit=segment" or "discard_unit=section" to
disable small discard functionality.
Note that this mount option cannot be changed by remount(), because the
related metadata needs to be initialized during mount().
In order to save memory, let's use "discard_unit=section" for blkzoned
devices by default.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-08-03 08:15:43 +08:00
	int discard_unit;		/*
					 * discard command's offset/size should
					 * be aligned to this unit: block,
					 * segment or section
					 */
fscrypt: handle test_dummy_encryption in more logical way
The behavior of the test_dummy_encryption mount option is that when a
new file (or directory or symlink) is created in an unencrypted
directory, it's automatically encrypted using a dummy encryption policy.
That's it; in particular, the encryption (or lack thereof) of existing
files (or directories or symlinks) doesn't change.
Unfortunately the implementation of test_dummy_encryption is a bit weird
and confusing. When test_dummy_encryption is enabled and a file is
being created in an unencrypted directory, we set up an encryption key
(->i_crypt_info) for the directory. This isn't actually used to do any
encryption, however, since the directory is still unencrypted! Instead,
->i_crypt_info is only used for inheriting the encryption policy.
One consequence of this is that the filesystem ends up providing a
"dummy context" (policy + nonce) instead of a "dummy policy". In
commit ed318a6cc0b6 ("fscrypt: support test_dummy_encryption=v2"), I
mistakenly thought this was required. However, actually the nonce only
ends up being used to derive a key that is never used.
Another consequence of this implementation is that it allows for
'inode->i_crypt_info != NULL && !IS_ENCRYPTED(inode)', which is an edge
case that can be forgotten about. For example, currently
FS_IOC_GET_ENCRYPTION_POLICY on an unencrypted directory may return the
dummy encryption policy when the filesystem is mounted with
test_dummy_encryption. That seems like the wrong thing to do, since
again, the directory itself is not actually encrypted.
Therefore, switch to a more logical and maintainable implementation
where the dummy encryption policy inheritance is done without setting up
keys for unencrypted directories. This involves:
- Adding a function fscrypt_policy_to_inherit() which returns the
encryption policy to inherit from a directory. This can be a real
policy, a dummy policy, or no policy.
- Replacing struct fscrypt_dummy_context, ->get_dummy_context(), etc.
with struct fscrypt_dummy_policy, ->get_dummy_policy(), etc.
- Making fscrypt_fname_encrypted_size() take an fscrypt_policy instead
of an inode.
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Acked-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20200917041136.178600-13-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-09-17 12:11:35 +08:00
|
|
|
struct fscrypt_dummy_policy dummy_enc_policy; /* test dummy encryption */
|
2020-05-16 08:20:50 +08:00
|
|
|
block_t unusable_cap_perc; /* percentage for cap */
|
2019-05-30 08:49:06 +08:00
|
|
|
block_t unusable_cap; /* Amount of space allowed to be
 * unusable when disabling checkpoint
 */
|
f2fs: support data compression
This patch tries to support compression in f2fs.
- A new term, cluster, is defined as the basic unit of compression; a file
is logically divided into multiple clusters. One cluster includes 4 << n
(n >= 0) logical pages, the compression size equals the cluster size, and
each cluster can be compressed or not.
- In the cluster metadata layout, one special flag indicates whether a
cluster is compressed or normal. For a compressed cluster, the following
metadata maps the cluster to [1, (4 << n) - 1] physical blocks, in which
f2fs stores the compress header and the compressed data.
- In order to eliminate write amplification during overwrite, F2FS only
supports compression on write-once files; data can be compressed only when
all logical blocks in the file are valid and the cluster compress ratio is
lower than the specified threshold.
- To enable compression on regular inode, there are three ways:
* chattr +c file
* chattr +c dir; touch dir/file
* mount w/ -o compress_extension=ext; touch file.ext
Compress metadata layout:
[Dnode Structure]
+-----------------------------------------------+
| cluster 1 | cluster 2 | ......... | cluster N |
+-----------------------------------------------+
. . . .
. . . .
. Compressed Cluster . . Normal Cluster .
+----------+---------+---------+---------+ +---------+---------+---------+---------+
|compr flag| block 1 | block 2 | block 3 | | block 1 | block 2 | block 3 | block 4 |
+----------+---------+---------+---------+ +---------+---------+---------+---------+
. .
. .
. .
+-------------+-------------+----------+----------------------------+
| data length | data chksum | reserved | compressed data |
+-------------+-------------+----------+----------------------------+
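The cluster arithmetic described above can be sketched in a few lines of userspace C. This is only an illustration under stated assumptions: every `sketch_*` name is hypothetical and not part of f2fs, and the "worth compressing" test is a simplification derived from the [1, (4 << n) - 1] block range in the message, not the threshold logic the patch actually implements.

```c
#include <assert.h>

/* Pages per cluster: 4 << n, as stated in the commit message. */
static unsigned long sketch_cluster_pages(unsigned int n)
{
	assert(n < 16); /* arbitrary sanity bound chosen for this sketch */
	return 4UL << n;
}

/* Index of the cluster that a logical page index falls into. */
static unsigned long sketch_cluster_of_page(unsigned long page_index,
					    unsigned int n)
{
	return page_index / sketch_cluster_pages(n);
}

/* Upper bound on physical blocks of a compressed cluster: (4 << n) - 1. */
static unsigned long sketch_max_compressed_blocks(unsigned int n)
{
	return sketch_cluster_pages(n) - 1;
}

/*
 * Simplified assumption: storing a cluster compressed only makes sense
 * if it saves at least one block compared with the normal layout.
 */
static int sketch_worth_compressing(unsigned long compressed_blocks,
				    unsigned int n)
{
	return compressed_blocks >= 1 &&
	       compressed_blocks <= sketch_max_compressed_blocks(n);
}
```

With n = 0 a cluster spans 4 pages, so a compressed cluster occupies at most 3 data blocks, which matches the "compr flag + block 1..3" row in the diagram.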
Changelog:
20190326:
- fix error handling of read_end_io().
- remove unneeded comments in f2fs_encrypt_one_page().
20190327:
- fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
- don't jump into loop directly to avoid uninitialized variables.
- add TODO tag in error path of f2fs_write_cache_pages().
20190328:
- fix wrong merge condition in f2fs_read_multi_pages().
- check compressed file in f2fs_post_read_required().
20190401
- allow overwrite on non-compressed cluster.
- check cluster meta before writing compressed data.
20190402
- don't preallocate blocks for compressed file.
- add lz4 compress algorithm
- process multiple post read works in one workqueue
Currently f2fs processes post-read work in multiple workqueues, which
shows low performance due to the scheduling overhead of multiple
workqueues executing in order.
20190921
- compress: support buffered overwrite
C: compress cluster flag
V: valid block address
N: NEW_ADDR
One cluster contains 4 blocks
before overwrite after overwrite
- VVVV -> CVNN
- CVNN -> VVVV
- CVNN -> CVNN
- CVNN -> CVVV
- CVVV -> CVNN
- CVVV -> CVVV
20191029
- add kconfig F2FS_FS_COMPRESSION to isolate compression related
codes, add kconfig F2FS_FS_{LZO,LZ4} to cover backend algorithm.
note that: will remove lzo backend if Jaegeuk agreed that too.
- update codes according to Eric's comments.
20191101
- apply fixes from Jaegeuk
20191113
- apply fixes from Jaegeuk
- split workqueue for fsverity
20191216
- apply fixes from Jaegeuk
20200117
- fix to avoid NULL pointer dereference
[Jaegeuk Kim]
- add tracepoint for f2fs_{,de}compress_pages()
- fix many bugs and add some compression stats
- fix overwrite/mmap bugs
- address 32bit build error, reported by Geert.
- bug fixes when handling errors and i_compressed_blocks
Reported-by: <noreply@ellerman.id.au>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-11-01 18:07:14 +08:00
|
|
|
|
|
|
|
/* For compression */
|
|
|
|
unsigned char compress_algorithm; /* algorithm type */
|
2020-11-26 18:32:09 +08:00
|
|
|
unsigned char compress_log_size; /* cluster log size */
|
2021-01-22 17:46:43 +08:00
|
|
|
unsigned char compress_level; /* compress level */
|
2020-11-26 18:32:09 +08:00
|
|
|
bool compress_chksum; /* compressed data chksum */
|
2019-11-01 18:07:14 +08:00
|
|
|
unsigned char compress_ext_cnt; /* extension count */
|
2021-06-08 19:15:08 +08:00
|
|
|
unsigned char nocompress_ext_cnt; /* nocompress extension count */
|
2020-12-01 12:08:02 +08:00
|
|
|
int compress_mode; /* compression mode */
|
2019-11-01 18:07:14 +08:00
|
|
|
unsigned char extensions[COMPRESS_EXT_NUM][F2FS_EXTENSION_LEN]; /* extensions */
|
2021-06-08 19:15:08 +08:00
|
|
|
unsigned char noextensions[COMPRESS_EXT_NUM][F2FS_EXTENSION_LEN]; /* nocompress extensions */
|
2012-11-28 12:37:31 +08:00
|
|
|
};
|
|
|
|
|
2017-07-19 00:19:06 +08:00
|
|
|
#define F2FS_FEATURE_ENCRYPT 0x0001
|
|
|
|
#define F2FS_FEATURE_BLKZONED 0x0002
|
|
|
|
#define F2FS_FEATURE_ATOMIC_WRITE 0x0004
|
|
|
|
#define F2FS_FEATURE_EXTRA_ATTR 0x0008
|
2017-07-26 00:01:41 +08:00
|
|
|
#define F2FS_FEATURE_PRJQUOTA 0x0010
|
2017-07-31 20:19:09 +08:00
|
|
|
#define F2FS_FEATURE_INODE_CHKSUM 0x0020
|
f2fs: support flexible inline xattr size
In products, more and more features based on file encryption are being
introduced, and their demand for xattr space is increasing. However, inline
xattr has a fixed size of 200 bytes; once the inline xattr space is full,
additional xattr data occupies an extra xattr block, which may bring more
space usage and a performance regression during persisting.
To resolve this issue, it is better to expand the inline xattr size
flexibly according to the user's requirement.
So this patch introduces a new filesystem feature, 'flexible inline xattr',
and a new mount option, 'inline_xattr_size=%u'. Once mkfs enables the
feature, the option can be used to make f2fs support a flexible inline
xattr size.
To support this feature, we add an extra attribute, i_inline_xattr_size, to
the inode layout, indicating how much space the inline xattr borrows from
the block address mapping space in the inode layout. With this, we can
easily locate and store flexible-sized inline xattr data in the inode.
Inode disk layout:
+----------------------+
| .i_mode |
| ... |
| .i_ext |
+----------------------+
| .i_extra_isize |
| .i_inline_xattr_size |-----------+
| ... | |
+----------------------+ |
| .i_addr | |
| - block address or | |
| - inline data | |
+----------------------+<---+ v
| inline xattr | +---inline xattr range
+----------------------+<---+
| .i_nid |
+----------------------+
| node_footer |
| (nid, ino, offset) |
+----------------------+
Note that we have to consider backward compatibility, which reserved
inline_data space, 200 bytes, all the time, as reported by Sheng Yong.
Previously, inline data or an inline directory always reserved 200 bytes in
the inode layout, even if inline_xattr was disabled. In order to keep
inline_dentry's structure for backward compatibility, we get the space back
only from inline_data.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reported-by: Sheng Yong <shengyong1@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-06 21:59:50 +08:00
|
|
|
#define F2FS_FEATURE_FLEXIBLE_INLINE_XATTR 0x0040
|
2017-10-06 12:03:06 +08:00
|
|
|
#define F2FS_FEATURE_QUOTA_INO 0x0080
|
2018-01-25 14:54:42 +08:00
|
|
|
#define F2FS_FEATURE_INODE_CRTIME 0x0100
|
2018-03-15 18:51:41 +08:00
|
|
|
#define F2FS_FEATURE_LOST_FOUND 0x0200
|
f2fs: add fs-verity support
Add fs-verity support to f2fs. fs-verity is a filesystem feature that
enables transparent integrity protection and authentication of read-only
files. It uses a dm-verity like mechanism at the file level: a Merkle
tree is used to verify any block in the file in log(filesize) time. It
is implemented mainly by helper functions in fs/verity/. See
Documentation/filesystems/fsverity.rst for the full documentation.
The f2fs support for fs-verity consists of:
- Adding a filesystem feature flag and an inode flag for fs-verity.
- Implementing the fsverity_operations to support enabling verity on an
inode and reading/writing the verity metadata.
- Updating ->readpages() to verify data as it's read from verity files
and to support reading verity metadata pages.
- Updating ->write_begin(), ->write_end(), and ->writepages() to support
writing verity metadata pages.
- Calling the fs-verity hooks for ->open(), ->setattr(), and ->ioctl().
Like ext4, f2fs stores the verity metadata (Merkle tree and
fsverity_descriptor) past the end of the file, starting at the first 64K
boundary beyond i_size. This approach works because (a) verity files
are readonly, and (b) pages fully beyond i_size aren't visible to
userspace but can be read/written internally by f2fs with only some
relatively small changes to f2fs. Extended attributes cannot be used
because (a) f2fs limits the total size of an inode's xattr entries to
4096 bytes, which wouldn't be enough for even a single Merkle tree
block, and (b) f2fs encryption doesn't encrypt xattrs, yet the verity
metadata *must* be encrypted when the file is because it contains hashes
of the plaintext data.
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-07-23 00:26:24 +08:00
|
|
|
#define F2FS_FEATURE_VERITY 0x0400
|
2018-09-28 20:25:56 +08:00
|
|
|
#define F2FS_FEATURE_SB_CHKSUM 0x0800
|
2019-07-24 07:05:28 +08:00
|
|
|
#define F2FS_FEATURE_CASEFOLD 0x1000
|
2019-11-01 18:07:14 +08:00
|
|
|
#define F2FS_FEATURE_COMPRESSION 0x2000
|
2021-05-21 16:32:53 +08:00
|
|
|
#define F2FS_FEATURE_RO 0x4000
|
2015-04-21 04:57:51 +08:00
|
|
|
|
2018-10-24 18:34:26 +08:00
|
|
|
#define __F2FS_HAS_FEATURE(raw_super, mask) \
	((raw_super->feature & cpu_to_le32(mask)) != 0)
#define F2FS_HAS_FEATURE(sbi, mask) __F2FS_HAS_FEATURE(sbi->raw_super, mask)
#define F2FS_SET_FEATURE(sbi, mask) \
	(sbi->raw_super->feature |= cpu_to_le32(mask))
#define F2FS_CLEAR_FEATURE(sbi, mask) \
	(sbi->raw_super->feature &= ~cpu_to_le32(mask))
|
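The feature macros compare against a little-endian on-disk field, which is why the mask goes through cpu_to_le32() before the bitwise operation. A minimal userspace sketch of the same pattern follows; `sketch_cpu_to_le32()`, `struct sketch_super` and the other `sketch_*` helpers are hypothetical stand-ins written for this illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's cpu_to_le32(). */
static uint32_t sketch_cpu_to_le32(uint32_t v)
{
	/* Lay the value out as little-endian bytes, then reread in host order. */
	unsigned char b[4] = {
		(unsigned char)(v & 0xff),
		(unsigned char)((v >> 8) & 0xff),
		(unsigned char)((v >> 16) & 0xff),
		(unsigned char)((v >> 24) & 0xff),
	};
	uint32_t out;

	memcpy(&out, b, sizeof(out));
	return out;
}

/* Hypothetical superblock holding only the little-endian feature word. */
struct sketch_super {
	uint32_t feature;
};

static int sketch_has_feature(const struct sketch_super *sb, uint32_t mask)
{
	assert(sb != NULL);
	return (sb->feature & sketch_cpu_to_le32(mask)) != 0;
}

static void sketch_set_feature(struct sketch_super *sb, uint32_t mask)
{
	assert(sb != NULL);
	sb->feature |= sketch_cpu_to_le32(mask);
}

static void sketch_clear_feature(struct sketch_super *sb, uint32_t mask)
{
	assert(sb != NULL);
	sb->feature &= ~sketch_cpu_to_le32(mask);
}
```

Converting the mask rather than the stored field keeps the field in its on-disk byte order, so the same bytes can be written back without another conversion.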
2015-04-14 06:10:36 +08:00
|
|
|
|
2018-01-05 13:36:09 +08:00
|
|
|
/*
|
|
|
|
* Default values for user and/or group using reserved blocks
|
|
|
|
*/
|
|
|
|
#define F2FS_DEF_RESUID 0
|
|
|
|
#define F2FS_DEF_RESGID 0
|
|
|
|
|
2012-11-28 12:37:31 +08:00
|
|
|
/*
|
|
|
|
* For checkpoint manager
|
|
|
|
*/
|
|
|
|
enum {
|
|
|
|
NAT_BITMAP,
|
|
|
|
SIT_BITMAP
|
|
|
|
};
|
|
|
|
|
2017-04-27 20:40:39 +08:00
|
|
|
#define CP_UMOUNT 0x00000001
|
|
|
|
#define CP_FASTBOOT 0x00000002
|
|
|
|
#define CP_SYNC 0x00000004
|
|
|
|
#define CP_RECOVERY 0x00000008
|
|
|
|
#define CP_DISCARD 0x00000010
|
2017-04-28 13:56:08 +08:00
|
|
|
#define CP_TRIMMED 0x00000020
|
2018-08-21 10:21:43 +08:00
|
|
|
#define CP_PAUSE 0x00000040
|
2020-04-01 02:43:07 +08:00
|
|
|
#define CP_RESIZE 0x00000080
|
2014-09-21 12:57:51 +08:00
|
|
|
|
2017-04-08 06:08:17 +08:00
|
|
|
#define MAX_DISCARD_BLOCKS(sbi) BLKS_PER_SEC(sbi)
|
2017-10-04 09:08:33 +08:00
|
|
|
#define DEF_MAX_DISCARD_REQUEST 8 /* issue 8 discards per round */
|
f2fs: introduce discard_granularity sysfs entry
Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables
f2fs to issue 4K-size discards in real-time discard mode. However, issuing
smaller discards may cost more device lifetime while releasing less free
space in the flash device. Since f2fs can separate hot/cold data and do
garbage collection, we can expect that a small invalid region will soon
expand with OPU, deletion, or garbage collection of valid data, so it's
better to delay or skip issuing smaller discards. This could help to
reduce excessive consumption of IO bandwidth and flash storage
lifetime.
This patch makes f2fs select 64K as its default minimal granularity and
issue discards whose size is not smaller than the minimal granularity.
It also exposes the discard granularity as a sysfs entry for
configuration in different scenarios.
Jaegeuk Kim:
We must issue all the accumulated discard commands when fstrim is called.
So, I've added pend_list_tag[] to indicate whether we should issue the
commands or not. If the tag is set to P_ACTIVE or P_TRIM, we have to issue
them. P_TRIM is set once at a time, when fstrim is triggered.
In addition, issue_discard_thread is called too often because of the number
of discard commands remaining in the pending list. I added a timer to
control it, as in gc_thread.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
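The granularity rule in this message (issue only discards at least as large as the minimal granularity, but flush everything when fstrim is called) can be sketched as below. This is a simplified illustration: the `SKETCH_*` constants and `sketch_should_issue_discard()` are hypothetical names, a 4K block size is assumed, and the real code tracks pending lists and tags rather than a single boolean.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_BLOCK_SIZE	4096u	/* assumed 4K block */
/* 64K default minimal granularity expressed in blocks */
#define SKETCH_DEF_GRANULARITY	(64u * 1024u / SKETCH_BLOCK_SIZE)

/*
 * Decide whether a pending discard of len_blocks should be issued now.
 * fstrim_requested models the "issue all accumulated commands" case.
 */
static bool sketch_should_issue_discard(unsigned int len_blocks,
					unsigned int granularity_blocks,
					bool fstrim_requested)
{
	assert(granularity_blocks > 0);

	if (fstrim_requested)
		return true;
	return len_blocks >= granularity_blocks;
}
```

Under these assumptions a single invalid 4K block stays pending until it merges into a region of at least 16 blocks or until fstrim runs.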
2017-08-07 23:09:56 +08:00
|
|
|
#define DEF_MIN_DISCARD_ISSUE_TIME 50 /* 50 ms, if exists */
|
2018-04-08 15:11:11 +08:00
|
|
|
#define DEF_MID_DISCARD_ISSUE_TIME 500 /* 500 ms, if device busy */
|
2017-08-07 23:09:56 +08:00
|
|
|
#define DEF_MAX_DISCARD_ISSUE_TIME 60000 /* 60 s, if no candidates */
|
2018-05-30 00:58:42 +08:00
|
|
|
#define DEF_DISCARD_URGENT_UTIL 80 /* do more discard over 80% */
|
2015-10-06 05:49:57 +08:00
|
|
|
#define DEF_CP_INTERVAL 60 /* 60 secs */
|
2016-07-15 19:25:47 +08:00
|
|
|
#define DEF_IDLE_INTERVAL 5 /* 5 secs */
|
2018-08-21 10:21:43 +08:00
|
|
|
#define DEF_DISABLE_INTERVAL 5 /* 5 secs */
|
2019-01-25 09:48:38 +08:00
|
|
|
#define DEF_DISABLE_QUICK_INTERVAL 1 /* 1 sec */
|
2019-01-15 02:42:11 +08:00
|
|
|
#define DEF_UMOUNT_DISCARD_TIMEOUT 5 /* 5 secs */
|
2015-01-27 09:41:23 +08:00
|
|
|
|
2014-09-21 12:57:51 +08:00
|
|
|
struct cp_control {
|
|
|
|
int reason;
|
2014-09-21 13:06:39 +08:00
|
|
|
__u64 trim_start;
|
|
|
|
__u64 trim_end;
|
|
|
|
__u64 trim_minlen;
|
2014-09-21 12:57:51 +08:00
|
|
|
};
|
|
|
|
|
2014-02-07 16:11:53 +08:00
|
|
|
/*
|
2018-06-05 17:44:11 +08:00
|
|
|
* indicate meta/data type
|
2014-02-07 16:11:53 +08:00
|
|
|
*/
|
|
|
|
enum {
|
|
|
|
META_CP,
|
|
|
|
META_NAT,
|
2014-02-27 19:12:24 +08:00
|
|
|
META_SIT,
|
2014-09-12 04:49:55 +08:00
|
|
|
META_SSA,
|
2018-09-29 18:31:27 +08:00
|
|
|
META_MAX,
|
2014-09-12 04:49:55 +08:00
|
|
|
META_POR,
|
f2fs: introduce DATA_GENERIC_ENHANCE
Previously, f2fs_is_valid_blkaddr(, blkaddr, DATA_GENERIC) only checked
whether @blkaddr is located in the main area.
That check is weak, since a block address within the main area can still
point to an address that is not valid in the segment info table. We
cannot detect that condition, so we may suffer worse corruption as the
system continues running.
So this patch introduces DATA_GENERIC_ENHANCE to strengthen the sanity
check: it triggers a SIT bitmap check rather than only a range check.
This patch also makes the changes below:
- set SBI_NEED_FSCK in f2fs_is_valid_blkaddr().
- get rid of is_valid_data_blkaddr() to avoid panic if blkaddr is invalid.
- introduce verify_fio_blkaddr() to wrap fio {new,old}_blkaddr validation check.
- spread blkaddr check in:
* f2fs_get_node_info()
* __read_out_blkaddrs()
* f2fs_submit_page_read()
* ra_data_block()
* do_recover_data()
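The difference between the range-only check and the enhanced check can be sketched as follows. This is a userspace illustration only: `struct sketch_fs`, the `SKETCH_*` check types and the bit layout of `valid_bitmap` are assumptions made for the example and do not reproduce the real SIT structures or f2fs_is_valid_blkaddr().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum sketch_check {
	SKETCH_RANGE_ONLY,		/* in the spirit of DATA_GENERIC */
	SKETCH_RANGE_AND_BITMAP,	/* in the spirit of DATA_GENERIC_ENHANCE */
};

/* Hypothetical filesystem description used only by this sketch. */
struct sketch_fs {
	uint32_t main_start;		/* first block of the main area */
	uint32_t main_end;		/* one past the last main-area block */
	const uint8_t *valid_bitmap;	/* one bit per main-area block */
};

static bool sketch_test_bit(const uint8_t *bitmap, uint32_t nr)
{
	return (bitmap[nr / 8] >> (nr % 8)) & 1;
}

static bool sketch_is_valid_blkaddr(const struct sketch_fs *fs,
				    uint32_t blkaddr, enum sketch_check type)
{
	assert(fs->main_start < fs->main_end);

	/* Range check: the address must lie inside the main area. */
	if (blkaddr < fs->main_start || blkaddr >= fs->main_end)
		return false;
	if (type == SKETCH_RANGE_ONLY)
		return true;
	/* Enhanced check: the block must also be marked valid. */
	return sketch_test_bit(fs->valid_bitmap, blkaddr - fs->main_start);
}
```

An address that passes the range check but whose bit is clear is exactly the case the weaker check could not detect.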
This patch can fix bug reported from bugzilla below:
https://bugzilla.kernel.org/show_bug.cgi?id=203215
https://bugzilla.kernel.org/show_bug.cgi?id=203223
https://bugzilla.kernel.org/show_bug.cgi?id=203231
https://bugzilla.kernel.org/show_bug.cgi?id=203235
https://bugzilla.kernel.org/show_bug.cgi?id=203241
= Update by Jaegeuk Kim =
DATA_GENERIC_ENHANCE was enhanced to validate block addresses on read/write paths.
But xfstest/generic/446 complains about generated kernel messages saying an invalid
bitmap was detected when reading a block. The reason is that when we get the
block addresses from extent_cache, there is no lock to synchronize against
blocks being truncated in parallel.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-04-15 15:26:32 +08:00
|
|
|
DATA_GENERIC, /* check range only */
|
|
|
|
DATA_GENERIC_ENHANCE, /* strong check on range and segment bitmap */
|
|
|
|
DATA_GENERIC_ENHANCE_READ, /*
 * strong check on range and segment
 * bitmap but no warning due to race
 * condition of read on truncated area
 * by extent_cache
 */
|
2018-06-05 17:44:11 +08:00
|
|
|
META_GENERIC,
|
2014-02-07 16:11:53 +08:00
|
|
|
};
|
|
|
|
|
2014-07-26 06:47:17 +08:00
|
|
|
/* for the list of ino */
|
|
|
|
enum {
|
|
|
|
ORPHAN_INO, /* for orphan ino list */
|
2014-07-25 22:40:59 +08:00
|
|
|
APPEND_INO, /* for append ino list */
|
|
|
|
UPDATE_INO, /* for update ino list */
|
2017-12-29 00:09:44 +08:00
|
|
|
TRANS_DIR_INO, /* for transactions dir ino list */
|
2017-09-29 13:59:38 +08:00
|
|
|
FLUSH_INO, /* for multiple device flushing */
|
2014-07-26 06:47:17 +08:00
|
|
|
MAX_INO_ENTRY, /* max. list */
|
|
|
|
};
|
|
|
|
|
|
|
|
struct ino_entry {
|
2017-09-29 13:59:38 +08:00
|
|
|
struct list_head list; /* list head */
|
|
|
|
nid_t ino; /* inode number */
|
|
|
|
unsigned int dirty_device; /* dirty device bitmap */
|
2012-11-28 12:37:31 +08:00
|
|
|
};
|
|
|
|
|
2015-12-15 13:30:45 +08:00
|
|
|
/* for the list of inodes to be GCed */
|
2014-12-29 15:56:18 +08:00
|
|
|
struct inode_entry {
|
2012-11-28 12:37:31 +08:00
|
|
|
struct list_head list; /* list head */
|
|
|
|
struct inode *inode; /* vfs inode pointer */
|
|
|
|
};
|
|
|
|
|
f2fs: fix to avoid broken of dnode block list
The f2fs recovery flow relies on the dnode block linked list: recovery of an
fsynced file depends on the previous dnodes in the list being persistent, so
during fsync() we should wait until all dnodes of regular inodes are written
back before issuing a flush.
This way, we can avoid the dnode block list being broken by out-of-order
IO submission due to the IO scheduler or driver.
Sheng Yong helps to do the test with this patch:
Target:/data (f2fs, -)
64MB / 32768KB / 4KB / 8
1 / PERSIST / Index
Base:
Run  SEQ-RD(MB/s)  SEQ-WR(MB/s)  RND-RD(IOPS)  RND-WR(IOPS)  Insert(TPS)  Update(TPS)  Delete(TPS)
1    867.82        204.15        41440.03      41370.54      680.8        1025.94      1031.08
2    871.87        205.87        41370.3       40275.2       791.14       1065.84      1101.7
3    866.52        205.69        41795.67      40596.16      694.69       1037.16      1031.48
Avg  868.7366667   205.2366667   41535.33333   40747.3       722.21       1042.98      1054.753333
After:
Run  SEQ-RD(MB/s)  SEQ-WR(MB/s)  RND-RD(IOPS)  RND-WR(IOPS)  Insert(TPS)  Update(TPS)  Delete(TPS)
1    798.81        202.5         41143         40613.87      602.71       838.08       913.83
2    805.79        206.47        40297.2       41291.46      604.44       840.75       924.27
3    814.83        206.17        41209.57      40453.62      602.85       834.66       927.91
Avg  806.4766667   205.0466667   40883.25667   40786.31667   603.3333333  837.83       922.0033333
Patched/Original:
     0.928332713   0.999074239   0.984300676   1.000957528   0.835398753  0.803303994  0.874141189
It looks like atomic write suffers a performance regression.
I suspect the culprit is that we force a wait for all dnodes to reach the
storage cache before we issue PREFLUSH+FUA.
BTW, will commit ("f2fs: don't need to wait for node writes for atomic write")
cause the problem: we will lose data of last transaction after SPO, even if
atomic write return no error:
- atomic_open();
- write() P1, P2, P3;
- atomic_commit();
- writeback data: P1, P2, P3;
- writeback node: N1, N2, N3; <--- If N1 and N2 are not written back but N3 with fsync_mark is
written back, SPOR won't find N3 because the node chain is broken, so the
last transaction is lost.
- preflush + fua;
- power-cut
If we don't wait dnode writeback for atomic_write:
Run  SEQ-RD(MB/s)  SEQ-WR(MB/s)  RND-RD(IOPS)  RND-WR(IOPS)  Insert(TPS)  Update(TPS)  Delete(TPS)
1    779.91        206.03        41621.5       40333.16      716.9        1038.21      1034.85
2    848.51        204.35        40082.44      39486.17      791.83       1119.96      1083.77
3    772.12        206.27        41335.25      41599.65      723.29       1055.07      971.92
Avg  800.18        205.55        41013.06333   40472.99333   744.0066667  1071.08      1030.18
Patched/Original:
     0.92108464    1.001526693   0.987425886   0.993268102   1.030180511  1.026942031  0.976702294
SQLite's performance recovers.
Jaegeuk:
"Practically, I don't see db corruption because of this. We can excuse to lose
the last transaction."
Finally, we decided to keep the original semantics of the atomic write interface:
we don't wait for all dnode writeback before preflush+fua submission.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-02 23:03:19 +08:00
|
|
|
struct fsync_node_entry {
|
|
|
|
struct list_head list; /* list head */
|
|
|
|
struct page *page; /* warm node page pointer */
|
|
|
|
unsigned int seq_id; /* sequence id */
|
|
|
|
};
|
|
|
|
|
f2fs: introduce checkpoint_merge mount option
We've added new mount options, "checkpoint_merge" and "nocheckpoint_merge".
The former creates a kernel daemon that merges concurrent checkpoint
requests as much as possible to eliminate redundant checkpoint issues. It also
eliminates the sluggishness caused by slow checkpoint operations
when the checkpoint is done in the context of a process in a cgroup with a
low i/o budget and cpu shares. To improve this further, we set the
default i/o priority of the kernel daemon to "3", one level higher
than other kernel threads. The verification result below
explains this.
The basic idea has come from https://opensource.samsung.com.
[Verification]
Android Pixel Device(ARM64, 7GB RAM, 256GB UFS)
Create two I/O cgroups (fg w/ weight 100, bg w/ weight 20)
Set "strict_guarantees" to "1" in BFQ tunables
In "fg" cgroup,
- thread A => trigger 1000 checkpoint operations
"for i in `seq 1 1000`; do touch test_dir1/file; fsync test_dir1;
done"
- thread B => generating async. I/O
"fio --rw=write --numjobs=1 --bs=128k --runtime=3600 --time_based=1
--filename=test_img --name=test"
In "bg" cgroup,
- thread C => trigger repeated checkpoint operations
"echo $$ > /dev/blkio/bg/tasks; while true; do touch test_dir2/file;
fsync test_dir2; done"
We've measured thread A's execution time.
[ w/o patch ]
Elapsed Time: Avg. 68 seconds
[ w/ patch ]
Elapsed Time: Avg. 48 seconds
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
[Jaegeuk Kim: fix the return value in f2fs_start_ckpt_thread, reported by Dan]
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-01-19 08:00:42 +08:00
|
|
|
struct ckpt_req {
|
|
|
|
struct completion wait; /* completion for checkpoint done */
|
|
|
|
struct llist_node llnode; /* llist_node to be linked in wait queue */
|
|
|
|
int ret; /* return code of checkpoint */
|
|
|
|
ktime_t queue_time; /* request queued time */
|
|
|
|
};
|
|
|
|
|
|
|
|
struct ckpt_req_control {
|
|
|
|
struct task_struct *f2fs_issue_ckpt; /* checkpoint task */
|
2021-01-21 21:45:29 +08:00
|
|
|
int ckpt_thread_ioprio; /* checkpoint merge thread ioprio */
|
2021-01-19 08:00:42 +08:00
|
|
|
wait_queue_head_t ckpt_wait_queue; /* waiting queue for wake-up */
|
|
|
|
atomic_t issued_ckpt; /* # of actually issued ckpts */
|
|
|
|
atomic_t total_ckpt; /* # of total ckpts */
|
|
|
|
atomic_t queued_ckpt; /* # of queued ckpts */
|
|
|
|
struct llist_head issue_list; /* list for command issue */
|
|
|
|
spinlock_t stat_lock; /* lock for below checkpoint time stats */
|
|
|
|
unsigned int cur_time; /* cur wait time in msec for currently issued checkpoint */
|
|
|
|
unsigned int peak_time; /* peak wait time in msec until now */
|
|
|
|
};
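The merging idea described in the checkpoint_merge commit message can be illustrated with a small single-threaded userspace sketch. This is not the kernel implementation: every name ending in `_sketch` is hypothetical, a plain singly linked list stands in for the lock-less `llist`, plain `int` counters stand in for `atomic_t`, and the checkpoint itself is a placeholder that always succeeds. It only shows how one issued checkpoint can answer several queued requests, and how the three counters would then move under that assumption.

```c
#include <stddef.h>

/* Userspace stand-in for struct ckpt_req (hypothetical). */
struct ckpt_req_sketch {
	struct ckpt_req_sketch *next;	/* stands in for llist_node */
	int ret;			/* return code of checkpoint */
};

/* Userspace stand-in for struct ckpt_req_control (hypothetical). */
struct ckpt_ctl_sketch {
	struct ckpt_req_sketch *issue_list;	/* pending requests */
	int issued_ckpt;	/* # of actually issued ckpts */
	int total_ckpt;		/* # of total ckpts */
	int queued_ckpt;	/* # of queued ckpts */
};

/* Placeholder for the real checkpoint write; always succeeds here. */
static inline int do_checkpoint_sketch(void)
{
	return 0;
}

/* Queue one request; in the kernel the caller would then wait on it. */
static inline void queue_ckpt_sketch(struct ckpt_ctl_sketch *c,
				     struct ckpt_req_sketch *req)
{
	req->next = c->issue_list;
	c->issue_list = req;
	c->queued_ckpt++;
}

/* Serve every queued request with one checkpoint; returns # served. */
static inline int merge_ckpt_sketch(struct ckpt_ctl_sketch *c)
{
	struct ckpt_req_sketch *req = c->issue_list;
	int served = 0;
	int ret;

	if (!req)
		return 0;

	c->issue_list = NULL;		/* detach the whole batch */
	ret = do_checkpoint_sketch();	/* one checkpoint for all */
	c->issued_ckpt++;

	for (; req; req = req->next) {
		req->ret = ret;		/* every waiter sees the same result */
		served++;
	}
	c->queued_ckpt -= served;
	c->total_ckpt += served;
	return served;
}
```

With three requests queued, one call to `merge_ckpt_sketch()` leaves `issued_ckpt` at 1 and `total_ckpt` at 3, which is the kind of saving the commit message measures.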
|
|
|
|
|
2017-03-28 18:18:50 +08:00
|
|
|
/* for the bitmap indicating blocks to be discarded */
|
2013-11-15 12:55:58 +08:00
|
|
|
struct discard_entry {
|
|
|
|
struct list_head list; /* list head */
|
2017-03-28 18:18:50 +08:00
|
|
|
block_t start_blkaddr; /* start blockaddr of current segment */
|
|
|
|
unsigned char discard_map[SIT_VBLOCK_MAP_SIZE]; /* segment discard bitmap */
|
2013-11-15 12:55:58 +08:00
|
|
|
};
|
|
|
|
|
f2fs: introduce discard_granularity sysfs entry
Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables
f2fs to issue 4K size discards in real-time discard mode. However, issuing
smaller discards may cost more device lifetime while releasing less free space
in the flash device. Since f2fs can separate hot/cold data and perform
garbage collection, we can expect a small invalid region to
expand soon through OPU, deletion or garbage collection of valid data, so
it's better to delay or skip issuing smaller discards. This helps
to reduce excessive consumption of IO bandwidth and of flash storage
lifetime.
This patch makes f2fs select 64K as its default minimal
granularity and issue only discards that are not smaller than the
minimal granularity. It also exposes the discard granularity as a sysfs entry
for configuration in different scenarios.
Jaegeuk Kim:
We must issue all the accumulated discard commands when fstrim is called.
So, I've added pend_list_tag[] to indicate whether we should issue the
commands or not. If tag sets P_ACTIVE or P_TRIM, we have to issue them.
P_TRIM is set once at a time, given fstrim trigger.
In addition, issue_discard_thread is called too often because of the number of
discard commands remaining in the pending list. I added a timer to control
it, as in gc_thread.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-07 23:09:56 +08:00
|
|
|
/* default discard granularity of inner discard thread, unit: block count */
|
|
|
|
#define DEFAULT_DISCARD_GRANULARITY 16
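The granularity above is counted in blocks, while the commit message speaks of a 64K default. The arithmetic can be checked with a short sketch, assuming the 4 KiB block size implied by the commit message's "4K size discard". `BLKSIZE_SKETCH` and both helper functions are hypothetical names introduced only for this illustration.

```c
/* Value copied from the define above; unit: block count. */
#define DEFAULT_DISCARD_GRANULARITY	16

/* Assumption for this sketch: one block is 4 KiB. */
#define BLKSIZE_SKETCH			4096u

/* Convert a granularity in blocks to bytes. */
static inline unsigned int granularity_bytes_sketch(unsigned int blocks)
{
	return blocks * BLKSIZE_SKETCH;
}

/*
 * Hypothetical filter: a discard of len blocks qualifies only when it
 * is not smaller than the configured granularity.
 */
static inline int meets_granularity_sketch(unsigned int len,
					   unsigned int granularity)
{
	return len >= granularity;
}
```

Under that assumption, 16 blocks correspond to 65536 bytes, matching the 64K default minimal granularity named in the commit message.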
|
|
|
|
|
2017-04-15 14:09:37 +08:00
|
|
|
/* max discard pend list number */
|
|
|
|
#define MAX_PLIST_NUM 512
|
|
|
|
#define plist_idx(blk_num) ((blk_num) >= MAX_PLIST_NUM ? \
|
2019-01-14 22:05:14 +08:00
|
|
|
(MAX_PLIST_NUM - 1) : ((blk_num) - 1))
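A brief userspace sketch of how `plist_idx` buckets a discard length (in blocks) into a pending-list index. The two macros are copied from the lines above; `pend_list_index_sketch` is a hypothetical wrapper added only so the mapping can be exercised. Note that the macro assumes a length of at least 1, since a length of 0 would yield index -1.

```c
/* Copied from the header: number of pending lists and the index macro. */
#define MAX_PLIST_NUM		512
#define plist_idx(blk_num)	((blk_num) >= MAX_PLIST_NUM ?		\
				(MAX_PLIST_NUM - 1) : ((blk_num) - 1))

/*
 * Hypothetical wrapper: map a discard length in blocks (must be >= 1)
 * to the index of the pending list that would hold it. Lengths of
 * MAX_PLIST_NUM blocks or more all share the last list.
 */
static inline int pend_list_index_sketch(unsigned int len)
{
	return (int)plist_idx(len);
}
```

A one-block discard lands in list 0, a 16-block discard in list 15, and anything of 512 blocks or more in list 511, so each list below the last holds commands of exactly one length.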
|
2017-04-15 14:09:37 +08:00
|
|
|
|
2017-01-10 12:32:07 +08:00
|
|
|
enum {
|
2018-08-06 22:43:50 +08:00
|
|
|
D_PREP, /* initial */
|
|
|
|
D_PARTIAL, /* partially submitted */
|
|
|
|
D_SUBMIT, /* all submitted */
|
|
|
|
D_DONE, /* finished */
|
2017-01-10 12:32:07 +08:00
|
|
|
};
|
|
|
|
|
2017-04-14 23:24:55 +08:00
|
|
|
struct discard_info {
|
|
|
|
block_t lstart; /* logical start address */
|
|
|
|
block_t len; /* length */
|
|
|
|
block_t start; /* actual start address in dev */
|
|
|
|
};
|
|
|
|
|
2017-01-10 06:13:03 +08:00
|
|
|
struct discard_cmd {
|
2017-04-14 23:24:55 +08:00
|
|
|
struct rb_node rb_node; /* rb node located in rb-tree */
|
|
|
|
union {
|
|
|
|
struct {
|
|
|
|
block_t lstart; /* logical start address */
|
|
|
|
block_t len; /* length */
|
|
|
|
block_t start; /* actual start address in dev */
|
|
|
|
};
|
|
|
|
struct discard_info di; /* discard info */
|
|
|
|
|
|
|
|
};
|
2017-01-10 06:13:03 +08:00
|
|
|
struct list_head list; /* command list */
|
|
|
|
struct completion wait; /* completion */
|
2017-03-08 10:02:02 +08:00
|
|
|
struct block_device *bdev; /* bdev */
|
2017-04-26 17:39:54 +08:00
|
|
|
unsigned short ref; /* reference count */
|
2017-04-26 17:39:55 +08:00
|
|
|
unsigned char state; /* state */
|
2018-12-14 08:53:57 +08:00
|
|
|
unsigned char queued; /* queued discard */
|
2017-03-08 10:02:02 +08:00
|
|
|
int error; /* bio error */
|
2018-08-06 22:43:50 +08:00
|
|
|
spinlock_t lock; /* for state/bio_ref updating */
|
|
|
|
unsigned short bio_ref; /* bio reference count */
|
2016-08-29 23:58:34 +08:00
|
|
|
};
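The union inside `struct discard_cmd` lets the same three fields be reached either directly (`dc->lstart`) or as a whole `struct discard_info` (`dc->di`). A reduced userspace sketch of that layout follows; `discard_cmd_sketch` and `discard_end_sketch` are hypothetical, the other members of the real structure are left out, and `block_t` is assumed to be a 32-bit unsigned type.

```c
#include <stdint.h>

typedef uint32_t block_t;	/* assumption: 32-bit block address */

struct discard_info {
	block_t lstart;		/* logical start address */
	block_t len;		/* length */
	block_t start;		/* actual start address in dev */
};

/* Reduced, hypothetical version of struct discard_cmd: union only. */
struct discard_cmd_sketch {
	union {
		struct {
			block_t lstart;	/* logical start address */
			block_t len;	/* length */
			block_t start;	/* actual start address in dev */
		};
		struct discard_info di;	/* same storage, viewed as one unit */
	};
};

/* Hypothetical helper that only needs the discard_info view. */
static inline block_t discard_end_sketch(const struct discard_info *di)
{
	return di->lstart + di->len;
}
```

Code that handles a single command can use the short field names, while helpers written against `struct discard_info` can be handed `&dc->di` without copying.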
|
|
|
|
|
2017-10-04 09:08:34 +08:00
|
|
|
enum {
|
|
|
|
DPOLICY_BG,
|
|
|
|
DPOLICY_FORCE,
|
|
|
|
DPOLICY_FSTRIM,
|
|
|
|
DPOLICY_UMOUNT,
|
|
|
|
MAX_DPOLICY,
|
|
|
|
};
|
|
|
|
|
2017-10-04 09:08:33 +08:00
|
|
|
struct discard_policy {
|
2017-10-04 09:08:34 +08:00
|
|
|
int type; /* type of discard */
|
2017-10-04 09:08:33 +08:00
|
|
|
unsigned int min_interval; /* used when candidates exist */
|
2018-04-08 15:11:11 +08:00
|
|
|
unsigned int mid_interval; /* used when the device is busy */
|
2017-10-04 09:08:33 +08:00
|
|
|
unsigned int max_interval; /* used when no candidates exist */
|
|
|
|
unsigned int max_requests; /* # of discards issued per round */
|
|
|
|
unsigned int io_aware_gran; /* minimum discard granularity that is not I/O-aware */
|
|
|
|
bool io_aware; /* issue discard in idle time */
|
|
|
|
bool sync; /* submit discard with REQ_SYNC flag */
|
2018-07-08 22:11:01 +08:00
|
|
|
bool ordered; /* issue discard by lba order */
|
2020-03-26 17:43:56 +08:00
|
|
|
bool timeout; /* discard timeout for put_super */
|
2017-10-04 09:08:34 +08:00
|
|
|
unsigned int granularity; /* discard granularity */
|
2017-10-04 09:08:33 +08:00
|
|
|
};
|
|
|
|
|
2017-01-12 06:40:24 +08:00
|
|
|
struct discard_cmd_control {
|
2017-01-10 12:32:07 +08:00
|
|
|
struct task_struct *f2fs_issue_discard; /* discard thread */
|
2017-04-15 14:09:36 +08:00
|
|
|
struct list_head entry_list; /* 4KB discard entry list */
|
2017-04-15 14:09:37 +08:00
|
|
|
struct list_head pend_list[MAX_PLIST_NUM];/* store pending entries */
|
2017-04-15 14:09:36 +08:00
|
|
|
struct list_head wait_list; /* store on-flushing entries */
|
2017-10-04 09:08:32 +08:00
|
|
|
struct list_head fstrim_list; /* in-flight discard from fstrim */
|
2017-01-10 12:32:07 +08:00
|
|
|
wait_queue_head_t discard_wait_queue; /* waiting queue for wake-up */
|
2017-08-07 23:09:56 +08:00
|
|
|
unsigned int discard_wake; /* to wake up discard thread */
|
2017-01-10 12:32:07 +08:00
|
|
|
struct mutex cmd_lock;
|
2017-04-25 00:21:35 +08:00
|
|
|
unsigned int nr_discards; /* # of discards in the list */
|
|
|
|
unsigned int max_discards; /* max. discards to be issued */
|
2017-08-07 23:09:56 +08:00
|
|
|
unsigned int discard_granularity; /* discard granularity */
|
2017-04-18 19:27:39 +08:00
|
|
|
unsigned int undiscard_blks; /* # of undiscard blocks */
|
2018-07-08 22:11:01 +08:00
|
|
|
unsigned int next_pos; /* next discard position */
|
2017-03-25 17:19:58 +08:00
|
|
|
atomic_t issued_discard; /* # of issued discard */
|
2018-12-14 08:53:57 +08:00
|
|
|
atomic_t queued_discard; /* # of queued discard */
|
2017-03-25 17:19:59 +08:00
|
|
|
atomic_t discard_cmd_cnt; /* # of cached cmd count */
|
2018-10-04 11:18:30 +08:00
|
|
|
struct rb_root_cached root; /* root of discard rb-tree */
|
2018-06-22 16:06:59 +08:00
|
|
|
bool rbtree_check; /* config for consistency check */
|
2016-08-29 23:58:34 +08:00
|
|
|
};
|
|
|
|
|
2012-11-28 12:37:31 +08:00
|
|
|
/* for the list of fsync inodes, used only during recovery */
|
|
|
|
struct fsync_inode_entry {
|
|
|
|
struct list_head list; /* list head */
|
|
|
|
struct inode *inode; /* vfs inode pointer */
|
2014-09-12 05:29:06 +08:00
|
|
|
block_t blkaddr; /* block address locating the last fsync */
|
|
|
|
block_t last_dentry; /* block address locating the last dentry */
|
2012-11-28 12:37:31 +08:00
|
|
|
};
|
|
|
|
|
2017-04-09 07:11:36 +08:00
|
|
|
#define nats_in_cursum(jnl) (le16_to_cpu((jnl)->n_nats))
|
|
|
|
#define sits_in_cursum(jnl) (le16_to_cpu((jnl)->n_sits))
|
2012-11-28 12:37:31 +08:00
|
|
|
|
2017-04-09 07:11:36 +08:00
|
|
|
#define nat_in_journal(jnl, i) ((jnl)->nat_j.entries[i].ne)
|
|
|
|
#define nid_in_journal(jnl, i) ((jnl)->nat_j.entries[i].nid)
|
|
|
|
#define sit_in_journal(jnl, i) ((jnl)->sit_j.entries[i].se)
|
|
|
|
#define segno_in_journal(jnl, i) ((jnl)->sit_j.entries[i].segno)
|
2012-11-28 12:37:31 +08:00
|
|
|
|
2016-02-14 18:50:40 +08:00
|
|
|
#define MAX_NAT_JENTRIES(jnl) (NAT_JOURNAL_ENTRIES - nats_in_cursum(jnl))
|
|
|
|
#define MAX_SIT_JENTRIES(jnl) (SIT_JOURNAL_ENTRIES - sits_in_cursum(jnl))
|
2014-09-23 02:40:48 +08:00
|
|
|
|
2016-02-14 18:50:40 +08:00
|
|
|
static inline int update_nats_in_cursum(struct f2fs_journal *journal, int i)
|
2012-11-28 12:37:31 +08:00
|
|
|
{
|
2016-02-14 18:50:40 +08:00
|
|
|
int before = nats_in_cursum(journal);
|
2017-01-31 02:55:18 +08:00
|
|
|
|
2016-02-14 18:50:40 +08:00
|
|
|
journal->n_nats = cpu_to_le16(before + i);
|
2012-11-28 12:37:31 +08:00
|
|
|
return before;
|
|
|
|
}
|
|
|
|
|
2016-02-14 18:50:40 +08:00
|
|
|
static inline int update_sits_in_cursum(struct f2fs_journal *journal, int i)
|
2012-11-28 12:37:31 +08:00
|
|
|
{
|
2016-02-14 18:50:40 +08:00
|
|
|
int before = sits_in_cursum(journal);
|
2017-01-31 02:55:18 +08:00
|
|
|
|
2016-02-14 18:50:40 +08:00
|
|
|
journal->n_sits = cpu_to_le16(before + i);
|
2012-11-28 12:37:31 +08:00
|
|
|
return before;
|
|
|
|
}
|
|
|
|
|
2016-02-14 18:50:40 +08:00
|
|
|
static inline bool __has_cursum_space(struct f2fs_journal *journal,
|
|
|
|
int size, int type)
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we described the issue as below:
"Although building the NAT journal in cursum reduces the read/write work for NAT
blocks, the previous design leaves us with lower performance when writing checkpoints
frequently in these cases:
1. if the journal in cursum is already full, it's a bit of a waste that we flush all
nat entries to pages for persistence without caching any entries.
2. if the journal in cursum is not full, we fill nat entries into the journal until
the journal is full, then flush the remaining dirty entries to disk without merging
journaled entries, so these journaled entries may be flushed to disk at the next
checkpoint but lost the chance to be flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, we first update the sit journal with as many dirty entries as
possible. Second, if there is no space in the sit journal, we remove all
entries from the journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in the same SIT block to a sit entry set. All
entry sets are linked to the list sit_entry_set in sm_info, sorted in ascending order
by the count of entries in each set. Later we flush entries from the sets with the fewest
entries into the journal, as many as we can, and then flush the dense sets with merged
entries to disk.
In this way we use the sit journal area more effectively and reduce
SIT updates, which improves performance and saves flash device
lifetime.
In my testing environment, this patch clearly reduces SIT block
updates.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
         sit page num  cp count  sit pages/cp
based    2006.50       1349.75   1.486
patched  1566.25       1463.25   1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns)  dirty sit count
36038        2151
49168        2123
37174        2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
{
|
|
|
|
if (type == NAT_JOURNAL)
|
2016-02-14 18:50:40 +08:00
|
|
|
return size <= MAX_NAT_JENTRIES(journal);
|
|
|
|
return size <= MAX_SIT_JENTRIES(journal);
|
2014-09-04 18:13:01 +08:00
|
|
|
}
|
|
|
|
|
2017-07-19 00:19:05 +08:00
|
|
|
/* for inline stuff */
|
|
|
|
#define DEF_INLINE_RESERVED_SIZE 1
|
2017-07-19 00:19:06 +08:00
|
|
|
static inline int get_extra_isize(struct inode *inode);
|
f2fs: support flexible inline xattr size
In products, more and more features based on file encryption are being
introduced, and their demand for xattr space is increasing. However, inline
xattr has a fixed size of 200 bytes; once the inline xattr space is full, any
additional xattr data occupies an extra xattr block, which may cost more
space and cause a performance regression when persisting.
To resolve this issue, it is better to let the inline xattr size expand
flexibly according to the user's requirements.
So this patch introduces a new filesystem feature, 'flexible inline xattr',
and a new mount option, 'inline_xattr_size=%u'. Once mkfs enables the
feature, the option can be used to make f2fs support a flexible inline
xattr size.
To support this feature, we add an extra attribute, i_inline_xattr_size, to
the inode layout, indicating how much space the inline xattr borrows from the
block address mapping space in the inode layout. With this, we can easily
locate and store flexible-sized inline xattr data in the inode.
Inode disk layout:
+----------------------+
| .i_mode |
| ... |
| .i_ext |
+----------------------+
| .i_extra_isize |
| .i_inline_xattr_size |-----------+
| ... | |
+----------------------+ |
| .i_addr | |
| - block address or | |
| - inline data | |
+----------------------+<---+ v
| inline xattr | +---inline xattr range
+----------------------+<---+
| .i_nid |
+----------------------+
| node_footer |
| (nid, ino, offset) |
+----------------------+
Note that we have to consider backward compatibility, which reserved the
inline_data space, 200 bytes, all the time, as reported by Sheng Yong.
Previously, inline data or an inline directory always reserved 200 bytes in
the inode layout, even if inline_xattr was disabled. In order to keep
inline_dentry's structure for backward compatibility, we get the space back
only from inline_data.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reported-by: Sheng Yong <shengyong1@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-06 21:59:50 +08:00
|
|
|
static inline int get_inline_xattr_addrs(struct inode *inode);
|
|
|
|
#define MAX_INLINE_DATA(inode) (sizeof(__le32) * \
|
|
|
|
(CUR_ADDRS_PER_INODE(inode) - \
|
2018-01-17 16:31:36 +08:00
|
|
|
get_inline_xattr_addrs(inode) - \
|
2017-09-06 21:59:50 +08:00
|
|
|
DEF_INLINE_RESERVED_SIZE))
|
2017-07-19 00:19:05 +08:00
|
|
|
|
|
|
|
/* for inline dir */
|
|
|
|
#define NR_INLINE_DENTRY(inode) (MAX_INLINE_DATA(inode) * BITS_PER_BYTE / \
|
|
|
|
((SIZE_OF_DIR_ENTRY + F2FS_SLOT_LEN) * \
|
|
|
|
BITS_PER_BYTE + 1))
|
2019-06-20 22:42:08 +08:00
|
|
|
#define INLINE_DENTRY_BITMAP_SIZE(inode) \
|
|
|
|
DIV_ROUND_UP(NR_INLINE_DENTRY(inode), BITS_PER_BYTE)
|
2017-07-19 00:19:05 +08:00
|
|
|
#define INLINE_RESERVED_SIZE(inode) (MAX_INLINE_DATA(inode) - \
|
|
|
|
((SIZE_OF_DIR_ENTRY + F2FS_SLOT_LEN) * \
|
|
|
|
NR_INLINE_DENTRY(inode) + \
|
|
|
|
INLINE_DENTRY_BITMAP_SIZE(inode)))
|
|
|
|
|
2012-11-28 12:37:31 +08:00
|
|
|
/*
|
|
|
|
* For INODE and NODE manager
|
|
|
|
*/
|
2014-10-19 13:52:52 +08:00
|
|
|
/* for directory operations */
|
f2fs: rework filename handling
Rework f2fs's handling of filenames to use a new 'struct f2fs_filename'.
Similar to 'struct ext4_filename', this stores the usr_fname, disk_name,
dirhash, crypto_buf, and casefolded name. Some of these names can be
NULL in some cases. 'struct f2fs_filename' differs from
'struct fscrypt_name' mainly in that the casefolded name is included.
For user-initiated directory operations like lookup() and create(),
initialize the f2fs_filename by translating the corresponding
fscrypt_name, then computing the dirhash and casefolded name if needed.
This makes the dirhash and casefolded name be cached for each syscall,
so we don't have to recompute them repeatedly. (Previously, f2fs
computed the dirhash once per directory level, and the casefolded name
once per directory block.) This improves performance.
This rework also makes it much easier to correctly handle all
combinations of normal, encrypted, casefolded, and encrypted+casefolded
directories. (The fourth isn't supported yet but is being worked on.)
The only other cases where an f2fs_filename gets initialized are for two
filesystem-internal operations: (1) when converting an inline directory
to a regular one, we grab the needed disk_name and hash from an existing
f2fs_dir_entry; and (2) when roll-forward recovering a new dentry, we
grab the needed disk_name from f2fs_inode::i_name and compute the hash.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-05-07 15:59:04 +08:00
|
|
|
|
|
|
|
struct f2fs_filename {
|
|
|
|
/*
|
|
|
|
* The filename the user specified. This is NULL for some
|
|
|
|
* filesystem-internal operations, e.g. converting an inline directory
|
|
|
|
* to a non-inline one, or roll-forward recovering an encrypted dentry.
|
|
|
|
*/
|
|
|
|
const struct qstr *usr_fname;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The on-disk filename. For encrypted directories, this is encrypted.
|
|
|
|
* This may be NULL for lookups in an encrypted dir without the key.
|
|
|
|
*/
|
|
|
|
struct fscrypt_str disk_name;
|
|
|
|
|
|
|
|
/* The dirhash of this filename */
|
|
|
|
f2fs_hash_t hash;
|
|
|
|
|
|
|
|
#ifdef CONFIG_FS_ENCRYPTION
|
|
|
|
/*
|
|
|
|
* For lookups in encrypted directories: either the buffer backing
|
|
|
|
* disk_name, or a buffer that holds the decoded no-key name.
|
|
|
|
*/
|
|
|
|
struct fscrypt_str crypto_buf;
|
|
|
|
#endif
|
|
|
|
#ifdef CONFIG_UNICODE
|
|
|
|
/*
|
|
|
|
* For casefolded directories: the casefolded name, but it's left NULL
|
2020-11-19 14:09:04 +08:00
|
|
|
* if the original name is not valid Unicode, if the directory is both
|
|
|
|
* casefolded and encrypted and its encryption key is unavailable, or if
|
|
|
|
* the filesystem is doing an internal operation where usr_fname is also
|
|
|
|
* NULL. In all these cases we fall back to treating the name as an
|
|
|
|
* opaque byte sequence.
|
2020-05-07 15:59:04 +08:00
|
|
|
*/
|
|
|
|
struct fscrypt_str cf_name;
|
|
|
|
#endif
|
|
|
|
};
|
|
|
|
|
2014-10-19 13:52:52 +08:00
|
|
|
struct f2fs_dentry_ptr {
|
2015-04-28 07:26:24 +08:00
|
|
|
struct inode *inode;
|
2017-07-16 15:08:54 +08:00
|
|
|
void *bitmap;
|
2014-10-19 13:52:52 +08:00
|
|
|
struct f2fs_dir_entry *dentry;
|
|
|
|
__u8 (*filename)[F2FS_SLOT_LEN];
|
|
|
|
int max;
|
2017-07-16 15:08:54 +08:00
|
|
|
int nr_bitmap;
|
2014-10-19 13:52:52 +08:00
|
|
|
};
|
|
|
|
|
2017-04-04 18:01:22 +08:00
|
|
|
static inline void make_dentry_ptr_block(struct inode *inode,
|
|
|
|
struct f2fs_dentry_ptr *d, struct f2fs_dentry_block *t)
|
2014-10-19 13:52:52 +08:00
|
|
|
{
|
2015-04-28 07:26:24 +08:00
|
|
|
d->inode = inode;
|
2017-04-04 18:01:22 +08:00
|
|
|
d->max = NR_DENTRY_IN_BLOCK;
|
2017-07-16 15:08:54 +08:00
|
|
|
d->nr_bitmap = SIZE_OF_DENTRY_BITMAP;
|
2018-04-02 20:22:20 +08:00
|
|
|
d->bitmap = t->dentry_bitmap;
|
2017-04-04 18:01:22 +08:00
|
|
|
d->dentry = t->dentry;
|
|
|
|
d->filename = t->filename;
|
|
|
|
}
|
2015-04-28 07:26:24 +08:00
|
|
|
|
2017-04-04 18:01:22 +08:00
|
|
|
static inline void make_dentry_ptr_inline(struct inode *inode,
|
2017-07-19 00:19:05 +08:00
|
|
|
struct f2fs_dentry_ptr *d, void *t)
|
2017-04-04 18:01:22 +08:00
|
|
|
{
|
2017-07-19 00:19:05 +08:00
|
|
|
int entry_cnt = NR_INLINE_DENTRY(inode);
|
|
|
|
int bitmap_size = INLINE_DENTRY_BITMAP_SIZE(inode);
|
|
|
|
int reserved_size = INLINE_RESERVED_SIZE(inode);
|
|
|
|
|
2017-04-04 18:01:22 +08:00
|
|
|
d->inode = inode;
|
2017-07-19 00:19:05 +08:00
|
|
|
d->max = entry_cnt;
|
|
|
|
d->nr_bitmap = bitmap_size;
|
|
|
|
d->bitmap = t;
|
|
|
|
d->dentry = t + bitmap_size + reserved_size;
|
|
|
|
d->filename = t + bitmap_size + reserved_size +
|
|
|
|
SIZE_OF_DIR_ENTRY * entry_cnt;
|
2014-10-19 13:52:52 +08:00
|
|
|
}
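As a worked example of the inline directory macros and make_dentry_ptr_inline() above, the sketch below recomputes the same layout in user space. The constants 923 (address slots in an inode without extra attributes), 50 (slots taken by inline xattrs), 11 (SIZE_OF_DIR_ENTRY) and 8 (F2FS_SLOT_LEN) are assumptions based on the usual on-disk defaults and are not defined in this part of the file; every `sketch_` name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed defaults; none of these values is defined in this excerpt. */
#define SKETCH_RESERVED_ADDRS   1   /* mirrors DEF_INLINE_RESERVED_SIZE */
#define SKETCH_DIR_ENTRY_SIZE  11   /* assumed SIZE_OF_DIR_ENTRY */
#define SKETCH_SLOT_LEN         8   /* assumed F2FS_SLOT_LEN */
#define SKETCH_BITS_PER_BYTE    8

struct sketch_inline_layout {
	size_t max_inline_data;  /* MAX_INLINE_DATA(inode), in bytes */
	size_t nr_dentry;        /* NR_INLINE_DENTRY(inode) */
	size_t bitmap_size;      /* INLINE_DENTRY_BITMAP_SIZE(inode) */
	size_t reserved_size;    /* INLINE_RESERVED_SIZE(inode) */
	size_t dentry_off;       /* byte offset of d->dentry from t */
	size_t filename_off;     /* byte offset of d->filename from t */
};

/*
 * addrs:       assumed result of CUR_ADDRS_PER_INODE(inode)
 * xattr_addrs: assumed result of get_inline_xattr_addrs(inode)
 */
static inline struct sketch_inline_layout
sketch_inline_dir_layout(size_t addrs, size_t xattr_addrs)
{
	struct sketch_inline_layout l;
	size_t per_entry = SKETCH_DIR_ENTRY_SIZE + SKETCH_SLOT_LEN;

	assert(addrs > xattr_addrs + SKETCH_RESERVED_ADDRS);

	/* Each address slot is a 4-byte __le32. */
	l.max_inline_data = 4 * (addrs - xattr_addrs - SKETCH_RESERVED_ADDRS);
	/* Each entry costs its bytes plus one bitmap bit. */
	l.nr_dentry = l.max_inline_data * SKETCH_BITS_PER_BYTE /
			(per_entry * SKETCH_BITS_PER_BYTE + 1);
	/* DIV_ROUND_UP(nr_dentry, BITS_PER_BYTE) */
	l.bitmap_size = (l.nr_dentry + SKETCH_BITS_PER_BYTE - 1) /
			SKETCH_BITS_PER_BYTE;
	l.reserved_size = l.max_inline_data -
			(per_entry * l.nr_dentry + l.bitmap_size);
	/* Same pointer arithmetic as make_dentry_ptr_inline(). */
	l.dentry_off = l.bitmap_size + l.reserved_size;
	l.filename_off = l.dentry_off + SKETCH_DIR_ENTRY_SIZE * l.nr_dentry;
	return l;
}
```

Under these assumed defaults, sketch_inline_dir_layout(923, 50) gives 3488 bytes of inline data, 182 entries, a 23-byte bitmap and 7 reserved bytes, and the filename slots end exactly at byte 3488.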
|
|
|
|
|
2013-08-09 07:14:06 +08:00
|
|
|
/*
|
|
|
|
* XATTR_NODE_OFFSET stores xattrs to one node block per file keeping -1
|
|
|
|
* as its node offset to distinguish from index node blocks.
|
|
|
|
* But some bits are used to mark the node block.
|
|
|
|
*/
|
|
|
|
#define XATTR_NODE_OFFSET ((((unsigned int)-1) << OFFSET_BIT_SHIFT) \
|
|
|
|
>> OFFSET_BIT_SHIFT)
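To make the XATTR_NODE_OFFSET expression concrete, here is a small user-space sketch of the same shift-left-then-right trick, which clears the top bits of an all-ones value. It assumes OFFSET_BIT_SHIFT is 3; that constant is defined elsewhere and not shown in this part of the file, and the `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: OFFSET_BIT_SHIFT is 3 (it is not defined in this excerpt). */
#define SKETCH_OFFSET_BIT_SHIFT 3

static_assert(SKETCH_OFFSET_BIT_SHIFT < 32,
	      "shift must be smaller than the width of the type");

/* Same expression as XATTR_NODE_OFFSET, written with a fixed-width type. */
static inline uint32_t sketch_xattr_node_offset(void)
{
	/* Shifting left drops the top bits; shifting back leaves them zero. */
	return (((uint32_t)-1) << SKETCH_OFFSET_BIT_SHIFT)
			>> SKETCH_OFFSET_BIT_SHIFT;
}
```

With a shift of 3 the result is 0x1FFFFFFF: "-1" with the upper three bits cleared, which leaves room for the flag bits that mark the node block.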
|
2013-02-26 12:10:46 +08:00
|
|
|
enum {
|
|
|
|
ALLOC_NODE, /* allocate a new node page if needed */
|
|
|
|
LOOKUP_NODE, /* look up a node without readahead */
|
|
|
|
LOOKUP_NODE_RA, /*
|
|
|
|
* look up a node with readahead called
|
2013-12-21 18:02:14 +08:00
|
|
|
* by get_data_block.
|
2012-11-28 12:37:31 +08:00
|
|
|
*/
|
2013-02-26 12:10:46 +08:00
|
|
|
};
|
|
|
|
|
2021-08-04 08:38:38 +08:00
|
|
|
#define DEFAULT_RETRY_IO_COUNT 8 /* maximum retry read IO or flush count */
|
2018-07-17 00:02:17 +08:00
|
|
|
|
2020-02-17 17:45:44 +08:00
|
|
|
/* congestion wait timeout value, default: 20ms */
|
|
|
|
#define DEFAULT_IO_TIMEOUT (msecs_to_jiffies(20))
|
|
|
|
|
f2fs: guarantee journalled quota data by checkpoint
For journalled quota mode, let checkpoint flush dquot dirty data
and quota file data to guarantee persistence of all quota sysfiles in the
last checkpoint. This way, we can avoid corrupting quota sysfiles
when encountering SPO.
The implementation is as below:
1. add a global state SBI_QUOTA_NEED_FLUSH to indicate that there are
cached dquot metadata changes in the quota subsystem, so a later checkpoint
should:
a) flush dquot metadata into the quota file.
b) flush the quota file to storage to keep file usage consistent.
2. add a global state SBI_QUOTA_NEED_REPAIR to indicate that a quota
operation failed due to -EIO or -ENOSPC, so later,
a) checkpoint will skip syncing dquot metadata.
b) CP_QUOTA_NEED_FSCK_FLAG will be set in the last cp pack to give a
hint for fsck repairing.
3. add a global state SBI_QUOTA_SKIP_FLUSH: in checkpoint, if quota
data updating is very heavy, it may cause a hung task in block_operation().
To avoid this, if our retry count exceeds the threshold, just skip
flushing and retry in the next checkpoint().
Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid warnings and set fsck flag]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-20 20:05:00 +08:00
|
|
|
/* maximum retry quota flush count */
|
|
|
|
#define DEFAULT_RETRY_QUOTA_FLUSH_COUNT 8
|
|
|
|
|
2015-08-11 06:01:12 +08:00
|
|
|
#define F2FS_LINK_MAX 0xffffffff /* maximum link count per file */
|
2012-11-28 12:37:31 +08:00
|
|
|
|
2014-04-28 17:59:43 +08:00
|
|
|
#define MAX_DIR_RA_PAGES 4 /* maximum ra pages of dir */
|
|
|
|
|
2021-09-16 17:09:03 +08:00
|
|
|
/* dirty segments threshold for triggering CP */
|
|
|
|
#define DEFAULT_DIRTY_THRESHOLD 4
|
|
|
|
|
2012-11-28 12:37:31 +08:00
|
|
|
/* for in-memory extent cache entry */
|
2015-02-05 17:52:58 +08:00
|
|
|
#define F2FS_MIN_EXTENT_LEN 64 /* minimum extent length */
|
|
|
|
|
|
|
|
/* number of extent info in extent cache we try to shrink */
|
|
|
|
#define EXTENT_CACHE_SHRINK_NUMBER 128
|
2013-11-19 09:41:54 +08:00
|
|
|
|
2017-04-11 09:25:22 +08:00
|
|
|
struct rb_entry {
|
|
|
|
struct rb_node rb_node; /* rb node located in rb-tree */
|
2020-08-04 21:14:48 +08:00
|
|
|
union {
|
|
|
|
struct {
|
|
|
|
unsigned int ofs; /* start offset of the entry */
|
|
|
|
unsigned int len; /* length of the entry */
|
|
|
|
};
|
|
|
|
unsigned long long key; /* 64-bits key */
|
2020-10-08 02:14:35 +08:00
|
|
|
} __packed;
|
2017-04-11 09:25:22 +08:00
|
|
|
};
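The union in struct rb_entry lets the (ofs, len) pair be read as a single 64-bit key. The sketch below is a hypothetical user-space mirror of just that union (without the rb_node member), showing that two entries with the same offset and length always produce the same key. The numeric value of the key depends on byte order, so the sketch deliberately does not rely on it.

```c
#include <assert.h>

/* Hypothetical mirror of the anonymous union inside struct rb_entry. */
union sketch_rb_key {
	struct {
		unsigned int ofs;  /* start offset of the entry */
		unsigned int len;  /* length of the entry */
	};
	unsigned long long key;    /* the same 8 bytes viewed as one value */
};

/* Assumes 32-bit unsigned int and 64-bit unsigned long long. */
static_assert(sizeof(union sketch_rb_key) == sizeof(unsigned long long),
	      "ofs and len must exactly overlay the 64-bit key");

/* Build the 64-bit key for a given (ofs, len) pair. */
static inline unsigned long long sketch_make_key(unsigned int ofs,
						 unsigned int len)
{
	union sketch_rb_key k;

	k.key = 0;
	k.ofs = ofs;
	k.len = len;
	return k.key;
}
```

A lookup that wants an exact match on both fields can then compare one 64-bit value instead of two 32-bit ones.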
|
|
|
|
|
2012-11-28 12:37:31 +08:00
|
|
|
struct extent_info {
|
2015-02-05 17:52:58 +08:00
|
|
|
unsigned int fofs; /* start offset in a file */
|
|
|
|
unsigned int len; /* length of the extent */
|
2017-04-11 09:25:22 +08:00
|
|
|
u32 blk; /* start block address of the extent */
|
2021-08-04 10:23:48 +08:00
|
|
|
#ifdef CONFIG_F2FS_FS_COMPRESSION
|
|
|
|
unsigned int c_len; /* physical extent length of compressed blocks */
|
|
|
|
#endif
|
2015-02-05 17:52:58 +08:00
|
|
|
};
|
|
|
|
|
|
|
|
struct extent_node {
|
2018-12-18 19:20:16 +08:00
|
|
|
struct rb_node rb_node; /* rb node located in rb-tree */
|
|
|
|
struct extent_info ei; /* extent info */
|
2015-02-05 17:52:58 +08:00
|
|
|
struct list_head list; /* node in global extent list of sbi */
|
2016-01-26 20:56:26 +08:00
|
|
|
struct extent_tree *et; /* extent tree pointer */
|
2015-02-05 17:52:58 +08:00
|
|
|
};
|
|
|
|
|
|
|
|
struct extent_tree {
|
|
|
|
nid_t ino; /* inode number */
|
2018-10-04 11:18:30 +08:00
|
|
|
struct rb_root_cached root; /* root of extent info rb-tree */
|
2015-02-05 18:01:39 +08:00
|
|
|
struct extent_node *cached_en; /* recently accessed extent node */
|
2015-06-20 08:53:26 +08:00
|
|
|
struct extent_info largest; /* largest extent info */
|
2016-01-01 07:02:16 +08:00
|
|
|
struct list_head list; /* to be used by sbi->zombie_list */
|
2015-02-05 17:52:58 +08:00
|
|
|
rwlock_t lock; /* protect extent info rb-tree */
|
2016-01-08 20:22:52 +08:00
|
|
|
atomic_t node_cnt; /* # of extent nodes in rb-tree */
|
2018-09-10 16:18:25 +08:00
|
|
|
bool largest_updated; /* largest extent updated */
|
2012-11-28 12:37:31 +08:00
|
|
|
};
|
|
|
|
|
2015-04-07 10:55:34 +08:00
|
|
|
/*
|
|
|
|
* This structure is taken from ext4_map_blocks.
|
|
|
|
*
|
|
|
|
* Note that, however, f2fs uses NEW and MAPPED flags for f2fs_map_blocks().
|
|
|
|
*/
|
|
|
|
#define F2FS_MAP_NEW (1 << BH_New)
|
|
|
|
#define F2FS_MAP_MAPPED (1 << BH_Mapped)
|
2015-05-09 10:30:32 +08:00
|
|
|
#define F2FS_MAP_UNWRITTEN (1 << BH_Unwritten)
|
|
|
|
#define F2FS_MAP_FLAGS (F2FS_MAP_NEW | F2FS_MAP_MAPPED |\
|
|
|
|
F2FS_MAP_UNWRITTEN)
|
2015-04-07 10:55:34 +08:00
|
|
|
|
|
|
|
struct f2fs_map_blocks {
|
2021-09-01 14:39:20 +08:00
|
|
|
struct block_device *m_bdev; /* for multi-device dio */
|
2015-04-07 10:55:34 +08:00
|
|
|
block_t m_pblk;
|
|
|
|
block_t m_lblk;
|
|
|
|
unsigned int m_len;
|
|
|
|
unsigned int m_flags;
|
2016-01-26 15:42:58 +08:00
|
|
|
pgoff_t *m_next_pgofs; /* point to next possible non-hole pgofs */
|
2018-01-11 14:42:30 +08:00
|
|
|
pgoff_t *m_next_extent; /* point to next possible extent */
|
2017-11-28 08:23:00 +08:00
|
|
|
int m_seg_type;
|
2018-11-13 14:33:45 +08:00
|
|
|
bool m_may_create; /* indicate it is from write path */
|
2021-09-01 14:39:20 +08:00
|
|
|
bool m_multidev_dio; /* indicate it allows multi-device dio */
|
2015-04-07 10:55:34 +08:00
|
|
|
};
|
|
|
|
|
2015-08-19 19:11:19 +08:00
|
|
|
/* for flag in get_data_block */
|
2017-08-09 17:27:30 +08:00
|
|
|
enum {
|
|
|
|
F2FS_GET_BLOCK_DEFAULT,
|
|
|
|
F2FS_GET_BLOCK_FIEMAP,
|
|
|
|
F2FS_GET_BLOCK_BMAP,
|
2018-09-20 06:28:40 +08:00
|
|
|
F2FS_GET_BLOCK_DIO,
|
2017-08-09 17:27:30 +08:00
|
|
|
F2FS_GET_BLOCK_PRE_DIO,
|
|
|
|
F2FS_GET_BLOCK_PRE_AIO,
|
2018-01-11 14:42:30 +08:00
|
|
|
F2FS_GET_BLOCK_PRECACHE,
|
2017-08-09 17:27:30 +08:00
|
|
|
};
|
2015-08-19 19:11:19 +08:00
|
|
|
|
2012-11-28 12:37:31 +08:00
|
|
|
/*
|
|
|
|
* i_advise uses FADVISE_XXX_BIT. We can add additional hints later.
|
|
|
|
*/
|
|
|
|
#define FADVISE_COLD_BIT 0x01
|
2013-06-14 07:52:35 +08:00
|
|
|
#define FADVISE_LOST_PINO_BIT 0x02
|
2015-04-21 04:57:51 +08:00
|
|
|
#define FADVISE_ENCRYPT_BIT 0x04
|
2015-04-30 08:02:18 +08:00
|
|
|
#define FADVISE_ENC_NAME_BIT 0x08
|
2016-11-29 07:33:38 +08:00
|
|
|
#define FADVISE_KEEP_SIZE_BIT 0x10
|
2018-02-28 17:07:27 +08:00
|
|
|
#define FADVISE_HOT_BIT 0x20
|
f2fs: add fs-verity support
Add fs-verity support to f2fs. fs-verity is a filesystem feature that
enables transparent integrity protection and authentication of read-only
files. It uses a dm-verity like mechanism at the file level: a Merkle
tree is used to verify any block in the file in log(filesize) time. It
is implemented mainly by helper functions in fs/verity/. See
Documentation/filesystems/fsverity.rst for the full documentation.
The f2fs support for fs-verity consists of:
- Adding a filesystem feature flag and an inode flag for fs-verity.
- Implementing the fsverity_operations to support enabling verity on an
inode and reading/writing the verity metadata.
- Updating ->readpages() to verify data as it's read from verity files
and to support reading verity metadata pages.
- Updating ->write_begin(), ->write_end(), and ->writepages() to support
writing verity metadata pages.
- Calling the fs-verity hooks for ->open(), ->setattr(), and ->ioctl().
Like ext4, f2fs stores the verity metadata (Merkle tree and
fsverity_descriptor) past the end of the file, starting at the first 64K
boundary beyond i_size. This approach works because (a) verity files
are readonly, and (b) pages fully beyond i_size aren't visible to
userspace but can be read/written internally by f2fs with only some
relatively small changes to f2fs. Extended attributes cannot be used
because (a) f2fs limits the total size of an inode's xattr entries to
4096 bytes, which wouldn't be enough for even a single Merkle tree
block, and (b) f2fs encryption doesn't encrypt xattrs, yet the verity
metadata *must* be encrypted when the file is because it contains hashes
of the plaintext data.
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-07-23 00:26:24 +08:00
|
|
|
#define FADVISE_VERITY_BIT 0x40
|
2021-11-13 06:31:16 +08:00
|
|
|
#define FADVISE_TRUNC_BIT 0x80
|
2012-11-28 12:37:31 +08:00
|
|
|
|
2018-07-19 23:57:54 +08:00
|
|
|
#define FADVISE_MODIFIABLE_BITS (FADVISE_COLD_BIT | FADVISE_HOT_BIT)
|
|
|
|
|
2015-04-21 04:44:41 +08:00
|
|
|
#define file_is_cold(inode) is_file(inode, FADVISE_COLD_BIT)
|
|
|
|
#define file_set_cold(inode) set_file(inode, FADVISE_COLD_BIT)
|
|
|
|
#define file_clear_cold(inode) clear_file(inode, FADVISE_COLD_BIT)
|
2021-03-02 16:35:32 +08:00
|
|
|
|
|
|
|
#define file_wrong_pino(inode) is_file(inode, FADVISE_LOST_PINO_BIT)
|
|
|
|
#define file_lost_pino(inode) set_file(inode, FADVISE_LOST_PINO_BIT)
|
2015-04-21 04:44:41 +08:00
|
|
|
#define file_got_pino(inode) clear_file(inode, FADVISE_LOST_PINO_BIT)
|
2021-03-02 16:35:32 +08:00
|
|
|
|
2015-04-21 04:57:51 +08:00
|
|
|
#define file_is_encrypt(inode) is_file(inode, FADVISE_ENCRYPT_BIT)
|
|
|
|
#define file_set_encrypt(inode) set_file(inode, FADVISE_ENCRYPT_BIT)
|
2021-03-02 16:35:32 +08:00
|
|
|
|
2015-04-30 08:02:18 +08:00
|
|
|
#define file_enc_name(inode) is_file(inode, FADVISE_ENC_NAME_BIT)
|
|
|
|
#define file_set_enc_name(inode) set_file(inode, FADVISE_ENC_NAME_BIT)
|
2021-03-02 16:35:32 +08:00
|
|
|
|
2016-11-29 07:33:38 +08:00
|
|
|
#define file_keep_isize(inode) is_file(inode, FADVISE_KEEP_SIZE_BIT)
|
|
|
|
#define file_set_keep_isize(inode) set_file(inode, FADVISE_KEEP_SIZE_BIT)
|
2021-03-02 16:35:32 +08:00
|
|
|
|
2018-02-28 17:07:27 +08:00
|
|
|
#define file_is_hot(inode) is_file(inode, FADVISE_HOT_BIT)
|
|
|
|
#define file_set_hot(inode) set_file(inode, FADVISE_HOT_BIT)
|
|
|
|
#define file_clear_hot(inode) clear_file(inode, FADVISE_HOT_BIT)
|
2021-03-02 16:35:32 +08:00
|
|
|
|
2019-07-23 00:26:24 +08:00
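The placement rule in the commit message above (verity metadata starts at the first 64K boundary beyond i_size) can be sketched in userspace C. The helper name `verity_metadata_pos` and the constant `VERITY_ALIGN` are hypothetical, not f2fs symbols, and the sketch assumes round-up semantics, so a size that already sits on a boundary maps to itself.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed alignment taken from the commit message: 64K. */
#define VERITY_ALIGN 65536ULL

/*
 * Hypothetical helper: byte offset at which verity metadata would start
 * for a file of i_size bytes. Assumes round-up semantics.
 */
static uint64_t verity_metadata_pos(uint64_t i_size)
{
	return (i_size + VERITY_ALIGN - 1) / VERITY_ALIGN * VERITY_ALIGN;
}
```

For a 1-byte file this places the metadata at offset 65536, so every metadata page lies fully beyond i_size and stays invisible to userspace reads.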
#define file_is_verity(inode) is_file(inode, FADVISE_VERITY_BIT)
#define file_set_verity(inode) set_file(inode, FADVISE_VERITY_BIT)

#define file_should_truncate(inode) is_file(inode, FADVISE_TRUNC_BIT)
#define file_need_truncate(inode) set_file(inode, FADVISE_TRUNC_BIT)
#define file_dont_truncate(inode) clear_file(inode, FADVISE_TRUNC_BIT)
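
The file_* macros above all reduce to testing, setting, or clearing one bit in a per-inode hint byte. The sketch below shows that pattern in userspace C. The `demo_*` names and the two bit values are illustrative assumptions; the real is_file(), set_file(), and clear_file() helpers and the FADVISE_* constants are defined outside this section.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only, not the kernel's FADVISE_* definitions. */
#define DEMO_HOT_BIT	0x20
#define DEMO_TRUNC_BIT	0x80

struct demo_inode_info {
	unsigned char i_advise;	/* one bit per file attribute hint */
};

/* Return true if the given hint bit is set. */
static bool demo_is_file(const struct demo_inode_info *fi, int type)
{
	return (fi->i_advise & type) != 0;
}

/* Set the given hint bit, leaving the others untouched. */
static void demo_set_file(struct demo_inode_info *fi, int type)
{
	fi->i_advise = (unsigned char)(fi->i_advise | type);
}

/* Clear the given hint bit, leaving the others untouched. */
static void demo_clear_file(struct demo_inode_info *fi, int type)
{
	fi->i_advise = (unsigned char)(fi->i_advise & ~type);
}
```

Because each hint occupies its own bit, marking a file hot and later requesting truncation do not interfere with each other.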

#define DEF_DIR_LEVEL 0

f2fs: avoid stucking GC due to atomic write
f2fs does not allow abuse of the atomic write interface. Besides
limiting the total memory used by in-memory pages, we also need to limit
atomic-write usage when the filesystem is seriously fragmented;
otherwise foreground GC may loop forever because the target blocks in
the victim segment belong to a file that has been held open for atomic
write for a long time.
Now we detect failures caused by atomic write in foreground GC. If
the count exceeds a threshold, we drop all atomically written data in
the cache, which should keep the system running safely and prevent a
DoS attack.
In addition, this patch shows GC skip information in debugfs; for now
it only shows the count of skips caused by atomic write.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-07 20:28:54 +08:00
enum {
	GC_FAILURE_PIN,
	GC_FAILURE_ATOMIC,
	MAX_GC_FAILURE
};

/* used for f2fs_inode_info->flags */
enum {
	FI_NEW_INODE,		/* indicate newly allocated inode */
	FI_DIRTY_INODE,		/* indicate inode is dirty or not */
	FI_AUTO_RECOVER,	/* indicate inode is recoverable */
	FI_DIRTY_DIR,		/* indicate directory has dirty pages */
	FI_INC_LINK,		/* need to increment i_nlink */
	FI_ACL_MODE,		/* indicate acl mode */
	FI_NO_ALLOC,		/* should not allocate any blocks */
	FI_FREE_NID,		/* free allocated nid */
	FI_NO_EXTENT,		/* not to use the extent cache */
	FI_INLINE_XATTR,	/* used for inline xattr */
	FI_INLINE_DATA,		/* used for inline data */
	FI_INLINE_DENTRY,	/* used for inline dentry */
	FI_APPEND_WRITE,	/* inode has appended data */
	FI_UPDATE_WRITE,	/* inode has in-place-update data */
	FI_NEED_IPU,		/* used for ipu per file */
	FI_ATOMIC_FILE,		/* indicate atomic file */
	FI_ATOMIC_COMMIT,	/* indicate the state of atomic committing */
	FI_VOLATILE_FILE,	/* indicate volatile file */
	FI_FIRST_BLOCK_WRITTEN,	/* indicate #0 data block was written */
	FI_DROP_CACHE,		/* drop dirty page cache */
	FI_DATA_EXIST,		/* indicate data exists */
	FI_INLINE_DOTS,		/* indicate inline dot dentries */
	FI_DO_DEFRAG,		/* indicate defragment is running */
	FI_DIRTY_FILE,		/* indicate regular/symlink has dirty pages */
	FI_PREALLOCATED_ALL,	/* all blocks for write were preallocated */
	FI_HOT_DATA,		/* indicate file is hot */
	FI_EXTRA_ATTR,		/* indicate file has extra attribute */
	FI_PROJ_INHERIT,	/* indicate file inherits projectid */
	FI_PIN_FILE,		/* indicate file should not be gced */
	FI_ATOMIC_REVOKE_REQUEST, /* request to drop atomic data */
	FI_VERITY_IN_PROGRESS,	/* building fs-verity Merkle tree */
	FI_COMPRESSED_FILE,	/* indicate file's data can be compressed */
	FI_COMPRESS_CORRUPT,	/* indicate compressed cluster is corrupted */
	FI_MMAP_FILE,		/* indicate file was mmapped */
	FI_ENABLE_COMPRESS,	/* enable compression in "user" compression mode */
	FI_COMPRESS_RELEASED,	/* compressed blocks were released */
	FI_ALIGNED_WRITE,	/* enable aligned write */
	FI_MAX,			/* max flag, never used */
};
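
The enum above is used as a set of bit numbers, and the inode structure sizes its flag storage with BITS_TO_LONGS(FI_MAX). The userspace sketch below shows how such a multi-word bitmap can be sized and indexed. All `demo_*` and `DEMO_*` names are hypothetical, and the plain read-modify-write operations here are an assumption for illustration only; they are not atomic and do not stand in for the kernel's bit operations.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define DEMO_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
/* Number of unsigned longs needed to hold n bits, rounded up. */
#define DEMO_BITS_TO_LONGS(n)	(((n) + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

/* Hypothetical flag numbers; one is placed past bit 63 on purpose. */
enum { DEMO_NEW_INODE, DEMO_DIRTY_INODE, DEMO_PIN_FILE = 70, DEMO_MAX };

struct demo_flags {
	unsigned long flags[DEMO_BITS_TO_LONGS(DEMO_MAX)];
};

/* Set bit nr: pick the word, then the bit inside that word. */
static void demo_set_flag(struct demo_flags *f, unsigned int nr)
{
	f->flags[nr / DEMO_BITS_PER_LONG] |= 1UL << (nr % DEMO_BITS_PER_LONG);
}

static void demo_clear_flag(struct demo_flags *f, unsigned int nr)
{
	f->flags[nr / DEMO_BITS_PER_LONG] &= ~(1UL << (nr % DEMO_BITS_PER_LONG));
}

static bool demo_test_flag(const struct demo_flags *f, unsigned int nr)
{
	return (f->flags[nr / DEMO_BITS_PER_LONG] >> (nr % DEMO_BITS_PER_LONG)) & 1UL;
}
```

Sizing the array from the final enumerator means a new flag added before it grows the storage automatically, which is why the last entry must never be used as a flag itself.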

struct f2fs_inode_info {
	struct inode vfs_inode;		/* serve a vfs inode */
	unsigned long i_flags;		/* keep inode flags for ioctl */
	unsigned char i_advise;		/* use to give file attribute hints */
	unsigned char i_dir_level;	/* use for dentry level for large dir */
	unsigned int i_current_depth;	/* only for directory depth */
	/* for gc failure statistic */
	unsigned int i_gc_failures[MAX_GC_FAILURE];
f2fs: fix tracking parent inode number
Previously, f2fs did not correctly track the parent inode number, which is stored
in each f2fs_inode. A bug can occur in the following scenario.
Let's suppose there are one directory, "/b", and two files, "/a" and "/b/a".
- pino of "/a" is ROOT_INO.
- pino of "/b/a" is DIR_B_INO.
Then,
# sync
: The inode pages of "/a" and "/b/a" contain the parent inode numbers as
ROOT_INO and DIR_B_INO respectively.
# mv /a /b/a
: The parent inode number of "/a" should be changed to DIR_B_INO, but f2fs
didn't do that. Ref. f2fs_set_link().
In order to fix this clearly, I added i_pino in f2fs_inode_info, and whenever
it needs to be changed like in f2fs_add_link() and f2fs_set_link(), it is
updated temporarily in f2fs_inode_info.
And later, f2fs_write_inode() stores the latest information to the inode pages.
For power-off-recovery, f2fs_sync_file() triggers simply f2fs_write_inode().
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-12-10 16:52:48 +08:00
	unsigned int i_pino;		/* parent inode number */
	umode_t i_acl_mode;		/* keep file acl mode temporarily */

	/* Use below internally in f2fs */
	unsigned long flags[BITS_TO_LONGS(FI_MAX)];	/* use to pass per-file flags */
	struct rw_semaphore i_sem;	/* protect fi info */
	atomic_t dirty_pages;		/* # of dirty pages */
	f2fs_hash_t chash;		/* hash value of given file name */
	unsigned int clevel;		/* maximum level of given file name */
	struct task_struct *task;	/* lookup and create consistency */
	struct task_struct *cp_task;	/* separate cp/wb IO stats */
	nid_t i_xattr_nid;		/* node id that contains xattrs */
	loff_t last_disk_size;		/* last written file size */
	spinlock_t i_size_lock;		/* protect last_disk_size */

#ifdef CONFIG_QUOTA
	struct dquot *i_dquot[MAXQUOTAS];

	/* quota space reservation, managed internally by quota code */
	qsize_t i_reserved_quota;
#endif
	struct list_head dirty_list;	/* dirty list for dirs and files */
	struct list_head gdirty_list;	/* linked in global dirty list */
	struct list_head inmem_ilist;	/* list for inmem inodes */
	struct list_head inmem_pages;	/* inmemory pages managed by f2fs */
	struct task_struct *inmem_task;	/* store inmemory task */
	struct mutex inmem_lock;	/* lock for inmemory pages */
	struct extent_tree *extent_tree;	/* cached extent_tree entry */

	/* avoid racing between foreground op and gc */
	struct rw_semaphore i_gc_rwsem[2];
	struct rw_semaphore i_xattr_sem; /* avoid racing between reading and changing EAs */

	int i_extra_isize;		/* size of extra space located in i_addr */
	kprojid_t i_projid;		/* id for project quota */
f2fs: support flexible inline xattr size
Now, in product, more and more features based on file encryption were
introduced, their demand of xattr space is increasing, however, inline
xattr has fixed-size of 200 bytes, once inline xattr space is full, new
increased xattr data would occupy additional xattr block which may bring
us more space usage and performance regression during persisting.
In order to resolve above issue, it's better to expand inline xattr size
flexibly according to user's requirement.
So this patch introduces new filesystem feature 'flexible inline xattr',
and new mount option 'inline_xattr_size=%u', once mkfs enables the
feature, we can use the option to make f2fs supporting flexible inline
xattr size.
To support this feature, we add extra attribute i_inline_xattr_size in
inode layout, indicating that how many space inline xattr borrows from
block address mapping space in inode layout, by this, we can easily
locate and store flexible-sized inline xattr data in inode.
Inode disk layout:
  +----------------------+
  | .i_mode              |
  | ...                  |
  | .i_ext               |
  +----------------------+
  | .i_extra_isize       |
  | .i_inline_xattr_size |-----------+
  | ...                  |           |
  +----------------------+           |
  | .i_addr              |           |
  |  - block address or  |           |
  |  - inline data       |           |
  +----------------------+<---+      v
  |    inline xattr      |    +---inline xattr range
  +----------------------+<---+
  | .i_nid               |
  +----------------------+
  |   node_footer        |
  | (nid, ino, offset)   |
  +----------------------+
Note that we have to consider backward compatibility, which always reserved
200 bytes of inline_data space, as reported by Sheng Yong.
Previous inline data or directory always reserved 200 bytes in inode layout,
even if inline_xattr is disabled. In order to keep inline_dentry's structure
for backward compatibility, we get the space back only from inline_data.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reported-by: Sheng Yong <shengyong1@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-06 21:59:50 +08:00
	int i_inline_xattr_size;	/* inline xattr size */
	struct timespec64 i_crtime;	/* inode creation time */
	struct timespec64 i_disk_time[4];/* inode disk times */
f2fs: support data compression
This patch tries to support compression in f2fs.
- New term named cluster is defined as basic unit of compression, file can
be divided into multiple clusters logically. One cluster includes 4 << n
(n >= 0) logical pages, compression size is also cluster size, each of
cluster can be compressed or not.
- In cluster metadata layout, one special flag is used to indicate cluster
is compressed one or normal one, for compressed cluster, following metadata
maps cluster to [1, 4 << n - 1] physical blocks, in where f2fs stores
data including compress header and compressed data.
- In order to eliminate write amplification during overwrite, F2FS only
support compression on write-once file, data can be compressed only when
all logical blocks in file are valid and cluster compress ratio is lower
than specified threshold.
- To enable compression on regular inode, there are three ways:
* chattr +c file
* chattr +c dir; touch dir/file
* mount w/ -o compress_extension=ext; touch file.ext
Compress metadata layout:
                             [Dnode Structure]
             +-----------------------------------------------+
             | cluster 1 | cluster 2 | ......... | cluster N |
             +-----------------------------------------------+
             .           .                       .           .
       .                       .                .                      .
  .         Compressed Cluster       .        .        Normal Cluster            .
+----------+---------+---------+---------+  +---------+---------+---------+---------+
|compr flag| block 1 | block 2 | block 3 |  | block 1 | block 2 | block 3 | block 4 |
+----------+---------+---------+---------+  +---------+---------+---------+---------+
           .                             .
         .                                           .
       .                                                           .
      +-------------+-------------+----------+----------------------------+
      | data length | data chksum | reserved |      compressed data       |
      +-------------+-------------+----------+----------------------------+
Changelog:
20190326:
- fix error handling of read_end_io().
- remove unneeded comments in f2fs_encrypt_one_page().
20190327:
- fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
- don't jump into loop directly to avoid uninitialized variables.
- add TODO tag in error path of f2fs_write_cache_pages().
20190328:
- fix wrong merge condition in f2fs_read_multi_pages().
- check compressed file in f2fs_post_read_required().
20190401
- allow overwrite on non-compressed cluster.
- check cluster meta before writing compressed data.
20190402
- don't preallocate blocks for compressed file.
- add lz4 compress algorithm
- process multiple post read works in one workqueue
Now f2fs supports processing post read work in multiple workqueue,
it shows low performance due to schedule overhead of multiple
workqueue executing orderly.
20190921
- compress: support buffered overwrite
C: compress cluster flag
V: valid block address
N: NEW_ADDR
One cluster contain 4 blocks
before overwrite after overwrite
- VVVV -> CVNN
- CVNN -> VVVV
- CVNN -> CVNN
- CVNN -> CVVV
- CVVV -> CVNN
- CVVV -> CVVV
20191029
- add kconfig F2FS_FS_COMPRESSION to isolate compression related
codes, add kconfig F2FS_FS_{LZO,LZ4} to cover backend algorithm.
note that: will remove lzo backend if Jaegeuk agreed that too.
- update codes according to Eric's comments.
20191101
- apply fixes from Jaegeuk
20191113
- apply fixes from Jaegeuk
- split workqueue for fsverity
20191216
- apply fixes from Jaegeuk
20200117
- fix to avoid NULL pointer dereference
[Jaegeuk Kim]
- add tracepoint for f2fs_{,de}compress_pages()
- fix many bugs and add some compression stats
- fix overwrite/mmap bugs
- address 32bit build error, reported by Geert.
- bug fixes when handling errors and i_compressed_blocks
Reported-by: <noreply@ellerman.id.au>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-11-01 18:07:14 +08:00

	/* for file compress */
	atomic_t i_compr_blocks;		/* # of compressed blocks */
	unsigned char i_compress_algorithm;	/* algorithm type */
	unsigned char i_log_cluster_size;	/* log of cluster size */
	unsigned char i_compress_level;		/* compress level (lz4hc,zstd) */
	unsigned short i_compress_flag;		/* compress flag */
	unsigned int i_cluster_size;		/* cluster size */
};
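
The compression fields in the structure keep both the log of the cluster size and the cluster size itself. A minimal userspace sketch of how the two might relate is given below. It assumes the cluster size in pages equals 1 << log, which is not stated in this section, and every `demo_*` name is hypothetical.

```c
#include <assert.h>

struct demo_compress_info {
	unsigned char log_cluster_size;	/* log2 of pages per cluster */
	unsigned int cluster_size;	/* pages per cluster */
};

/* Assumption: the cached size is always derived from the log value. */
static void demo_set_cluster(struct demo_compress_info *ci, unsigned char log)
{
	ci->log_cluster_size = log;
	ci->cluster_size = 1U << log;
}

/* Index of the cluster that contains the given page index. */
static unsigned long demo_cluster_idx(const struct demo_compress_info *ci,
				      unsigned long page_idx)
{
	return page_idx >> ci->log_cluster_size;
}

/* Position of the page inside its cluster. */
static unsigned int demo_offset_in_cluster(const struct demo_compress_info *ci,
					   unsigned long page_idx)
{
	return (unsigned int)(page_idx & (ci->cluster_size - 1));
}
```

With a log value of 2, page 9 falls in cluster 2 at offset 1. Keeping the log alongside the size lets both lookups use a shift or a mask instead of a division.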

static inline void get_extent_info(struct extent_info *ext,
					struct f2fs_extent *i_ext)
{
	ext->fofs = le32_to_cpu(i_ext->fofs);
	ext->blk = le32_to_cpu(i_ext->blk);
	ext->len = le32_to_cpu(i_ext->len);
}

static inline void set_raw_extent(struct extent_info *ext,
					struct f2fs_extent *i_ext)
{
	i_ext->fofs = cpu_to_le32(ext->fofs);
	i_ext->blk = cpu_to_le32(ext->blk);
	i_ext->len = cpu_to_le32(ext->len);
}
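
get_extent_info() and set_raw_extent() are mirror images: one converts the little-endian on-disk extent to CPU byte order, the other converts back. The userspace sketch below reproduces that round trip without kernel headers. The `demo_*` types and the two byte-order helpers are stand-ins written for this example; they only imitate what le32_to_cpu() and cpu_to_le32() do.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct demo_extent_info { uint32_t fofs, blk, len; };	/* CPU byte order */
struct demo_raw_extent { uint32_t fofs, blk, len; };	/* little-endian bytes */

/* Stand-in for cpu_to_le32(): lay the value out low byte first. */
static uint32_t demo_cpu_to_le32(uint32_t v)
{
	uint8_t b[4] = { (uint8_t)v, (uint8_t)(v >> 8),
			 (uint8_t)(v >> 16), (uint8_t)(v >> 24) };
	uint32_t out;

	memcpy(&out, b, sizeof(out));
	return out;
}

/* Stand-in for le32_to_cpu(): reassemble the value from its bytes. */
static uint32_t demo_le32_to_cpu(uint32_t v)
{
	uint8_t b[4];

	memcpy(b, &v, sizeof(b));
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

static void demo_set_raw_extent(const struct demo_extent_info *ext,
				struct demo_raw_extent *raw)
{
	raw->fofs = demo_cpu_to_le32(ext->fofs);
	raw->blk = demo_cpu_to_le32(ext->blk);
	raw->len = demo_cpu_to_le32(ext->len);
}

static void demo_get_extent_info(struct demo_extent_info *ext,
				 const struct demo_raw_extent *raw)
{
	ext->fofs = demo_le32_to_cpu(raw->fofs);
	ext->blk = demo_le32_to_cpu(raw->blk);
	ext->len = demo_le32_to_cpu(raw->len);
}
```

Converting field by field at the boundary keeps the in-memory structure in native order, so the merge checks that follow can use ordinary integer arithmetic on any host.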

static inline void set_extent_info(struct extent_info *ei, unsigned int fofs,
						u32 blk, unsigned int len)
{
	ei->fofs = fofs;
	ei->blk = blk;
	ei->len = len;
#ifdef CONFIG_F2FS_FS_COMPRESSION
	ei->c_len = 0;
#endif
}

static inline bool __is_discard_mergeable(struct discard_info *back,
			struct discard_info *front, unsigned int max_len)
{
	return (back->lstart + back->len == front->lstart) &&
		(back->len + front->len <= max_len);
}

static inline bool __is_discard_back_mergeable(struct discard_info *cur,
			struct discard_info *back, unsigned int max_len)
{
	return __is_discard_mergeable(back, cur, max_len);
}

static inline bool __is_discard_front_mergeable(struct discard_info *cur,
			struct discard_info *front, unsigned int max_len)
{
	return __is_discard_mergeable(cur, front, max_len);
}
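
/*
 * Illustrative userspace sketch, not part of this header: the discard merge
 * test above requires that "back" ends exactly where "front" starts and that
 * the combined length stays within max_len. struct demo_discard_info is a
 * hypothetical stand-in carrying only the two fields the check reads; the
 * field types are an assumption.
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of a discard range: logical start and length. */
struct demo_discard_info {
	uint32_t lstart;	/* logical start block */
	uint32_t len;		/* length in blocks */
};

/* True when back is immediately followed by front and the sum fits. */
static inline bool demo_is_discard_mergeable(
		const struct demo_discard_info *back,
		const struct demo_discard_info *front,
		unsigned int max_len)
{
	return (back->lstart + back->len == front->lstart) &&
		(back->len + front->len <= max_len);
}

/* The back/front wrappers only swap which argument plays which role. */
static inline bool demo_is_discard_back_mergeable(
		const struct demo_discard_info *cur,
		const struct demo_discard_info *back,
		unsigned int max_len)
{
	return demo_is_discard_mergeable(back, cur, max_len);
}

static inline bool demo_is_discard_front_mergeable(
		const struct demo_discard_info *cur,
		const struct demo_discard_info *front,
		unsigned int max_len)
{
	return demo_is_discard_mergeable(cur, front, max_len);
}
```

/*
 * For example, ranges [100, 108) and [108, 116) merge when max_len is 16 but
 * not when it is 15, and ranges separated by a gap never merge.
 */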

static inline bool __is_extent_mergeable(struct extent_info *back,
						struct extent_info *front)
{
#ifdef CONFIG_F2FS_FS_COMPRESSION
	if (back->c_len && back->len != back->c_len)
		return false;
	if (front->c_len && front->len != front->c_len)
		return false;
#endif
	return (back->fofs + back->len == front->fofs &&
			back->blk + back->len == front->blk);
}

static inline bool __is_back_mergeable(struct extent_info *cur,
						struct extent_info *back)
{
	return __is_extent_mergeable(back, cur);
}

static inline bool __is_front_mergeable(struct extent_info *cur,
						struct extent_info *front)
{
	return __is_extent_mergeable(cur, front);
}
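
/*
 * Illustrative userspace sketch, not part of this header: two extents are
 * mergeable only when they are adjacent both in file offset and in block
 * address. The sketch models a build without CONFIG_F2FS_FS_COMPRESSION, so
 * the c_len checks are left out. struct demo_extent and demo_coalesce() are
 * hypothetical; the coalescing loop is just one way to apply the predicate
 * and is not taken from the f2fs extent cache code.
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent: file offset, start block, length. */
struct demo_extent {
	unsigned int fofs;
	uint32_t blk;
	unsigned int len;
};

/* Same adjacency test as __is_extent_mergeable() without compression. */
static inline bool demo_is_extent_mergeable(const struct demo_extent *back,
					    const struct demo_extent *front)
{
	return (back->fofs + back->len == front->fofs &&
			back->blk + back->len == front->blk);
}

/*
 * Coalesce an array sorted by fofs in place: each extent is either folded
 * into the previous output entry or appended. Returns the new entry count.
 */
static inline size_t demo_coalesce(struct demo_extent *v, size_t n)
{
	size_t out = 0;

	for (size_t i = 0; i < n; i++) {
		if (out > 0 && demo_is_extent_mergeable(&v[out - 1], &v[i]))
			v[out - 1].len += v[i].len;
		else
			v[out++] = v[i];
	}
	return out;
}
```

/*
 * Note that matching file offsets alone are not enough: an extent that
 * continues the file range but lives at an unrelated block address starts a
 * new entry.
 */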

extern void f2fs_mark_inode_dirty_sync(struct inode *inode, bool sync);
static inline void __try_update_largest_extent(struct extent_tree *et,
						struct extent_node *en)
{
	if (en->ei.len > et->largest.len) {
		et->largest = en->ei;
		et->largest_updated = true;
	}
}

/*
 * For free nid management
 */
enum nid_state {
	FREE_NID,		/* newly added to free nid list */
	PREALLOC_NID,		/* it is preallocated */
	MAX_NID_STATE,
};

enum nat_state {
	TOTAL_NAT,
	DIRTY_NAT,
	RECLAIMABLE_NAT,
	MAX_NAT_STATE,
};

struct f2fs_nm_info {
	block_t nat_blkaddr;		/* base disk address of NAT */
	nid_t max_nid;			/* maximum possible node ids */
	nid_t available_nids;		/* # of available node ids */
	nid_t next_scan_nid;		/* the next nid to be scanned */
	unsigned int ram_thresh;	/* control the memory footprint */
	unsigned int ra_nid_pages;	/* # of nid pages to be readaheaded */
	unsigned int dirty_nats_ratio;	/* control dirty nats ratio threshold */

	/* NAT cache management */
	struct radix_tree_root nat_root;/* root of the nat entry cache */
	struct radix_tree_root nat_set_root;/* root of the nat set cache */
	struct rw_semaphore nat_tree_lock;	/* protect nat entry tree */
	struct list_head nat_entries;	/* cached nat entry list (clean) */
	spinlock_t nat_list_lock;	/* protect clean nat entry list */
	unsigned int nat_cnt[MAX_NAT_STATE]; /* the # of cached nat entries */
	unsigned int nat_blocks;	/* # of nat blocks */

	/* free node ids management */
	struct radix_tree_root free_nid_root;/* root of the free_nid cache */
	struct list_head free_nid_list;		/* list for free nids excluding preallocated nids */
	unsigned int nid_cnt[MAX_NID_STATE];	/* the number of free node id */
	spinlock_t nid_list_lock;	/* protect nid lists ops */
	struct mutex build_lock;	/* lock for build free nids */
	unsigned char **free_nid_bitmap;
	unsigned char *nat_block_bitmap;
	unsigned short *free_nid_count;	/* free nid count of NAT block */

	/* for checkpoint */
	char *nat_bitmap;		/* NAT bitmap pointer */

	unsigned int nat_bits_blocks;	/* # of nat bits blocks */
	unsigned char *nat_bits;	/* NAT bits blocks */
	unsigned char *full_nat_bits;	/* full NAT pages */
	unsigned char *empty_nat_bits;	/* empty NAT pages */
#ifdef CONFIG_F2FS_CHECK_FS
	char *nat_bitmap_mir;		/* NAT bitmap mirror */
#endif
	int bitmap_size;		/* bitmap size */
};

/*
 * This structure is used as one of the function parameters.
 * All of the information is dedicated to a given direct node block determined
 * by the data offset in a file.
 */
struct dnode_of_data {
	struct inode *inode;		/* vfs inode pointer */
	struct page *inode_page;	/* its inode page, NULL is possible */
	struct page *node_page;		/* cached direct node page */
	nid_t nid;			/* node id of the direct node block */
	unsigned int ofs_in_node;	/* data offset in the node page */
	bool inode_page_locked;		/* inode page is locked or not */
	bool node_changed;		/* is node block changed */
	char cur_level;			/* level of hole node page */
	char max_level;			/* level of current page located */
	block_t	data_blkaddr;		/* block address of the node block */
};

static inline void set_new_dnode(struct dnode_of_data *dn, struct inode *inode,
		struct page *ipage, struct page *npage, nid_t nid)
{
	memset(dn, 0, sizeof(*dn));
	dn->inode = inode;
	dn->inode_page = ipage;
	dn->node_page = npage;
	dn->nid = nid;
}

/*
 * For SIT manager
 *
 * By default, there are 6 active log areas across the whole main area.
 * When considering hot and cold data separation to reduce cleaning overhead,
 * we split 3 for data logs and 3 for node logs as hot, warm, and cold types,
 * respectively.
 * In the current design, you should not change the numbers intentionally.
 * Instead, as a mount option such as active_logs=x, you can use 2, 4, and 6
 * logs individually according to the underlying devices. (default: 6)
 * Just in case, on-disk layout covers maximum 16 logs that consist of 8 for
 * data and 8 for node logs.
 */
#define	NR_CURSEG_DATA_TYPE	(3)
#define NR_CURSEG_NODE_TYPE	(3)
#define NR_CURSEG_INMEM_TYPE	(2)
#define NR_CURSEG_RO_TYPE	(2)
#define NR_CURSEG_PERSIST_TYPE	(NR_CURSEG_DATA_TYPE + NR_CURSEG_NODE_TYPE)
#define NR_CURSEG_TYPE		(NR_CURSEG_INMEM_TYPE + NR_CURSEG_PERSIST_TYPE)

enum {
	CURSEG_HOT_DATA	= 0,	/* directory entry blocks */
	CURSEG_WARM_DATA,	/* data blocks */
	CURSEG_COLD_DATA,	/* multimedia or GCed data blocks */
	CURSEG_HOT_NODE,	/* direct node blocks of directory files */
	CURSEG_WARM_NODE,	/* direct node blocks of normal files */
	CURSEG_COLD_NODE,	/* indirect node blocks */
	NR_PERSISTENT_LOG,	/* number of persistent logs */
	CURSEG_COLD_DATA_PINNED = NR_PERSISTENT_LOG,
				/* pinned file that needs consecutive block address */
	CURSEG_ALL_DATA_ATGC,	/* SSR allocator in hot/warm/cold data area */
	NO_CHECK_TYPE,		/* number of persistent & in-memory logs */
};
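
/*
 * Illustrative userspace sketch, not part of this header: the NR_CURSEG_*
 * macros and the enum above are meant to agree, with six persistent logs
 * numbered 0..5 followed by two in-memory logs numbered 6 and 7. The block
 * below restates the same values under a DEMO_ prefix and checks the
 * arithmetic at compile time. demo_is_inmem_log() is a hypothetical helper
 * added only for illustration.
 */

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NR_CURSEG_DATA_TYPE	(3)
#define DEMO_NR_CURSEG_NODE_TYPE	(3)
#define DEMO_NR_CURSEG_INMEM_TYPE	(2)
#define DEMO_NR_CURSEG_PERSIST_TYPE \
	(DEMO_NR_CURSEG_DATA_TYPE + DEMO_NR_CURSEG_NODE_TYPE)
#define DEMO_NR_CURSEG_TYPE \
	(DEMO_NR_CURSEG_INMEM_TYPE + DEMO_NR_CURSEG_PERSIST_TYPE)

enum {
	DEMO_CURSEG_HOT_DATA = 0,
	DEMO_CURSEG_WARM_DATA,
	DEMO_CURSEG_COLD_DATA,
	DEMO_CURSEG_HOT_NODE,
	DEMO_CURSEG_WARM_NODE,
	DEMO_CURSEG_COLD_NODE,
	DEMO_NR_PERSISTENT_LOG,
	DEMO_CURSEG_COLD_DATA_PINNED = DEMO_NR_PERSISTENT_LOG,
	DEMO_CURSEG_ALL_DATA_ATGC,
	DEMO_NO_CHECK_TYPE,
};

/* 3 data logs + 3 node logs = 6 persistent logs. */
static_assert(DEMO_NR_PERSISTENT_LOG == DEMO_NR_CURSEG_PERSIST_TYPE,
	      "enum and macro disagree on persistent log count");
/* The first in-memory log reuses the value of the persistent-log count. */
static_assert(DEMO_CURSEG_COLD_DATA_PINNED == 6, "first in-memory log");
static_assert(DEMO_CURSEG_ALL_DATA_ATGC == 7, "second in-memory log");
/* 6 persistent + 2 in-memory = 8 logs in total. */
static_assert(DEMO_NO_CHECK_TYPE == DEMO_NR_CURSEG_TYPE,
	      "enum and macro disagree on total log count");

/* Hypothetical helper: does this log type exist only in memory? */
static inline bool demo_is_inmem_log(int type)
{
	return type >= DEMO_NR_PERSISTENT_LOG && type < DEMO_NO_CHECK_TYPE;
}
```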

struct flush_cmd {
	struct completion wait;
	struct llist_node llnode;
	nid_t ino;
	int ret;
};

struct flush_cmd_control {
	struct task_struct *f2fs_issue_flush;	/* flush thread */
	wait_queue_head_t flush_wait_queue;	/* waiting queue for wake-up */
	atomic_t issued_flush;			/* # of issued flushes */
	atomic_t queued_flush;			/* # of queued flushes */
	struct llist_head issue_list;		/* list for command issue */
	struct llist_node *dispatch_list;	/* list for command dispatch */
};

struct f2fs_sm_info {
	struct sit_info *sit_info;		/* whole segment information */
	struct free_segmap_info *free_info;	/* free segment information */
	struct dirty_seglist_info *dirty_info;	/* dirty segment information */
	struct curseg_info *curseg_array;	/* active segment information */
|
|
|
|
|
f2fs: fix summary info corruption
Sometimes, after running generic/270 of fstests, fsck reports that the summary
info and the actual position of a block address in a direct node have become
inconsistent.
The root cause is a race between __f2fs_replace_block and change_curseg,
as shown below:
Thread A                                        Thread B
- __clone_blkaddrs
 - f2fs_replace_block
  - __f2fs_replace_block
   - segnoA = GET_SEGNO(sbi, blkaddrA);
   - type = se->type:=CURSEG_HOT_DATA
   - if (!IS_CURSEG(sbi, segnoA))
      type = CURSEG_WARM_DATA
                                                - allocate_data_block
                                                 - allocate_segment
                                                  - get_ssr_segment
                                                  - change_curseg(segnoA, CURSEG_HOT_DATA)
   - change_curseg(segnoA, CURSEG_WARM_DATA)
    - reset_curseg
     - __set_sit_entry_type
      - change se->type from CURSEG_HOT_DATA to CURSEG_WARM_DATA
So finally, the hot curseg is located in segnoA, but the type of segnoA becomes
CURSEG_WARM_DATA.
Then, if we invoke __f2fs_replace_block(blkaddrB, blkaddrA, true, false),
blkaddrA is located in segnoA, so we move the warm curseg to segnoA,
change its summary cache, and write it back to the summary block.
But segnoA is used by the hot curseg too; once that curseg moves or is
persisted, it overwrites the summary block content with its stale in-memory
summary cache, resulting in an inconsistent state.
This patch fixes the issue by introducing a global curseg lock to avoid the
race between __f2fs_replace_block and change_curseg.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-11-02 20:41:03 +08:00
|
|
|
struct rw_semaphore curseg_lock; /* for preventing curseg change */
|
|
|
|
|
2012-11-28 12:37:31 +08:00
|
|
|
block_t seg0_blkaddr; /* block address of 0'th segment */
|
|
|
|
block_t main_blkaddr; /* start block address of main area */
|
|
|
|
block_t ssa_blkaddr; /* start block address of SSA area */
|
|
|
|
|
|
|
|
unsigned int segment_count; /* total # of segments */
|
|
|
|
unsigned int main_segments; /* # of segments in main area */
|
|
|
|
unsigned int reserved_segments; /* # of reserved segments */
|
f2fs: fix to reserve space for IO align feature
https://bugzilla.kernel.org/show_bug.cgi?id=204137
With below script, we will hit panic during new segment allocation:
DISK=bingo.img
MOUNT_DIR=/mnt/f2fs
dd if=/dev/zero of=$DISK bs=1M count=105
mkfs.f2fs -a 1 -o 19 -t 1 -z 1 -f -q $DISK
mount -t f2fs $DISK $MOUNT_DIR -o "noinline_dentry,flush_merge,noextent_cache,mode=lfs,io_bits=7,fsync_mode=strict"
for (( i = 0; i < 4096; i++ )); do
name=`head /dev/urandom | tr -dc A-Za-z0-9 | head -c 10`
mkdir $MOUNT_DIR/$name
done
umount $MOUNT_DIR
rm $DISK
--- Core dump ---
Call Trace:
allocate_segment_by_default+0x9d/0x100 [f2fs]
f2fs_allocate_data_block+0x3c0/0x5c0 [f2fs]
do_write_page+0x62/0x110 [f2fs]
f2fs_outplace_write_data+0x43/0xc0 [f2fs]
f2fs_do_write_data_page+0x386/0x560 [f2fs]
__write_data_page+0x706/0x850 [f2fs]
f2fs_write_cache_pages+0x267/0x6a0 [f2fs]
f2fs_write_data_pages+0x19c/0x2e0 [f2fs]
do_writepages+0x1c/0x70
__filemap_fdatawrite_range+0xaa/0xe0
filemap_fdatawrite+0x1f/0x30
f2fs_sync_dirty_inodes+0x74/0x1f0 [f2fs]
block_operations+0xdc/0x350 [f2fs]
f2fs_write_checkpoint+0x104/0x1150 [f2fs]
f2fs_sync_fs+0xa2/0x120 [f2fs]
f2fs_balance_fs_bg+0x33c/0x390 [f2fs]
f2fs_write_node_pages+0x4c/0x1f0 [f2fs]
do_writepages+0x1c/0x70
__writeback_single_inode+0x45/0x320
writeback_sb_inodes+0x273/0x5c0
wb_writeback+0xff/0x2e0
wb_workfn+0xa1/0x370
process_one_work+0x138/0x350
worker_thread+0x4d/0x3d0
kthread+0x109/0x140
ret_from_fork+0x25/0x30
The root cause is that, with the IO alignment feature enabled, a single 4k
write needs F2FS_IO_SIZE() free blocks in the worst case, because the feature
fills in dummy pages to keep each IO aligned.
So we can easily run out of free segments during writeback of non-inline
directory data, even while foreground GC is running.
To fix this issue, I propose reserving additional free space for the IO
alignment feature, enough to handle the worst-case free space usage ratio
during FGGC.
Fixes: 0a595ebaaa6b ("f2fs: support IO alignment for DATA and NODE writes")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
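The worst case described in this commit message can be put into numbers. The sketch below is only an illustration: it assumes 4 KiB blocks and assumes that the aligned IO unit is 1 << io_bits blocks; the helper names io_unit_blocks, worst_case_blocks, and image_blocks are hypothetical and do not appear in f2fs.

```c
#include <stdint.h>

#define BLOCK_BYTES 4096u	/* assumption: 4 KiB blocks, as in "4k write" */

/* Blocks in one aligned IO unit for a given io_bits mount option value. */
static inline uint32_t io_unit_blocks(unsigned int io_bits)
{
	return UINT32_C(1) << io_bits;
}

/* Worst case: every write is a lone block padded out to a full IO unit. */
static inline uint64_t worst_case_blocks(uint64_t nr_writes,
					 unsigned int io_bits)
{
	return nr_writes * io_unit_blocks(io_bits);
}

/* Number of blocks in an image of the given size in MiB. */
static inline uint64_t image_blocks(uint64_t mib)
{
	return mib * 1024u * 1024u / BLOCK_BYTES;
}
```

With io_bits=7 from the reproducer, one padded write can occupy 128 blocks (512 KiB). If each of the 4096 mkdir calls led to one such write, that would amount to 524288 blocks, against 26880 blocks in the 105 MiB image.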
2021-12-11 21:27:36 +08:00
|
|
|
unsigned int additional_reserved_segments; /* reserved segs for IO align feature */
|
2012-11-28 12:37:31 +08:00
|
|
|
unsigned int ovp_segments; /* # of overprovision segments */
|
2013-10-24 12:31:34 +08:00
|
|
|
|
|
|
|
/* a threshold to reclaim prefree segments */
|
|
|
|
unsigned int rec_prefree_segments;
|
2013-11-15 12:55:58 +08:00
|
|
|
|
2015-01-27 09:41:23 +08:00
|
|
|
/* for batched trimming */
|
|
|
|
unsigned int trim_sections; /* # of sections to trim */
|
|
|
|
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we described the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal until
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, we first update the SIT journal with as many dirty entries as
possible. Second, if there is no space left in the SIT journal, we remove all
entries from the journal and walk the whole dirty entry bitmap of the SIT,
accounting dirty SIT entries located in the same SIT block to one SIT entry
set. All entry sets are linked into the list sit_entry_set in sm_info, sorted
in ascending order by the number of entries in each set. Later, we flush the
sets with the fewest entries into the journal, as many as fit, and then flush
the dense sets with merged entries to disk.
In this way we use the SIT journal area more effectively and reduce SIT
updates, which improves performance and extends the lifetime of the flash
device.
In my testing environment, this patch noticeably reduces SIT block updates.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
                sit page num    cp count        sit pages/cp
based           2006.50         1349.75         1.486
patched         1566.25         1463.25         1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns)     dirty sit count
36038           2151
49168           2123
37174           2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
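The grouping step described in this commit message can be sketched as follows: dirty segment numbers are bucketed by the SIT block they fall into, and the buckets are then sorted in ascending order of entry count so that the sparsest sets are considered for the journal first. This is a simplified user-space illustration, not the kernel code. ENTRIES_PER_BLOCK is an assumed value chosen for the example, an array stands in for the linked list, and build_entry_sets and cmp_by_count are hypothetical names.

```c
#include <stdlib.h>

#define ENTRIES_PER_BLOCK 55u	/* assumption: SIT entries per SIT block */

/* One set per SIT block that holds at least one dirty entry. */
struct entry_set {
	unsigned int start_segno;	/* first segno covered by the block */
	unsigned int entry_cnt;		/* dirty entries in that block */
};

/* qsort comparator: ascending by number of dirty entries. */
static int cmp_by_count(const void *a, const void *b)
{
	const struct entry_set *x = a, *y = b;

	return (x->entry_cnt > y->entry_cnt) - (x->entry_cnt < y->entry_cnt);
}

/*
 * dirty[] must hold n segment numbers in ascending order.
 * sets[] needs room for n elements; the number of sets is returned.
 */
static size_t build_entry_sets(const unsigned int *dirty, size_t n,
			       struct entry_set *sets)
{
	size_t nsets = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned int start = dirty[i] - dirty[i] % ENTRIES_PER_BLOCK;

		if (nsets && sets[nsets - 1].start_segno == start) {
			sets[nsets - 1].entry_cnt++;	/* same SIT block */
		} else {
			sets[nsets].start_segno = start;
			sets[nsets].entry_cnt = 1;
			nsets++;
		}
	}
	qsort(sets, nsets, sizeof(*sets), cmp_by_count);
	return nsets;
}
```

A caller would then walk sets[] from the front, moving sets into the journal while space remains and writing the remaining, denser sets out as whole SIT blocks.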
2014-09-04 18:13:01 +08:00
|
|
|
struct list_head sit_entry_set; /* sit entry set list */
|
|
|
|
|
2013-11-07 12:13:42 +08:00
|
|
|
unsigned int ipu_policy; /* in-place-update policy */
|
|
|
|
unsigned int min_ipu_util; /* in-place-update threshold */
|
2014-09-11 07:53:02 +08:00
|
|
|
unsigned int min_fsync_blocks; /* threshold for fsync */
|
2018-08-10 08:53:34 +08:00
|
|
|
unsigned int min_seq_blocks; /* threshold for sequential blocks */
|
2017-03-25 08:05:13 +08:00
|
|
|
unsigned int min_hot_blocks; /* threshold for hot block allocation */
|
2017-10-28 16:52:33 +08:00
|
|
|
unsigned int min_ssr_sections; /* threshold to trigger SSR allocation */
|
2014-04-02 14:34:36 +08:00
|
|
|
|
|
|
|
/* for flush command control */
|
2017-01-10 06:13:03 +08:00
|
|
|
struct flush_cmd_control *fcc_info;
|
2014-04-27 14:21:21 +08:00
|
|
|
|
2017-01-12 06:40:24 +08:00
|
|
|
/* for discard command control */
|
|
|
|
struct discard_cmd_control *dcc_info;
|
2012-11-28 12:37:31 +08:00
|
|
|
};
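The address fields in f2fs_sm_info lend themselves to simple block-to-segment arithmetic. The sketch below shows one way such a mapping could look for the main area. It assumes 512 blocks per segment (a shift of 9), which is not stated in this part of the header, and the names sm_layout, main_segno_of, and start_block_of are hypothetical; the real macros in f2fs are defined elsewhere and may differ.

```c
#include <stdint.h>

typedef uint32_t block_t;

#define LOG_BLOCKS_PER_SEG 9u	/* assumption: 512 blocks per segment */

/* Hypothetical subset of f2fs_sm_info used for the arithmetic. */
struct sm_layout {
	block_t main_blkaddr;		/* start block address of main area */
	unsigned int main_segments;	/* # of segments in main area */
};

/* Segment index inside the main area, or -1 if blkaddr lies outside it. */
static long main_segno_of(const struct sm_layout *sm, block_t blkaddr)
{
	uint32_t segno;

	if (blkaddr < sm->main_blkaddr)
		return -1;
	segno = (blkaddr - sm->main_blkaddr) >> LOG_BLOCKS_PER_SEG;
	return segno < sm->main_segments ? (long)segno : -1;
}

/* First block address of a main-area segment. */
static block_t start_block_of(const struct sm_layout *sm, unsigned int segno)
{
	return sm->main_blkaddr + ((block_t)segno << LOG_BLOCKS_PER_SEG);
}
```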
|
|
|
|
|
|
|
|
/*
|
|
|
|
* For superblock
|
|
|
|
*/
|
|
|
|
/*
|
|
|
|
* COUNT_TYPE for monitoring
|
|
|
|
*
|
|
|
|
* f2fs monitors the number of several block types such as on-writeback,
|
|
|
|
* dirty dentry blocks, dirty node blocks, and dirty meta blocks.
|
|
|
|
*/
|
f2fs: don't wait writeback for datas during checkpoint
Normally, while committing a checkpoint, we wait for all pages to be
written back, whether a page holds data or metadata. In a scenario where
lots of data IO is submitted together with metadata, we may suffer
long latency waiting for writeback during checkpoint.
In fact, we only care about the persistence of metadata pages, not
data pages, because file system consistency depends only on metadata.
So, to avoid the long latency in the above scenario, let's
recognize and reference metadata in submitted IOs, and wait for writeback
of metadata only.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-16 10:41:20 +08:00
|
|
|
#define WB_DATA_TYPE(p) (__is_cp_guaranteed(p) ? F2FS_WB_CP_DATA : F2FS_WB_DATA)
|
2012-11-28 12:37:31 +08:00
|
|
|
enum count_type {
|
|
|
|
F2FS_DIRTY_DENTS,
|
2015-12-16 13:09:20 +08:00
|
|
|
F2FS_DIRTY_DATA,
|
2017-11-14 09:46:38 +08:00
|
|
|
F2FS_DIRTY_QDATA,
|
2012-11-28 12:37:31 +08:00
|
|
|
F2FS_DIRTY_NODES,
|
|
|
|
F2FS_DIRTY_META,
|
2014-12-06 09:18:15 +08:00
|
|
|
F2FS_INMEM_PAGES,
|
2016-05-21 02:10:10 +08:00
|
|
|
F2FS_DIRTY_IMETA,
|
2016-11-16 10:41:20 +08:00
|
|
|
F2FS_WB_CP_DATA,
|
|
|
|
F2FS_WB_DATA,
|
2018-10-17 01:20:53 +08:00
|
|
|
F2FS_RD_DATA,
|
|
|
|
F2FS_RD_NODE,
|
|
|
|
F2FS_RD_META,
|
2018-11-12 00:46:46 +08:00
|
|
|
F2FS_DIO_WRITE,
|
|
|
|
F2FS_DIO_READ,
|
2012-11-28 12:37:31 +08:00
|
|
|
NR_COUNT_TYPE,
|
|
|
|
};
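An enum that ends in NR_COUNT_TYPE is typically used to size and index an array of counters. The user-space sketch below illustrates that pattern with C11 atomics. The struct page_counters and the helpers inc_page_count, dec_page_count, get_pages, and wb_data_type are written for this example; in particular, wb_data_type takes a plain flag as a stand-in for the __is_cp_guaranteed() check used by WB_DATA_TYPE, whose definition is not shown here.

```c
#include <stdatomic.h>

/* Mirrors enum count_type from the header above. */
enum count_type {
	F2FS_DIRTY_DENTS,
	F2FS_DIRTY_DATA,
	F2FS_DIRTY_QDATA,
	F2FS_DIRTY_NODES,
	F2FS_DIRTY_META,
	F2FS_INMEM_PAGES,
	F2FS_DIRTY_IMETA,
	F2FS_WB_CP_DATA,
	F2FS_WB_DATA,
	F2FS_RD_DATA,
	F2FS_RD_NODE,
	F2FS_RD_META,
	F2FS_DIO_WRITE,
	F2FS_DIO_READ,
	NR_COUNT_TYPE,
};

/* Hypothetical container: one atomic counter per count type. */
struct page_counters {
	atomic_long nr_pages[NR_COUNT_TYPE];
};

static inline void inc_page_count(struct page_counters *c, enum count_type t)
{
	atomic_fetch_add(&c->nr_pages[t], 1);
}

static inline void dec_page_count(struct page_counters *c, enum count_type t)
{
	atomic_fetch_sub(&c->nr_pages[t], 1);
}

static inline long get_pages(struct page_counters *c, enum count_type t)
{
	return atomic_load(&c->nr_pages[t]);
}

/* Stand-in for WB_DATA_TYPE(p): the flag replaces __is_cp_guaranteed(p). */
static inline enum count_type wb_data_type(int cp_guaranteed)
{
	return cp_guaranteed ? F2FS_WB_CP_DATA : F2FS_WB_DATA;
}
```

Keeping checkpoint-guaranteed writeback in its own slot lets a waiter poll only F2FS_WB_CP_DATA, which matches the intent stated in the "don't wait writeback for datas during checkpoint" commit message.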
|
|
|
|
|
|
|
|
/*
|
2014-08-06 22:22:50 +08:00
|
|
|
* The below are the page types of bios used in submit_bio().
|
2012-11-28 12:37:31 +08:00
|
|
|
* The available types are:
|
|
|
|
* DATA User data pages. It operates as async mode.
|
|
|
|
* NODE Node pages. It operates as async mode.
|
|
|
|
* META FS metadata pages such as SIT, NAT, CP.
|
|
|
|
* NR_PAGE_TYPE The number of page types.
|
|
|
|
* META_FLUSH Make sure the previous pages are written
|
|
|
|
* with waiting the bio's completion
|
|
|
|
* ... Only can be used with META.
|
|
|
|
*/
|
2013-11-18 16:13:35 +08:00
|
|
|
#define PAGE_TYPE_OF_BIO(type) ((type) > META ? META : (type))
|
2012-11-28 12:37:31 +08:00
|
|
|
enum page_type {
|
|
|
|
DATA,
|
|
|
|
NODE,
|
|
|
|
META,
|
|
|
|
NR_PAGE_TYPE,
|
|
|
|
META_FLUSH,
|
2015-03-18 08:58:08 +08:00
|
|
|
INMEM, /* the below types are used by tracepoints only. */
|
|
|
|
INMEM_DROP,
|
2017-03-17 09:55:52 +08:00
|
|
|
INMEM_INVALIDATE,
|
f2fs: support revoking atomic written pages
f2fs support atomic write with following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file
With this flow we can avoid corrupting the file on an abnormal power
cut, because we hold the transaction's data in referenced pages linked in
the inode's inmem_pages list without setting them dirty, so the data
won't be persisted unless we commit it in step 4.
But we should still hold the journal db file in memory by using volatile
write, because our 'atomic write support' semantics are incomplete: in
step 4, we could fail to submit all dirty data of the transaction, and once
part of the dirty data has been committed to storage, a checkpoint followed
by an abnormal power cut leaves the db file corrupted forever.
So this patch improves the atomic write flow by adding a revoking flow:
once an internal error occurs during commit, it gives another chance to
revoke the partially submitted data of the current transaction, which makes
the commit operation more like an atomic one.
If we're unlucky and the revoking operation fails, EAGAIN is
reported to the user, suggesting either recovery with the held journal file
or a retry of the current transaction.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-06 14:40:34 +08:00
|
|
|
INMEM_REVOKE,
|
2015-03-18 08:58:08 +08:00
|
|
|
IPU,
|
|
|
|
OPU,
|
2012-11-28 12:37:31 +08:00
|
|
|
};
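The ordering of enum page_type matters for PAGE_TYPE_OF_BIO: DATA, NODE, and META come first, so any value greater than META collapses to META. The short sketch below copies the enum and the macro from the header and adds a hypothetical helper, bio_type_name, to make the mapping visible. It shows what the macro evaluates to for each value, not which values callers actually pass.

```c
/* Copied from the header above. */
enum page_type {
	DATA,
	NODE,
	META,
	NR_PAGE_TYPE,
	META_FLUSH,
	INMEM,		/* the below types are used by tracepoints only. */
	INMEM_DROP,
	INMEM_INVALIDATE,
	INMEM_REVOKE,
	IPU,
	OPU,
};

#define PAGE_TYPE_OF_BIO(type)	((type) > META ? META : (type))

/* Hypothetical helper: name of the bio type a page type maps to. */
static const char *bio_type_name(enum page_type type)
{
	switch (PAGE_TYPE_OF_BIO(type)) {
	case DATA:
		return "DATA";
	case NODE:
		return "NODE";
	default:
		return "META";
	}
}
```

Because NR_PAGE_TYPE sits directly after META, it equals 3 and can size per-type arrays for the three bio types, while META_FLUSH is treated as META by the macro.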
|
|
|
|
|
2017-05-11 02:18:25 +08:00
|
|
|
enum temp_type {
|
|
|
|
HOT = 0, /* must be zero for meta bio */
|
|
|
|
WARM,
|
|
|
|
COLD,
|
|
|
|
NR_TEMP_TYPE,
|
|
|
|
};
|
|
|
|
|
2017-05-13 04:51:34 +08:00
|
|
|
enum need_lock_type {
|
|
|
|
LOCK_REQ = 0,
|
|
|
|
LOCK_DONE,
|
|
|
|
LOCK_RETRY,
|
|
|
|
};
|
|
|
|
|
2017-11-06 22:51:45 +08:00
|
|
|
enum cp_reason_type {
|
|
|
|
CP_NO_NEEDED,
|
|
|
|
CP_NON_REGULAR,
|
f2fs: support data compression
This patch tries to support compression in f2fs.
- A new term, cluster, is defined as the basic unit of compression; a file is
divided logically into multiple clusters. One cluster contains 4 << n
(n >= 0) logical pages, the compression size equals the cluster size, and
each cluster can be compressed or not.
- In the cluster metadata layout, one special flag indicates whether a cluster
is compressed or normal. For a compressed cluster, the following metadata
maps the cluster to [1, (4 << n) - 1] physical blocks, in which f2fs stores
data including the compression header and the compressed data.
- To eliminate write amplification during overwrite, F2FS only
supports compression on write-once files; data can be compressed only when
all logical blocks in the file are valid and the cluster's compression ratio
is lower than the specified threshold.
- To enable compression on regular inode, there are three ways:
* chattr +c file
* chattr +c dir; touch dir/file
* mount w/ -o compress_extension=ext; touch file.ext
Compress metadata layout:
                             [Dnode Structure]
             +-----------------------------------------------+
             | cluster 1 | cluster 2 | ......... | cluster N |
             +-----------------------------------------------+
             .           .                       .           .
       .                       .                .                      .
  .         Compressed Cluster       .        .        Normal Cluster            .
+----------+---------+---------+---------+  +---------+---------+---------+---------+
|compr flag| block 1 | block 2 | block 3 |  | block 1 | block 2 | block 3 | block 4 |
+----------+---------+---------+---------+  +---------+---------+---------+---------+
           .                             .
         .                                           .
       .                                                           .
      +-------------+-------------+----------+----------------------------+
      | data length | data chksum | reserved |      compressed data       |
      +-------------+-------------+----------+----------------------------+
Changelog:
20190326:
- fix error handling of read_end_io().
- remove unneeded comments in f2fs_encrypt_one_page().
20190327:
- fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
- don't jump into loop directly to avoid uninitialized variables.
- add TODO tag in error path of f2fs_write_cache_pages().
20190328:
- fix wrong merge condition in f2fs_read_multi_pages().
- check compressed file in f2fs_post_read_required().
20190401
- allow overwrite on non-compressed cluster.
- check cluster meta before writing compressed data.
20190402
- don't preallocate blocks for compressed file.
- add lz4 compress algorithm
- process multiple post read works in one workqueue
Now f2fs supports processing post read work in multiple workqueue,
it shows low performance due to schedule overhead of multiple
workqueue executing orderly.
20190921
- compress: support buffered overwrite
C: compress cluster flag
V: valid block address
N: NEW_ADDR
One cluster contain 4 blocks
before overwrite after overwrite
- VVVV -> CVNN
- CVNN -> VVVV
- CVNN -> CVNN
- CVNN -> CVVV
- CVVV -> CVNN
- CVVV -> CVVV
20191029
- add kconfig F2FS_FS_COMPRESSION to isolate compression related
codes, add kconfig F2FS_FS_{LZO,LZ4} to cover backend algorithm.
note that: will remove lzo backend if Jaegeuk agreed that too.
- update codes according to Eric's comments.
20191101
- apply fixes from Jaegeuk
20191113
- apply fixes from Jaegeuk
- split workqueue for fsverity
20191216
- apply fixes from Jaegeuk
20200117
- fix to avoid NULL pointer dereference
[Jaegeuk Kim]
- add tracepoint for f2fs_{,de}compress_pages()
- fix many bugs and add some compression stats
- fix overwrite/mmap bugs
- address 32bit build error, reported by Geert.
- bug fixes when handling errors and i_compressed_blocks
Reported-by: <noreply@ellerman.id.au>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
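The cluster arithmetic in this commit message can be written out as a small sketch. It takes the stated cluster size of 4 << n pages at face value. The helper names cluster_pages, cluster_index, max_compressed_blocks, and is_valid_compressed_size are hypothetical, and the reading of the upper bound as (4 << n) - 1 is an interpretation of the text: one address slot holds the compress flag, so a compressed cluster stores at least one block fewer than a normal one.

```c
#include <stdint.h>

/* Logical pages per cluster for exponent n (n >= 0): 4 << n. */
static inline uint32_t cluster_pages(unsigned int n)
{
	return UINT32_C(4) << n;
}

/* Index of the cluster that holds a given logical page. */
static inline uint64_t cluster_index(uint64_t page_index, unsigned int n)
{
	return page_index / cluster_pages(n);
}

/*
 * Upper bound on physical blocks of a compressed cluster.
 * The parentheses matter: in C, 4 << n - 1 parses as 4 << (n - 1).
 */
static inline uint32_t max_compressed_blocks(unsigned int n)
{
	return (UINT32_C(4) << n) - 1;
}

/* A compressed cluster occupies between 1 and (4 << n) - 1 blocks. */
static inline int is_valid_compressed_size(uint32_t nr_blocks, unsigned int n)
{
	return nr_blocks >= 1 && nr_blocks <= max_compressed_blocks(n);
}
```

For n = 0 this gives 4-page clusters, page 9 falls into cluster 2, and a compressed cluster may occupy 1 to 3 blocks, which matches the 4-slot layout in the diagram (compr flag plus up to three blocks).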
2019-11-01 18:07:14 +08:00
|
|
|
CP_COMPRESSED,
|
2017-11-06 22:51:45 +08:00
|
|
|
CP_HARDLINK,
|
|
|
|
CP_SB_NEED_CP,
|
|
|
|
CP_WRONG_PINO,
|
|
|
|
CP_NO_SPC_ROLL,
|
|
|
|
CP_NODE_NEED_CP,
|
|
|
|
CP_FASTBOOT_MODE,
|
|
|
|
CP_SPEC_LOG_NUM,
|
2017-12-29 00:09:44 +08:00
|
|
|
CP_RECOVER_DIR,
|
2017-11-06 22:51:45 +08:00
|
|
|
};
|
|
|
|
|
2017-08-02 23:21:48 +08:00
|
|
|
enum iostat_type {
|
2020-04-16 18:16:56 +08:00
|
|
|
/* WRITE IO */
|
|
|
|
APP_DIRECT_IO, /* app direct write IOs */
|
|
|
|
APP_BUFFERED_IO, /* app buffered write IOs */
|
2017-08-02 23:21:48 +08:00
|
|
|
APP_WRITE_IO, /* app write IOs */
|
|
|
|
APP_MAPPED_IO, /* app mapped IOs */
|
|
|
|
FS_DATA_IO, /* data IOs from kworker/fsync/reclaimer */
|
|
|
|
FS_NODE_IO, /* node IOs from kworker/fsync/reclaimer */
|
|
|
|
FS_META_IO, /* meta IOs from kworker/reclaimer */
|
|
|
|
FS_GC_DATA_IO, /* data IOs from foreground gc */
|
|
|
|
FS_GC_NODE_IO, /* node IOs from foreground gc */
|
|
|
|
FS_CP_DATA_IO, /* data IOs from checkpoint */
|
|
|
|
FS_CP_NODE_IO, /* node IOs from checkpoint */
|
|
|
|
FS_CP_META_IO, /* meta IOs from checkpoint */
|
2020-04-16 18:16:56 +08:00
|
|
|
|
|
|
|
/* READ IO */
|
|
|
|
APP_DIRECT_READ_IO, /* app direct read IOs */
|
|
|
|
APP_BUFFERED_READ_IO, /* app buffered read IOs */
|
|
|
|
APP_READ_IO, /* app read IOs */
|
|
|
|
APP_MAPPED_READ_IO, /* app mapped read IOs */
|
|
|
|
FS_DATA_READ_IO, /* data read IOs */
|
2020-04-23 18:03:06 +08:00
|
|
|
FS_GDATA_READ_IO, /* data read IOs from background gc */
|
|
|
|
FS_CDATA_READ_IO, /* compressed data read IOs */
|
2020-04-16 18:16:56 +08:00
|
|
|
FS_NODE_READ_IO, /* node read IOs */
|
|
|
|
FS_META_READ_IO, /* meta read IOs */
|
|
|
|
|
|
|
|
/* other */
|
2017-08-02 23:21:48 +08:00
|
|
|
FS_DISCARD, /* discard */
|
|
|
|
NR_IO_TYPE,
|
|
|
|
};
|
|
|
|
|
2013-12-11 12:54:01 +08:00
|
|
|
struct f2fs_io_info {
|
2015-04-24 05:38:15 +08:00
|
|
|
struct f2fs_sb_info *sbi; /* f2fs_sb_info pointer */
|
2017-09-29 13:59:38 +08:00
|
|
|
nid_t ino; /* inode number */
|
2013-12-20 18:17:49 +08:00
|
|
|
enum page_type type; /* contains DATA/NODE/META/META_FLUSH */
|
2017-05-11 02:18:25 +08:00
|
|
|
enum temp_type temp; /* contains HOT/WARM/COLD */
|
2016-06-06 03:31:55 +08:00
|
|
|
int op; /* contains REQ_OP_ */
|
2016-10-28 22:48:16 +08:00
|
|
|
int op_flags; /* req_flag_bits */
|
f2fs: trace old block address for CoWed page
This patch enables to trace old block address of CoWed page for better
debugging.
f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f0, oldaddr = 0xfe8ab, newaddr = 0xfee90 rw = WRITE_SYNC, type = NODE
f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f8, oldaddr = 0xfe8b0, newaddr = 0xfee91 rw = WRITE_SYNC, type = NODE
f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4fa, oldaddr = 0xfe8ae, newaddr = 0xfee92 rw = WRITE_SYNC, type = NODE
f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x96, oldaddr = 0xf049b, newaddr = 0x2bbe rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x97, oldaddr = 0xf049c, newaddr = 0x2bbf rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x98, oldaddr = 0xf049d, newaddr = 0x2bc0 rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x47, oldaddr = 0xffffffff, newaddr = 0xf2631 rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x48, oldaddr = 0xffffffff, newaddr = 0xf2632 rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x49, oldaddr = 0xffffffff, newaddr = 0xf2633 rw = WRITE, type = DATA
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 18:36:38 +08:00
|
|
|
block_t new_blkaddr; /* new block address to be written */
|
2016-02-06 14:40:34 +08:00
|
|
|
block_t old_blkaddr; /* old block address before CoW */
|
2015-04-24 05:38:15 +08:00
|
|
|
struct page *page; /* page to be written */
|
2015-04-24 03:04:33 +08:00
|
|
|
struct page *encrypted_page; /* encrypted page */
|
f2fs: support data compression
This patch tries to support compression in f2fs.
- A new term, cluster, is defined as the basic unit of compression; a file
can be divided into multiple clusters logically. One cluster includes 4 << n
(n >= 0) logical pages, the compression size is also the cluster size, and
each cluster can be compressed or not.
- In the cluster metadata layout, one special flag indicates whether a
cluster is compressed or normal. For a compressed cluster, the following
metadata maps the cluster to [1, (4 << n) - 1] physical blocks, where f2fs
stores data including the compress header and compressed data.
- In order to eliminate write amplification during overwrite, F2FS only
supports compression on write-once files; data can be compressed only when
all logical blocks in file are valid and the cluster compress ratio is lower
than the specified threshold.
- To enable compression on regular inode, there are three ways:
* chattr +c file
* chattr +c dir; touch dir/file
* mount w/ -o compress_extension=ext; touch file.ext
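The cluster sizing rule in the list above can be illustrated with a small sketch. The helper names are hypothetical, compressed_bytes is assumed to already include the compress header, and the "threshold" is simplified to "saves at least one block", which is an assumption rather than the actual f2fs policy.

```c
#include <stdbool.h>

/* Number of logical pages in one cluster: 4 << n, with n >= 0. */
unsigned int cluster_pages(unsigned int n)
{
	return 4u << n;
}

/*
 * A compressed cluster must occupy between 1 and (4 << n) - 1 physical
 * blocks, so compression only pays off if at least one block is saved.
 */
bool cluster_worth_compressing(unsigned int n, unsigned int compressed_bytes,
			       unsigned int block_size)
{
	/* Round the compressed size up to whole blocks. */
	unsigned int blocks = (compressed_bytes + block_size - 1) / block_size;

	return blocks >= 1 && blocks <= cluster_pages(n) - 1;
}
```

With n = 0 and 4096-byte blocks, a cluster that compresses to 12288 bytes fits in three blocks and qualifies, while one that compresses to 12289 bytes would still need four blocks and stays uncompressed.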
Compress metadata layout:
[Dnode Structure]
             +-----------------------------------------------+
             | cluster 1 | cluster 2 | ......... | cluster N |
             +-----------------------------------------------+
             .           .                       .           .
       .                       .                .                      .
  .         Compressed Cluster       .        .        Normal Cluster            .
+----------+---------+---------+---------+  +---------+---------+---------+---------+
|compr flag| block 1 | block 2 | block 3 |  | block 1 | block 2 | block 3 | block 4 |
+----------+---------+---------+---------+  +---------+---------+---------+---------+
           .                             .
       .                                           .
   .                                                           .
      +-------------+-------------+----------+----------------------------+
      | data length | data chksum | reserved |      compressed data       |
      +-------------+-------------+----------+----------------------------+
Changelog:
20190326:
- fix error handling of read_end_io().
- remove unneeded comments in f2fs_encrypt_one_page().
20190327:
- fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
- don't jump into loop directly to avoid uninitialized variables.
- add TODO tag in error path of f2fs_write_cache_pages().
20190328:
- fix wrong merge condition in f2fs_read_multi_pages().
- check compressed file in f2fs_post_read_required().
20190401
- allow overwrite on non-compressed cluster.
- check cluster meta before writing compressed data.
20190402
- don't preallocate blocks for compressed file.
- add lz4 compress algorithm
- process multiple post read works in one workqueue
Currently f2fs processes post-read work in multiple workqueues,
which shows low performance due to the scheduling overhead of multiple
workqueues executing in order.
20190921
- compress: support buffered overwrite
C: compress cluster flag
V: valid block address
N: NEW_ADDR
One cluster contains 4 blocks
before overwrite after overwrite
- VVVV -> CVNN
- CVNN -> VVVV
- CVNN -> CVNN
- CVNN -> CVVV
- CVVV -> CVNN
- CVVV -> CVVV
20191029
- add kconfig F2FS_FS_COMPRESSION to isolate compression-related
code, add kconfig F2FS_FS_{LZO,LZ4} to cover backend algorithms.
Note: will remove the lzo backend if Jaegeuk agrees to that too.
- update codes according to Eric's comments.
20191101
- apply fixes from Jaegeuk
20191113
- apply fixes from Jaegeuk
- split workqueue for fsverity
20191216
- apply fixes from Jaegeuk
20200117
- fix to avoid NULL pointer dereference
[Jaegeuk Kim]
- add tracepoint for f2fs_{,de}compress_pages()
- fix many bugs and add some compression stats
- fix overwrite/mmap bugs
- address 32bit build error, reported by Geert.
- bug fixes when handling errors and i_compressed_blocks
Reported-by: <noreply@ellerman.id.au>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-11-01 18:07:14 +08:00
|
|
|
struct page *compressed_page; /* compressed page */
|
2017-05-19 23:37:01 +08:00
|
|
|
struct list_head list; /* serialize IOs */
|
2017-02-04 09:44:04 +08:00
|
|
|
bool submitted; /* indicate IO submission */
|
2017-05-13 04:51:34 +08:00
|
|
|
int need_lock; /* indicate we need to lock cp_rwsem */
|
2017-05-19 23:37:01 +08:00
|
|
|
bool in_list; /* indicate fio is in io_list */
|
2019-04-15 15:26:31 +08:00
|
|
|
bool is_por; /* indicate IO is from recovery or not */
|
2018-05-28 23:47:18 +08:00
|
|
|
bool retry; /* need to reallocate block address */
|
2019-11-01 18:07:14 +08:00
|
|
|
int compr_blocks; /* # of compressed block addresses */
|
|
|
|
bool encrypted; /* indicate file is encrypted */
|
2017-08-02 23:21:48 +08:00
|
|
|
enum iostat_type io_type; /* io type */
|
2018-01-09 19:33:39 +08:00
|
|
|
struct writeback_control *io_wbc; /* writeback control */
|
f2fs: add bio cache for IPU
SQLite in WAL mode may trigger sequential IPU writes to the db-wal file. After
commit d1b3e72d5490 ("f2fs: submit bio of in-place-update pages"), we
lost the chance to merge pages in the internally managed bio cache, resulting
in more small-sized IOs being submitted.
So let's add a temporary bio in writepages() to cache mergeable write IO as
much as possible.
Test case:
1. xfs_io -f /mnt/f2fs/file -c "pwrite 0 65536" -c "fsync"
2. xfs_io -f /mnt/f2fs/file -c "pwrite 0 65536" -c "fsync"
Before:
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65544, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65552, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65560, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65568, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65576, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65584, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65592, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65600, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65608, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65616, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65624, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65632, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65640, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65648, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65656, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65664, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), NODE, sector = 57352, size = 4096
After:
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65544, size = 65536
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), NODE, sector = 57368, size = 4096
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
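The effect shown in the Before/After traces above can be reproduced with a minimal merging sketch. It is a userspace illustration only: struct write_req and merge_sequential() are hypothetical names, 512-byte sectors are assumed, and the size limits a real bio has are ignored.

```c
#include <stddef.h>

#define EX_SECTOR_SIZE 512u /* assumed sector size in bytes */

/* Hypothetical write request: start sector and length in bytes. */
struct write_req {
	unsigned long long sector;
	unsigned int size;
};

/*
 * Merge requests in place whenever the next one starts exactly where the
 * previous one ends. Returns the number of requests left after merging.
 */
size_t merge_sequential(struct write_req *reqs, size_t n)
{
	size_t out = 0, i;

	if (n == 0)
		return 0;

	for (i = 1; i < n; i++) {
		unsigned long long end =
			reqs[out].sector + reqs[out].size / EX_SECTOR_SIZE;

		if (reqs[i].sector == end)
			reqs[out].size += reqs[i].size; /* contiguous: extend */
		else
			reqs[++out] = reqs[i];          /* gap: start a new one */
	}
	return out + 1;
}
```

Feeding it the sixteen 4096-byte DATA writes starting at sector 65544 plus the single NODE write leaves two requests, the first one 65536 bytes long, which mirrors the shape of the After trace.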
2019-02-19 16:15:29 +08:00
|
|
|
struct bio **bio; /* bio for ipu */
|
|
|
|
sector_t *last_block; /* last block number in bio */
|
2018-07-17 00:02:17 +08:00
|
|
|
unsigned char version; /* version of the node */
|
2013-12-11 12:54:01 +08:00
|
|
|
};
|
|
|
|
|
2019-09-30 18:53:25 +08:00
|
|
|
struct bio_entry {
|
|
|
|
struct bio *bio;
|
|
|
|
struct list_head list;
|
|
|
|
};
|
|
|
|
|
2017-04-09 07:11:36 +08:00
|
|
|
#define is_read_io(rw) ((rw) == READ)
|
2013-11-19 11:47:22 +08:00
|
|
|
struct f2fs_bio_info {
|
2013-12-11 12:54:01 +08:00
|
|
|
struct f2fs_sb_info *sbi; /* f2fs superblock */
|
2013-11-19 11:47:22 +08:00
|
|
|
struct bio *bio; /* bios to merge */
|
|
|
|
sector_t last_block_in_bio; /* last block number */
|
2013-12-11 12:54:01 +08:00
|
|
|
struct f2fs_io_info fio; /* store buffered io info. */
|
2014-03-22 14:57:23 +08:00
|
|
|
struct rw_semaphore io_rwsem; /* blocking op for bio */
|
2017-05-19 23:37:01 +08:00
|
|
|
spinlock_t io_lock; /* serialize DATA/NODE IOs */
|
|
|
|
struct list_head io_list; /* track fios */
|
2019-09-30 18:53:25 +08:00
|
|
|
struct list_head bio_list; /* bio entry list head */
|
|
|
|
struct rw_semaphore bio_list_lock; /* lock to protect bio entry list */
|
2013-11-19 11:47:22 +08:00
|
|
|
};
|
|
|
|
|
2016-10-07 10:02:05 +08:00
|
|
|
#define FDEV(i) (sbi->devs[i])
|
|
|
|
#define RDEV(i) (raw_super->devs[i])
|
|
|
|
struct f2fs_dev_info {
|
|
|
|
struct block_device *bdev;
|
|
|
|
char path[MAX_PATH_LEN];
|
|
|
|
unsigned int total_segments;
|
|
|
|
block_t start_blk;
|
|
|
|
block_t end_blk;
|
|
|
|
#ifdef CONFIG_BLK_DEV_ZONED
|
2019-03-16 08:13:07 +08:00
|
|
|
unsigned int nr_blkz; /* Total number of zones */
|
|
|
|
unsigned long *blkz_seq; /* Bitmap indicating sequential zones */
|
f2fs: support zone capacity less than zone size
NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
Zone-capacity indicates the maximum number of sectors that are usable in
a zone beginning from the first sector of the zone. This makes the
sectors after the zone-capacity up to the zone-size unusable.
This patch set tracks zone-size and zone-capacity in zoned devices and
calculates the usable blocks per segment and usable segments per section.
If zone-capacity is less than zone-size mark only those segments which
start before zone-capacity as free segments. All segments at and beyond
zone-capacity are treated as permanently used segments. In cases where
zone-capacity does not align with segment size the last segment will start
before zone-capacity and end beyond the zone-capacity of the zone. For
such spanning segments only sectors within the zone-capacity are used.
During writes and GC, manage the usable segments in a section and the usable
blocks per segment. Segments which are beyond zone-capacity are never
allocated and do not need to be garbage collected; only the segments
which are before zone-capacity need to be garbage collected.
For spanning segments, based on the number of usable blocks in that
segment, write to blocks only up to zone-capacity.
Zone-capacity is device specific and cannot be configured by the user.
Since NVMe ZNS device zones are sequentially write only, a block device
with conventional zones or any normal block device is needed along with
the ZNS device for the metadata operations of F2FS.
A typical nvme-cli output of a zoned device shows zone start and capacity
and write pointer as below:
SLBA: 0x0 WP: 0x0 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
Here zone size is 64MB, capacity is 49MB, WP is at zone start as the zones
are in EMPTY state. For each zone, only zone start + 49MB is usable area,
any lba/sector after 49MB cannot be read or written to, the drive will fail
any attempts to read/write. So, the second zone starts at 64MB and is
usable till 113MB (64 + 49) and the range between 113 and 128MB is
again unusable. The next zone starts at 128MB, and so on.
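The usable-block rule for full, spanning and unusable segments can be worked through with the numbers from the nvme-cli output above. This is a sketch under assumptions: 512-byte LBAs, 4096-byte blocks, 512 blocks (2MB) per segment, and hypothetical helper names that are not the f2fs functions.

```c
#include <stdint.h>

#define EX_SECTOR_SIZE 512u  /* assumed LBA size in bytes */
#define EX_BLOCK_SIZE  4096u /* assumed filesystem block size in bytes */

/* Convert a sector count (e.g. the Cap field) into 4KB blocks. */
uint32_t sectors_to_blocks(uint64_t sectors)
{
	return (uint32_t)(sectors * EX_SECTOR_SIZE / EX_BLOCK_SIZE);
}

/*
 * Usable blocks in segment `seg` (0-based index inside its zone):
 * full segments below zone capacity are entirely usable, a segment that
 * spans the capacity boundary is usable only up to the boundary, and
 * segments at or beyond the capacity are not usable at all.
 */
uint32_t usable_blocks_in_seg(uint32_t zone_cap_blocks, uint32_t seg,
			      uint32_t blocks_per_seg)
{
	uint64_t start = (uint64_t)seg * blocks_per_seg;

	if (start >= zone_cap_blocks)
		return 0;
	if (start + blocks_per_seg <= zone_cap_blocks)
		return blocks_per_seg;
	return (uint32_t)(zone_cap_blocks - start);
}
```

For Cap: 0x18800 this gives 12544 blocks (49MB), so segments 0 to 23 are fully usable, segment 24 spans the boundary with 256 usable blocks, and segments 25 to 31 of the 64MB zone are never allocated.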
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-16 20:56:56 +08:00
|
|
|
block_t *zone_capacity_blocks; /* Array of zone capacity in blks */
|
2016-10-07 10:02:05 +08:00
|
|
|
#endif
|
|
|
|
};
|
|
|
|
|
2015-12-16 13:09:20 +08:00
|
|
|
enum inode_type {
|
|
|
|
DIR_INODE, /* for dirty dir inode */
|
|
|
|
FILE_INODE, /* for dirty regular/symlink inode */
|
2016-05-21 02:10:10 +08:00
|
|
|
DIRTY_META, /* for all dirtied inode metadata */
|
2017-10-19 10:05:57 +08:00
|
|
|
ATOMIC_FILE, /* for all atomic files */
|
2015-12-16 13:09:20 +08:00
|
|
|
NR_INODE_TYPE,
|
|
|
|
};
|
|
|
|
|
2014-11-18 11:18:36 +08:00
|
|
|
/* for inner inode cache management */
|
|
|
|
struct inode_management {
|
|
|
|
struct radix_tree_root ino_root; /* ino entry array */
|
|
|
|
spinlock_t ino_lock; /* for ino entry lock */
|
|
|
|
struct list_head ino_list; /* inode list head */
|
|
|
|
unsigned long ino_num; /* number of entries */
|
|
|
|
};
|
|
|
|
|
f2fs: support age threshold based garbage collection
There are several issues in the current background GC algorithm:
- the valid block count is one of the key factors in the cost overhead
calculation, so if a segment has few valid blocks, the CB algorithm will
still choose it as the victim even if its age is young or it is located in
a hot segment, which is not appropriate.
- GCed data/nodes go to existing logs, regardless of whether the update
frequency of the data already there is the same or not, so hot and cold
data may be mixed again.
- the GC allocator mainly uses LFS type segments, which consumes free
segments more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve the above issues. There are three main
steps:
1. select a source victim:
- set an age threshold, and select candidates based on the threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which have an age in the range [80, 100] as
candidates;
- set the candidate_ratio threshold, and select candidates based on the
ratio, so that we can shrink candidates to the oldest segments;
- select the target segment with the fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates based on the age threshold;
- set the candidate_radius threshold, and search for candidates whose age
is around the source victim's; the searching radius should be less than
the radius threshold.
- select the target segment with the most valid blocks in order to avoid
migrating the current target segment.
3. merge valid blocks from the source victim into the target victim with
the SSR allocator.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* the rest of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease noticeably:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
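Step 1 of the message above (source victim selection) can be sketched as a simple linear scan. This is a simplified illustration, not the rb-tree based implementation: struct seg_info and select_source_victim() are hypothetical names, ages are assumed to be normalized to 0..100 as in the example, and the candidate_ratio shrinking step is left out.

```c
#include <stddef.h>

/* Hypothetical per-segment summary used only for this sketch. */
struct seg_info {
	unsigned int age;          /* 0 = youngest, 100 = oldest */
	unsigned int valid_blocks; /* valid blocks left in the segment */
};

/*
 * Among segments whose age is at least age_threshold, pick the one with
 * the fewest valid blocks, so migration cost is minimal.
 * Returns the index of the victim, or -1 if no segment is old enough.
 */
int select_source_victim(const struct seg_info *segs, size_t n,
			 unsigned int age_threshold)
{
	int best = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		if (segs[i].age < age_threshold)
			continue; /* too young to be a candidate */
		if (best < 0 || segs[i].valid_blocks < segs[best].valid_blocks)
			best = (int)i;
	}
	return best;
}
```

Note that a young segment with very few valid blocks is skipped here, which is exactly the case the first issue in the message says the CB algorithm handles badly.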
2020-08-04 21:14:49 +08:00
|
|
|
/* for GC_AT */
|
|
|
|
struct atgc_management {
|
|
|
|
bool atgc_enabled; /* ATGC is enabled or not */
|
|
|
|
struct rb_root_cached root; /* root of victim rb-tree */
|
|
|
|
struct list_head victim_list; /* linked with all victim entries */
|
|
|
|
unsigned int victim_count; /* victim count in rb-tree */
|
|
|
|
unsigned int candidate_ratio; /* candidate ratio */
|
|
|
|
unsigned int max_candidate_count; /* max candidate count */
|
|
|
|
unsigned int age_weight; /* age weight, vblock_weight = 100 - age_weight */
|
|
|
|
unsigned long long age_threshold; /* age threshold */
|
|
|
|
};
|
|
|
|
|
2015-01-28 17:48:42 +08:00
|
|
|
/* For s_flag in struct f2fs_sb_info */
|
|
|
|
enum {
|
|
|
|
SBI_IS_DIRTY, /* dirty flag for checkpoint */
|
|
|
|
SBI_IS_CLOSE, /* specify unmounting */
|
|
|
|
SBI_NEED_FSCK, /* need fsck.f2fs to fix */
|
|
|
|
SBI_POR_DOING, /* recovery is in progress or not */
|
2016-03-24 08:05:27 +08:00
|
|
|
SBI_NEED_SB_WRITE, /* need to recover superblock */
|
2016-08-30 09:23:45 +08:00
|
|
|
SBI_NEED_CP, /* need to checkpoint */
|
2018-06-22 04:46:23 +08:00
|
|
|
SBI_IS_SHUTDOWN, /* shutdown by ioctl */
|
f2fs: fix to flush all dirty inodes recovered in readonly fs
generic/417 reported as below:
------------[ cut here ]------------
kernel BUG at /home/yuchao/git/devf2fs/inode.c:695!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 1 PID: 21697 Comm: umount Tainted: G W O 4.18.0-rc2+ #39
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
EIP: f2fs_evict_inode+0x556/0x580 [f2fs]
Call Trace:
? _raw_spin_unlock+0x2c/0x50
evict+0xa8/0x170
dispose_list+0x34/0x40
evict_inodes+0x118/0x120
generic_shutdown_super+0x41/0x100
? rcu_read_lock_sched_held+0x97/0xa0
kill_block_super+0x22/0x50
kill_f2fs_super+0x6f/0x80 [f2fs]
deactivate_locked_super+0x3d/0x70
deactivate_super+0x40/0x60
cleanup_mnt+0x39/0x70
__cleanup_mnt+0x10/0x20
task_work_run+0x81/0xa0
exit_to_usermode_loop+0x59/0xa7
do_fast_syscall_32+0x1f5/0x22c
entry_SYSENTER_32+0x53/0x86
EIP: f2fs_evict_inode+0x556/0x580 [f2fs]
It can be simply reproduced with the following scripts:
Enable quota feature during mkfs.
Testcase1:
1. mkfs.f2fs /dev/zram0
2. mount -t f2fs /dev/zram0 /mnt/f2fs
3. xfs_io -f /mnt/f2fs/file -c "pwrite 0 4k" -c "fsync"
4. godown /mnt/f2fs
5. umount /mnt/f2fs
6. mount -t f2fs -o ro /dev/zram0 /mnt/f2fs
7. umount /mnt/f2fs
Testcase2:
1. mkfs.f2fs /dev/zram0
2. mount -t f2fs /dev/zram0 /mnt/f2fs
3. touch /mnt/f2fs/file
4. create process[pid = x] do:
a) open /mnt/f2fs/file;
b) unlink /mnt/f2fs/file
5. godown -f /mnt/f2fs
6. kill process[pid = x]
7. umount /mnt/f2fs
8. mount -t f2fs -o ro /dev/zram0 /mnt/f2fs
9. umount /mnt/f2fs
The reason is: during recovery, i_{c,m}time of the inode will be updated, so
the inode can be set dirty w/o being tracked in the sbi->inode_list[DIRTY_META]
global list, and a later write_checkpoint will not flush such a dirty inode
into the node page.
Once umount is called, sync_filesystem() in generic_shutdown_super() will
skip syncing dirty inodes due to the sb_rdonly check, leaving dirty inodes
there.
To solve this issue, during umount, remove the SB_RDONLY flag from
sb->s_flags, to make sure sync_filesystem() will not be skipped.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-22 17:11:05 +08:00
|
|
|
SBI_IS_RECOVERED, /* recovered orphan/data */
|
2018-08-21 10:21:43 +08:00
|
|
|
SBI_CP_DISABLED, /* CP was disabled last mount */
|
2019-01-25 09:48:38 +08:00
|
|
|
SBI_CP_DISABLED_QUICK, /* CP was disabled quickly */
|
f2fs: guarantee journalled quota data by checkpoint
For journalled quota mode, let checkpoint flush dquot dirty data
and quota file data to guarantee persistence of all quota sysfiles in
the last checkpoint; this way, we can avoid corrupting quota sysfiles
when encountering SPO.
The implementation is as below:
1. add a global state SBI_QUOTA_NEED_FLUSH to indicate that there are
cached dquot metadata changes in the quota subsystem, and a later checkpoint
should:
a) flush dquot metadata into the quota file.
b) flush the quota file to storage to keep file usage consistent.
2. add a global state SBI_QUOTA_NEED_REPAIR to indicate that a quota
operation failed due to -EIO or -ENOSPC, so later,
a) checkpoint will skip syncing dquot metadata.
b) CP_QUOTA_NEED_FSCK_FLAG will be set in the last cp pack to give a
hint for fsck repairing.
3. add a global state SBI_QUOTA_SKIP_FLUSH: in checkpoint, if quota
data updating is very heavy, it may cause a hung task in block_operation().
To avoid this, if our retry count exceeds the threshold, let's just skip
flushing and retry in the next checkpoint().
Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid warnings and set fsck flag]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-20 20:05:00 +08:00
|
|
|
SBI_QUOTA_NEED_FLUSH, /* need to flush quota info in CP */
|
|
|
|
SBI_QUOTA_SKIP_FLUSH, /* skip flushing quota in current CP */
|
|
|
|
SBI_QUOTA_NEED_REPAIR, /* quota file may be corrupted */
|
2019-06-05 11:33:25 +08:00
|
|
|
SBI_IS_RESIZEFS, /* resizefs is in process */
|
2015-01-28 17:48:42 +08:00
|
|
|
};
|
|
|
|
|
2016-01-09 07:51:50 +08:00
|
|
|
enum {
|
|
|
|
CP_TIME,
|
2016-01-09 08:57:48 +08:00
|
|
|
REQ_TIME,
|
2018-09-19 16:48:47 +08:00
|
|
|
DISCARD_TIME,
|
|
|
|
GC_TIME,
|
2018-08-21 10:21:43 +08:00
|
|
|
DISABLE_TIME,
|
2019-01-15 02:42:11 +08:00
|
|
|
UMOUNT_DISCARD_TIMEOUT,
|
2016-01-09 07:51:50 +08:00
|
|
|
MAX_TIME,
|
|
|
|
};
|
|
|
|
|
2018-05-08 05:22:40 +08:00
|
|
|
enum {
|
|
|
|
GC_NORMAL,
|
|
|
|
GC_IDLE_CB,
|
|
|
|
GC_IDLE_GREEDY,
|
2020-08-04 21:14:49 +08:00
|
|
|
GC_IDLE_AT,
|
2020-07-02 12:14:14 +08:00
|
|
|
GC_URGENT_HIGH,
|
|
|
|
GC_URGENT_LOW,
|
2021-07-10 13:53:57 +08:00
|
|
|
MAX_GC_MODE,
|
2018-05-08 05:22:40 +08:00
|
|
|
};
|
|
|
|
|
2020-02-14 17:44:13 +08:00
|
|
|
enum {
|
|
|
|
BGGC_MODE_ON, /* background gc is on */
|
|
|
|
BGGC_MODE_OFF, /* background gc is off */
|
|
|
|
BGGC_MODE_SYNC, /*
|
|
|
|
* background gc is on, migrating blocks
|
|
|
|
* like foreground gc
|
|
|
|
*/
|
|
|
|
};
|
|
|
|
|
2020-02-14 17:44:12 +08:00
|
|
|
enum {
|
2021-09-30 02:12:03 +08:00
|
|
|
FS_MODE_ADAPTIVE, /* use both lfs/ssr allocation */
|
|
|
|
FS_MODE_LFS, /* use lfs allocation only */
|
|
|
|
FS_MODE_FRAGMENT_SEG, /* segment fragmentation mode */
|
|
|
|
FS_MODE_FRAGMENT_BLK, /* block fragmentation mode */
|
2020-02-14 17:44:12 +08:00
|
|
|
};
|
|
|
|
|
2018-01-31 10:36:57 +08:00
|
|
|
enum {
|
|
|
|
WHINT_MODE_OFF, /* not pass down write hints */
|
|
|
|
WHINT_MODE_USER, /* try to pass down hints given by users */
|
2018-01-31 10:36:58 +08:00
|
|
|
WHINT_MODE_FS, /* pass down hints with F2FS policy */
|
2018-01-31 10:36:57 +08:00
|
|
|
};
|
|
|
|
|
2018-02-19 00:50:49 +08:00
|
|
|
enum {
|
|
|
|
ALLOC_MODE_DEFAULT, /* stay default */
|
|
|
|
ALLOC_MODE_REUSE, /* reuse segments as much as possible */
|
|
|
|
};
|
|
|
|
|
2018-03-07 12:07:49 +08:00
|
|
|
enum fsync_mode {
|
|
|
|
FSYNC_MODE_POSIX, /* fsync follows posix semantics */
|
|
|
|
FSYNC_MODE_STRICT, /* fsync behaves in line with ext4 */
|
2018-05-26 09:02:58 +08:00
|
|
|
FSYNC_MODE_NOBARRIER, /* fsync behaves nobarrier based on posix */
|
2018-03-07 12:07:49 +08:00
|
|
|
};
|
|
|
|
|
2020-12-01 12:08:02 +08:00
|
|
|
enum {
|
|
|
|
COMPR_MODE_FS, /*
|
|
|
|
* automatically compress compression
|
|
|
|
* enabled files
|
|
|
|
*/
|
|
|
|
COMPR_MODE_USER, /*
|
|
|
|
* automatic compression is disabled.
|
|
|
|
* user can control the file compression
|
|
|
|
* using ioctls
|
|
|
|
*/
|
|
|
|
};
|
|
|
|
|
f2fs: introduce discard_unit mount option
As James Z reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=213877
[1.] One-line summary of the problem:
Mount multiple SMR block devices exceed certain number cause system non-response
[2.] Full description of the problem/report:
Created some F2FS on SMR devices (mkfs.f2fs -m), then mounted in sequence. Each device is the same Model: HGST HSH721414AL (Size 14TB).
Empirically, found that when the amount of SMR device * 1.5Gb > System RAM, the system ran out of memory and hung. No dmesg output. For example, 24 SMR Disk need 24*1.5GB = 36GB. A system with 32G RAM can only mount 21 devices, the 22nd device will be a reproducible cause of system hang.
The number of SMR devices with other FS mounted on this system does not interfere with the result above.
[3.] Keywords (i.e., modules, networking, kernel):
F2FS, SMR, Memory
[4.] Kernel information
[4.1.] Kernel version (uname -a):
Linux 5.13.4-200.fc34.x86_64 #1 SMP Tue Jul 20 20:27:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
[4.2.] Kernel .config file:
Default Fedora 34 with f2fs-tools-1.14.0-2.fc34.x86_64
[5.] Most recent kernel version which did not have the bug:
None
[6.] Output of Oops.. message (if applicable) with symbolic information
resolved (see Documentation/admin-guide/oops-tracing.rst)
None
[7.] A small shell script or example program which triggers the
problem (if possible)
mount /dev/sdX /mnt/0X
[8.] Memory consumption
With 24 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 46 36 0 0 10 10
Swap: 0 0 0
With 3 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 7 5 0 0 1 1
Swap: 7 0 7
The root cause is, there are three bitmaps:
- cur_valid_map
- ckpt_valid_map
- discard_map
and each of them costs ~500MB of memory. {cur, ckpt}_valid_map are
necessary, but discard_map is optional, since this bitmap is only
useful on mountpoints where small discard is enabled.
For a blkzoned device such as an SMR or ZNS device, f2fs will only issue
discard for a section (zone) when all blocks of that section are invalid,
so, for such devices, we don't need small discard functionality at all.
This patch introduces a new mount option "discard_unit=block|segment|
section" to support issuing discard with a different basic unit which is
aligned to block, segment or section, so that the user can specify
"discard_unit=segment" or "discard_unit=section" to disable small
discard functionality.
Note that this mount option cannot be changed by remount() because the
related metadata needs to be initialized during mount().
In order to save memory, let's use "discard_unit=section" for blkzoned
devices by default.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
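The "~500MB per bitmap" figure in the message above can be checked with a rough calculation. The sketch assumes one bit per 4096-byte block and treats the 14T device as 14 TiB; bitmap_bytes() is a hypothetical helper, not an f2fs function.

```c
#include <stdint.h>

/*
 * Size in bytes of a per-block bitmap (one bit per block) covering a
 * device of device_bytes with the given block size.
 */
uint64_t bitmap_bytes(uint64_t device_bytes, uint32_t block_size)
{
	uint64_t blocks = device_bytes / block_size;

	return (blocks + 7) / 8; /* round up to whole bytes */
}
```

For a 14 TiB device this yields 469762048 bytes (448 MiB) per bitmap, so the three bitmaps together come to roughly 1.3 GiB, in line with the report of about 1.5GB of RAM per mounted device. Dropping discard_map removes one third of that.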
2021-08-03 08:15:43 +08:00
|
|
|
enum {
|
|
|
|
DISCARD_UNIT_BLOCK, /* basic discard unit is block */
|
|
|
|
DISCARD_UNIT_SEGMENT, /* basic discard unit is segment */
|
|
|
|
DISCARD_UNIT_SECTION, /* basic discard unit is section */
|
|
|
|
};
|
|
|
|
|
2021-04-28 17:20:31 +08:00
|
|
|
static inline int f2fs_test_bit(unsigned int nr, char *addr);
|
|
|
|
static inline void f2fs_set_bit(unsigned int nr, char *addr);
|
|
|
|
static inline void f2fs_clear_bit(unsigned int nr, char *addr);
|
|
|
|
|
f2fs: support data compression
This patch tries to support compression in f2fs.
- A new term, cluster, is defined as the basic unit of compression; a file
can be divided into multiple clusters logically. One cluster includes 4 << n
(n >= 0) logical pages, the compression size is also the cluster size, and
each cluster can be compressed or not.
- In the cluster metadata layout, one special flag indicates whether a
cluster is compressed or normal. For a compressed cluster, the following
metadata maps the cluster to [1, (4 << n) - 1] physical blocks, where f2fs
stores data including the compress header and compressed data.
- In order to eliminate write amplification during overwrite, F2FS only
supports compression on write-once files; data can be compressed only when
all logical blocks in file are valid and the cluster compress ratio is lower
than the specified threshold.
- To enable compression on regular inode, there are three ways:
* chattr +c file
* chattr +c dir; touch dir/file
* mount w/ -o compress_extension=ext; touch file.ext
Compress metadata layout:
[Dnode Structure]
+-----------------------------------------------+
| cluster 1 | cluster 2 | ......... | cluster N |
+-----------------------------------------------+
. . . .
. . . .
. Compressed Cluster . . Normal Cluster .
+----------+---------+---------+---------+ +---------+---------+---------+---------+
|compr flag| block 1 | block 2 | block 3 | | block 1 | block 2 | block 3 | block 4 |
+----------+---------+---------+---------+ +---------+---------+---------+---------+
. .
. .
. .
+-------------+-------------+----------+----------------------------+
| data length | data chksum | reserved | compressed data |
+-------------+-------------+----------+----------------------------+
Changelog:
20190326:
- fix error handling of read_end_io().
- remove unneeded comments in f2fs_encrypt_one_page().
20190327:
- fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
- don't jump into loop directly to avoid uninitialized variables.
- add TODO tag in error path of f2fs_write_cache_pages().
20190328:
- fix wrong merge condition in f2fs_read_multi_pages().
- check compressed file in f2fs_post_read_required().
20190401
- allow overwrite on non-compressed cluster.
- check cluster meta before writing compressed data.
20190402
- don't preallocate blocks for compressed file.
- add lz4 compress algorithm
- process multiple post read works in one workqueue
Now f2fs supports processing post read work in multiple workqueue,
it shows low performance due to schedule overhead of multiple
workqueue executing orderly.
20190921
- compress: support buffered overwrite
C: compress cluster flag
V: valid block address
N: NEW_ADDR
One cluster contain 4 blocks
before overwrite after overwrite
- VVVV -> CVNN
- CVNN -> VVVV
- CVNN -> CVNN
- CVNN -> CVVV
- CVVV -> CVNN
- CVVV -> CVVV
20191029
- add kconfig F2FS_FS_COMPRESSION to isolate compression related
codes, add kconfig F2FS_FS_{LZO,LZ4} to cover backend algorithm.
note that: will remove lzo backend if Jaegeuk agreed that too.
- update codes according to Eric's comments.
20191101
- apply fixes from Jaegeuk
20191113
- apply fixes from Jaegeuk
- split workqueue for fsverity
20191216
- apply fixes from Jaegeuk
20200117
- fix to avoid NULL pointer dereference
[Jaegeuk Kim]
- add tracepoint for f2fs_{,de}compress_pages()
- fix many bugs and add some compression stats
- fix overwrite/mmap bugs
- address 32bit build error, reported by Geert.
- bug fixes when handling errors and i_compressed_blocks
Reported-by: <noreply@ellerman.id.au>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-11-01 18:07:14 +08:00
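
/*
 * Illustration only, not part of f2fs.h: a small userspace sketch of the
 * cluster geometry described in the commit message above. It assumes a
 * cluster of 4 << n logical pages and reads the physical-block range as
 * [1, (4 << n) - 1], which matches the diagram (a 4-page compressed cluster
 * keeps one slot for the flag and at most 3 for data blocks). All function
 * names here are hypothetical.
 */

```c
#include <assert.h>

/* Number of logical pages in one cluster for a given n (n >= 0). */
static unsigned long cluster_pages(unsigned int n)
{
	/* keep 4UL << n from shifting past the width of unsigned long */
	assert(n < sizeof(unsigned long) * 8 - 2);
	return 4UL << n;
}

/* Index of the cluster that contains logical page page_idx. */
static unsigned long cluster_index(unsigned long page_idx, unsigned int n)
{
	return page_idx / cluster_pages(n);
}

/* Position of logical page page_idx inside its cluster. */
static unsigned long offset_in_cluster(unsigned long page_idx, unsigned int n)
{
	return page_idx % cluster_pages(n);
}

/* Upper bound on physical blocks used by one compressed cluster. */
static unsigned long max_compressed_blocks(unsigned int n)
{
	return cluster_pages(n) - 1;
}
```

/*
 * With n = 0, logical page 9 falls into cluster 2 at offset 1, and a
 * compressed cluster occupies at most 3 physical blocks.
 */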

/*
 * Layout of f2fs page.private:
 *
 * Layout A: lowest bit should be 1
 * | bit0 = 1 | bit1 | bit2 | ... | bit MAX | private data .... |
 * bit 0	PAGE_PRIVATE_NOT_POINTER
 * bit 1	PAGE_PRIVATE_ATOMIC_WRITE
 * bit 2	PAGE_PRIVATE_DUMMY_WRITE
 * bit 3	PAGE_PRIVATE_ONGOING_MIGRATION
 * bit 4	PAGE_PRIVATE_INLINE_INODE
 * bit 5	PAGE_PRIVATE_REF_RESOURCE
 * bit 6-	f2fs private data
 *
 * Layout B: lowest bit should be 0
 * page.private is a wrapped pointer.
 */
enum {
	PAGE_PRIVATE_NOT_POINTER,		/* private contains non-pointer data */
	PAGE_PRIVATE_ATOMIC_WRITE,		/* data page from atomic write path */
	PAGE_PRIVATE_DUMMY_WRITE,		/* data page for padding aligned IO */
	PAGE_PRIVATE_ONGOING_MIGRATION,		/* data page that is being migrated */
	PAGE_PRIVATE_INLINE_INODE,		/* inode page contains inline data */
	PAGE_PRIVATE_REF_RESOURCE,		/* dirty page has referenced resources */
	PAGE_PRIVATE_MAX
};

#define PAGE_PRIVATE_GET_FUNC(name, flagname) \
static inline bool page_private_##name(struct page *page) \
{ \
	return PagePrivate(page) && \
		test_bit(PAGE_PRIVATE_NOT_POINTER, &page_private(page)) && \
		test_bit(PAGE_PRIVATE_##flagname, &page_private(page)); \
}

#define PAGE_PRIVATE_SET_FUNC(name, flagname) \
static inline void set_page_private_##name(struct page *page) \
{ \
	if (!PagePrivate(page)) { \
		get_page(page); \
		SetPagePrivate(page); \
		set_page_private(page, 0); \
	} \
	set_bit(PAGE_PRIVATE_NOT_POINTER, &page_private(page)); \
	set_bit(PAGE_PRIVATE_##flagname, &page_private(page)); \
}

#define PAGE_PRIVATE_CLEAR_FUNC(name, flagname) \
static inline void clear_page_private_##name(struct page *page) \
{ \
	clear_bit(PAGE_PRIVATE_##flagname, &page_private(page)); \
	if (page_private(page) == 1 << PAGE_PRIVATE_NOT_POINTER) { \
		set_page_private(page, 0); \
		if (PagePrivate(page)) { \
			ClearPagePrivate(page); \
			put_page(page); \
		} \
	} \
}

PAGE_PRIVATE_GET_FUNC(nonpointer, NOT_POINTER);
PAGE_PRIVATE_GET_FUNC(reference, REF_RESOURCE);
PAGE_PRIVATE_GET_FUNC(inline, INLINE_INODE);
PAGE_PRIVATE_GET_FUNC(gcing, ONGOING_MIGRATION);
PAGE_PRIVATE_GET_FUNC(atomic, ATOMIC_WRITE);
PAGE_PRIVATE_GET_FUNC(dummy, DUMMY_WRITE);

PAGE_PRIVATE_SET_FUNC(reference, REF_RESOURCE);
PAGE_PRIVATE_SET_FUNC(inline, INLINE_INODE);
PAGE_PRIVATE_SET_FUNC(gcing, ONGOING_MIGRATION);
PAGE_PRIVATE_SET_FUNC(atomic, ATOMIC_WRITE);
PAGE_PRIVATE_SET_FUNC(dummy, DUMMY_WRITE);

PAGE_PRIVATE_CLEAR_FUNC(reference, REF_RESOURCE);
PAGE_PRIVATE_CLEAR_FUNC(inline, INLINE_INODE);
PAGE_PRIVATE_CLEAR_FUNC(gcing, ONGOING_MIGRATION);
PAGE_PRIVATE_CLEAR_FUNC(atomic, ATOMIC_WRITE);
PAGE_PRIVATE_CLEAR_FUNC(dummy, DUMMY_WRITE);
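
/*
 * Illustration only, not part of f2fs.h: a userspace model of the helpers
 * generated by the three macros above, showing how the token pasting yields
 * one get/set/clear trio per flag and how the page reference is taken on
 * the first flag set and dropped when only NOT_POINTER remains. struct
 * fake_page and every fake_* name are hypothetical stand-ins for
 * struct page, PagePrivate(), get_page() and put_page(); plain bit
 * operations replace the atomic test_bit()/set_bit()/clear_bit().
 */

```c
#include <assert.h>
#include <stdbool.h>

enum {
	PAGE_PRIVATE_NOT_POINTER,
	PAGE_PRIVATE_ATOMIC_WRITE,
	PAGE_PRIVATE_DUMMY_WRITE,
	PAGE_PRIVATE_ONGOING_MIGRATION,
	PAGE_PRIVATE_INLINE_INODE,
	PAGE_PRIVATE_REF_RESOURCE,
	PAGE_PRIVATE_MAX
};

/* Hypothetical stand-in for struct page: only what the helpers touch. */
struct fake_page {
	unsigned long private;	/* models page_private(page) */
	bool has_private;	/* models the PagePrivate() flag */
	int refcount;		/* models get_page()/put_page() */
};

/* One macro generating the get, set and clear helper for a flag. */
#define FAKE_PRIVATE_FUNCS(name, flagname) \
static inline bool fake_page_private_##name(const struct fake_page *p) \
{ \
	return p->has_private && \
		(p->private & (1UL << PAGE_PRIVATE_NOT_POINTER)) && \
		(p->private & (1UL << PAGE_PRIVATE_##flagname)); \
} \
static inline void fake_set_page_private_##name(struct fake_page *p) \
{ \
	if (!p->has_private) { \
		p->refcount++;		/* first flag pins the page */ \
		p->has_private = true; \
		p->private = 0; \
	} \
	p->private |= 1UL << PAGE_PRIVATE_NOT_POINTER; \
	p->private |= 1UL << PAGE_PRIVATE_##flagname; \
} \
static inline void fake_clear_page_private_##name(struct fake_page *p) \
{ \
	p->private &= ~(1UL << PAGE_PRIVATE_##flagname); \
	if (p->private == 1UL << PAGE_PRIVATE_NOT_POINTER) { \
		p->private = 0; \
		if (p->has_private) { \
			assert(p->refcount > 0); \
			p->has_private = false; \
			p->refcount--;	/* last flag gone: unpin */ \
		} \
	} \
}

FAKE_PRIVATE_FUNCS(gcing, ONGOING_MIGRATION)
FAKE_PRIVATE_FUNCS(atomic, ATOMIC_WRITE)
```

/*
 * In this model, setting two flags on a fresh page raises the reference
 * count once, and the count drops back only after both flags are cleared.
 */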

static inline unsigned long get_page_private_data(struct page *page)
{
	unsigned long data = page_private(page);

	if (!test_bit(PAGE_PRIVATE_NOT_POINTER, &data))
		return 0;
	return data >> PAGE_PRIVATE_MAX;
}

static inline void set_page_private_data(struct page *page, unsigned long data)
{
	if (!PagePrivate(page)) {
		get_page(page);
		SetPagePrivate(page);
		set_page_private(page, 0);
	}
	set_bit(PAGE_PRIVATE_NOT_POINTER, &page_private(page));
	page_private(page) |= data << PAGE_PRIVATE_MAX;
}

static inline void clear_page_private_data(struct page *page)
{
	page_private(page) &= (1 << PAGE_PRIVATE_MAX) - 1;
	if (page_private(page) == 1 << PAGE_PRIVATE_NOT_POINTER) {
		set_page_private(page, 0);
		if (PagePrivate(page)) {
			ClearPagePrivate(page);
			put_page(page);
		}
	}
}
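
/*
 * Illustration only, not part of f2fs.h: a userspace sketch of the word
 * arithmetic behind the three data helpers above, operating on a plain
 * unsigned long instead of page_private(page). The function names are
 * hypothetical, and PAGE_PRIVATE_MAX is fixed at 6 to match the enum shown
 * earlier. Reference counting and the PagePrivate flag are left out.
 */

```c
#include <assert.h>

#define PAGE_PRIVATE_NOT_POINTER	0
#define PAGE_PRIVATE_MAX		6

/* Model of set_page_private_data(): mark as non-pointer, OR the data in. */
static unsigned long pack_private_data(unsigned long priv, unsigned long data)
{
	/* the top PAGE_PRIVATE_MAX bits of data would be shifted out */
	assert(((data << PAGE_PRIVATE_MAX) >> PAGE_PRIVATE_MAX) == data);

	priv |= 1UL << PAGE_PRIVATE_NOT_POINTER;
	priv |= data << PAGE_PRIVATE_MAX;
	return priv;
}

/* Model of get_page_private_data(): 0 unless the word is non-pointer. */
static unsigned long unpack_private_data(unsigned long priv)
{
	if (!(priv & (1UL << PAGE_PRIVATE_NOT_POINTER)))
		return 0;
	return priv >> PAGE_PRIVATE_MAX;
}

/* Model of the mask in clear_page_private_data(): keep only flag bits. */
static unsigned long strip_private_data(unsigned long priv)
{
	return priv & ((1UL << PAGE_PRIVATE_MAX) - 1);
}
```

/*
 * Because the data is ORed in rather than assigned, packing a second value
 * without stripping the first merges the two bit patterns; a caller of the
 * real helpers would presumably clear the old data first.
 */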

/* For compression */
enum compress_algorithm_type {
	COMPRESS_LZO,
	COMPRESS_LZ4,
	COMPRESS_ZSTD,
	COMPRESS_LZORLE,
	COMPRESS_MAX,
};

enum compress_flag {
	COMPRESS_CHKSUM,
	COMPRESS_MAX_FLAG,
};

#define	COMPRESS_WATERMARK			20
#define	COMPRESS_PERCENT			20

#define COMPRESS_DATA_RESERVED_SIZE		4
struct compress_data {
	__le32 clen;			/* compressed data size */
	__le32 chksum;			/* compressed data chksum */
	__le32 reserved[COMPRESS_DATA_RESERVED_SIZE];	/* reserved */
	u8 cdata[];			/* compressed data */
};

#define COMPRESS_HEADER_SIZE	(sizeof(struct compress_data))
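
/*
 * Illustration only, not part of f2fs.h: a userspace mirror of
 * struct compress_data using fixed-width types, to show that the header in
 * front of cdata[] is 24 bytes on common ABIs and how many blocks a payload
 * plus header would need. The 4096-byte block size, the *_sketch names and
 * blocks_for_payload() are assumptions, and the little-endian byte order
 * implied by __le32 is ignored here.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COMPRESS_DATA_RESERVED_SIZE	4
#define SKETCH_BLOCK_SIZE		4096u	/* assumed block size */

struct compress_data_sketch {
	uint32_t clen;		/* compressed data size */
	uint32_t chksum;	/* compressed data chksum */
	uint32_t reserved[COMPRESS_DATA_RESERVED_SIZE];	/* reserved */
	uint8_t cdata[];	/* compressed data */
};

#define SKETCH_HEADER_SIZE	(sizeof(struct compress_data_sketch))

/* 2 + COMPRESS_DATA_RESERVED_SIZE 32-bit words precede the payload. */
static_assert(sizeof(struct compress_data_sketch) == 24,
	      "header is expected to be 24 bytes");

/* Blocks needed to hold the header plus clen bytes of compressed data. */
static unsigned int blocks_for_payload(uint32_t clen)
{
	size_t total = (size_t)clen + SKETCH_HEADER_SIZE;

	return (unsigned int)((total + SKETCH_BLOCK_SIZE - 1) /
			      SKETCH_BLOCK_SIZE);
}
```

/*
 * Under these assumptions a payload of 4072 bytes still fits in one block,
 * while 4073 bytes spills into a second one.
 */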

#define F2FS_COMPRESSED_PAGE_MAGIC	0xF5F2C000

#define	COMPRESS_LEVEL_OFFSET	8
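
/*
 * Illustration only, not part of f2fs.h: one plausible reading of how
 * enum compress_flag and COMPRESS_LEVEL_OFFSET fit together. This excerpt
 * does not show where they are used, so the packing below (a 16-bit word
 * with COMPRESS_CHKSUM as a bit index in the low byte and the compression
 * level stored from bit 8 upward) is an assumption, and all function names
 * are hypothetical.
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum compress_flag {
	COMPRESS_CHKSUM,
	COMPRESS_MAX_FLAG,
};

#define COMPRESS_LEVEL_OFFSET	8

/* Assumed packing: flag bits in the low byte, level in the high byte. */
static uint16_t pack_compress_flag(bool chksum, unsigned int level)
{
	uint16_t flag = 0;

	assert(level <= 0xFF);	/* level must fit in the upper 8 bits */

	if (chksum)
		flag |= 1u << COMPRESS_CHKSUM;
	flag |= (uint16_t)(level << COMPRESS_LEVEL_OFFSET);
	return flag;
}

/* Extract the level stored above COMPRESS_LEVEL_OFFSET. */
static unsigned int compress_flag_level(uint16_t flag)
{
	return flag >> COMPRESS_LEVEL_OFFSET;
}

/* Test the checksum bit. */
static bool compress_flag_has_chksum(uint16_t flag)
{
	return flag & (1u << COMPRESS_CHKSUM);
}
```

/*
 * Under this assumed layout, checksum enabled with level 3 packs to 0x0301.
 */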
|
|
|
|
|
f2fs: support data compression
This patch tries to support compression in f2fs.
- New term named cluster is defined as basic unit of compression, file can
be divided into multiple clusters logically. One cluster includes 4 << n
(n >= 0) logical pages, compression size is also cluster size, each of
cluster can be compressed or not.
- In cluster metadata layout, one special flag is used to indicate cluster
is compressed one or normal one, for compressed cluster, following metadata
maps cluster to [1, 4 << n - 1] physical blocks, in where f2fs stores
data including compress header and compressed data.
- In order to eliminate write amplification during overwrite, F2FS only
support compression on write-once file, data can be compressed only when
all logical blocks in file are valid and cluster compress ratio is lower
than specified threshold.
- To enable compression on regular inode, there are three ways:
* chattr +c file
* chattr +c dir; touch dir/file
* mount w/ -o compress_extension=ext; touch file.ext
Compress metadata layout:
[Dnode Structure]
+-----------------------------------------------+
| cluster 1 | cluster 2 | ......... | cluster N |
+-----------------------------------------------+
. . . .
. . . .
. Compressed Cluster . . Normal Cluster .
+----------+---------+---------+---------+ +---------+---------+---------+---------+
|compr flag| block 1 | block 2 | block 3 | | block 1 | block 2 | block 3 | block 4 |
+----------+---------+---------+---------+ +---------+---------+---------+---------+
. .
. .
. .
+-------------+-------------+----------+----------------------------+
| data length | data chksum | reserved | compressed data |
+-------------+-------------+----------+----------------------------+
Changelog:
20190326:
- fix error handling of read_end_io().
- remove unneeded comments in f2fs_encrypt_one_page().
20190327:
- fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
- don't jump into loop directly to avoid uninitialized variables.
- add TODO tag in error path of f2fs_write_cache_pages().
20190328:
- fix wrong merge condition in f2fs_read_multi_pages().
- check compressed file in f2fs_post_read_required().
20190401
- allow overwrite on non-compressed cluster.
- check cluster meta before writing compressed data.
20190402
- don't preallocate blocks for compressed file.
- add lz4 compress algorithm
- process multiple post read works in one workqueue
Now f2fs supports processing post read work in multiple workqueue,
it shows low performance due to schedule overhead of multiple
workqueue executing orderly.
20190921
- compress: support buffered overwrite
C: compress cluster flag
V: valid block address
N: NEW_ADDR
One cluster contain 4 blocks
before overwrite after overwrite
- VVVV -> CVNN
- CVNN -> VVVV
- CVNN -> CVNN
- CVNN -> CVVV
- CVVV -> CVNN
- CVVV -> CVVV
20191029
- add kconfig F2FS_FS_COMPRESSION to isolate compression related
codes, add kconfig F2FS_FS_{LZO,LZ4} to cover backend algorithm.
note that: will remove lzo backend if Jaegeuk agreed that too.
- update codes according to Eric's comments.
20191101
- apply fixes from Jaegeuk
20191113
- apply fixes from Jaegeuk
- split workqueue for fsverity
20191216
- apply fixes from Jaegeuk
20200117
- fix to avoid NULL pointer dereference
[Jaegeuk Kim]
- add tracepoint for f2fs_{,de}compress_pages()
- fix many bugs and add some compression stats
- fix overwrite/mmap bugs
- address 32bit build error, reported by Geert.
- bug fixes when handling errors and i_compressed_blocks
Reported-by: <noreply@ellerman.id.au>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-11-01 18:07:14 +08:00
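The cluster arithmetic described in the commit message can be sketched in a few helpers. This is an illustration only: the `sketch_` names are hypothetical and are not the helpers f2fs itself uses. It assumes the cluster size is stored as a log2 value, as the `log_cluster_size` field suggests, so that "4 << n pages" corresponds to `1 << log_cluster_size` with `log_cluster_size = n + 2`.

```c
#include <assert.h>

/* Pages per cluster: 4 << n with n = log_cluster_size - 2. */
static inline unsigned int sketch_cluster_size(unsigned int log_cluster_size)
{
	return 1U << log_cluster_size;
}

/* Index of the cluster that contains the page at page_idx. */
static inline unsigned long sketch_cluster_idx(unsigned long page_idx,
					       unsigned int log_cluster_size)
{
	return page_idx >> log_cluster_size;
}

/* Position of the page inside its cluster (0 .. cluster_size - 1). */
static inline unsigned int sketch_offset_in_cluster(unsigned long page_idx,
						    unsigned int log_cluster_size)
{
	return (unsigned int)(page_idx &
			      (sketch_cluster_size(log_cluster_size) - 1));
}

/* Page index of the first page of a given cluster. */
static inline unsigned long sketch_start_idx_of_cluster(unsigned long cluster_idx,
							 unsigned int log_cluster_size)
{
	return cluster_idx << log_cluster_size;
}
```

With a 4-page cluster (log size 2), page 9 falls in cluster 2 at offset 1, and that cluster starts at page 8.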
/* compress context */
struct compress_ctx {
	struct inode *inode;		/* inode the context belongs to */
	pgoff_t cluster_idx;		/* cluster index number */
	unsigned int cluster_size;	/* page count in cluster */
	unsigned int log_cluster_size;	/* log of cluster size */
	struct page **rpages;		/* pages storing raw data in cluster */
	unsigned int nr_rpages;		/* total page number in rpages */
	struct page **cpages;		/* pages storing compressed data in cluster */
	unsigned int nr_cpages;		/* total page number in cpages */
	unsigned int valid_nr_cpages;	/* valid page number in cpages */
	void *rbuf;			/* virtual mapped address on rpages */
	struct compress_data *cbuf;	/* virtual mapped address on cpages */
	size_t rlen;			/* valid data length in rbuf */
	size_t clen;			/* valid data length in cbuf */
	void *private;			/* payload buffer for specified compression algorithm */
	void *private2;			/* extra payload buffer */
};

/* compress context for write IO path */
struct compress_io_ctx {
	u32 magic;			/* magic number to indicate page is compressed */
	struct inode *inode;		/* inode the context belongs to */
	struct page **rpages;		/* pages storing raw data in cluster */
	unsigned int nr_rpages;		/* total page number in rpages */
	atomic_t pending_pages;		/* in-flight compressed page count */
};

/* Context for decompressing one cluster on the read IO path */
struct decompress_io_ctx {
	u32 magic;			/* magic number to indicate page is compressed */
	struct inode *inode;		/* inode the context belongs to */
	pgoff_t cluster_idx;		/* cluster index number */
	unsigned int cluster_size;	/* page count in cluster */
	unsigned int log_cluster_size;	/* log of cluster size */
	struct page **rpages;		/* pages storing raw data in cluster */
	unsigned int nr_rpages;		/* total page number in rpages */
	struct page **cpages;		/* pages storing compressed data in cluster */
	unsigned int nr_cpages;		/* total page number in cpages */
	struct page **tpages;		/* temp pages to pad holes in cluster */
	void *rbuf;			/* virtual mapped address on rpages */
	struct compress_data *cbuf;	/* virtual mapped address on cpages */
	size_t rlen;			/* valid data length in rbuf */
	size_t clen;			/* valid data length in cbuf */

	/*
	 * The number of compressed pages remaining to be read in this cluster.
	 * This is initially nr_cpages.  It is decremented by 1 each time a page
	 * has been read (or failed to be read).  When it reaches 0, the cluster
	 * is decompressed (or an error is reported).
	 *
	 * If an error occurs before all the pages have been submitted for I/O,
	 * then this will never reach 0.  In this case the I/O submitter is
	 * responsible for calling f2fs_decompress_end_io() instead.
	 */
	atomic_t remaining_pages;

	/*
	 * Number of references to this decompress_io_ctx.
	 *
	 * One reference is held for I/O completion.  This reference is dropped
	 * after the pagecache pages are updated and unlocked -- either after
	 * decompression (and verity if enabled), or after an error.
	 *
	 * In addition, each compressed page holds a reference while it is in a
	 * bio.  These references are necessary to prevent compressed pages from
	 * being freed while they are still in a bio.
	 */
	refcount_t refcnt;

	bool failed;			/* IO error occurred before decompression? */
	bool need_verity;		/* need fs-verity verification after decompression? */
	void *private;			/* payload buffer for specified decompression algorithm */
	void *private2;			/* extra payload buffer */
	struct work_struct verity_work;	/* work to verify the decompressed pages */
};
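The countdown described in the remaining_pages comment can be sketched in userspace with C11 atomics. This is an illustration under stated assumptions, not the kernel implementation: `struct sketch_dic` and the `sketch_` functions are hypothetical, C11 `atomic_int` stands in for the kernel's `atomic_t`, and the reference counting through `refcnt` is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the counters in decompress_io_ctx. */
struct sketch_dic {
	atomic_int remaining_pages;	/* compressed pages still to be read */
	atomic_bool failed;		/* set if any page read failed */
	bool done;			/* set once the whole cluster was handled */
};

static void sketch_dic_init(struct sketch_dic *dic, int nr_cpages)
{
	atomic_init(&dic->remaining_pages, nr_cpages);	/* initially nr_cpages */
	atomic_init(&dic->failed, false);
	dic->done = false;
}

/*
 * Called once per compressed page when its read completes or fails.
 * Returns true only for the caller that brings the count to 0; that
 * caller is the one that would decompress the cluster or report the error.
 */
static bool sketch_page_read_done(struct sketch_dic *dic, bool io_error)
{
	if (io_error)
		atomic_store(&dic->failed, true);

	/* atomic_fetch_sub() returns the value held before the subtraction. */
	if (atomic_fetch_sub(&dic->remaining_pages, 1) != 1)
		return false;

	dic->done = true;
	return true;
}
```

Only the final completion acts on the cluster, so a failure recorded by an earlier page is still visible when the last page arrives.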

#define NULL_CLUSTER			((unsigned int)(~0))
#define MIN_COMPRESS_LOG_SIZE		2
#define MAX_COMPRESS_LOG_SIZE		8
#define MAX_COMPRESS_WINDOW_SIZE(log_size) ((PAGE_SIZE) << (log_size))
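To make the window-size macro concrete, the sketch below evaluates the same expression for the smallest and largest permitted log sizes. It assumes 4 KiB pages; `SKETCH_PAGE_SIZE` and the other `SKETCH_`/`sketch_` names are hypothetical stand-ins, since the real `PAGE_SIZE` depends on the architecture.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE		4096UL	/* assumption: 4 KiB pages */
#define SKETCH_MIN_COMPRESS_LOG_SIZE	2
#define SKETCH_MAX_COMPRESS_LOG_SIZE	8

/* Same shape as MAX_COMPRESS_WINDOW_SIZE: bytes covered by one cluster. */
#define SKETCH_MAX_COMPRESS_WINDOW_SIZE(log_size) \
	((SKETCH_PAGE_SIZE) << (log_size))

/* Nonzero when log_size lies inside the permitted range. */
static inline int sketch_log_size_valid(unsigned int log_size)
{
	return log_size >= SKETCH_MIN_COMPRESS_LOG_SIZE &&
	       log_size <= SKETCH_MAX_COMPRESS_LOG_SIZE;
}
```

Under that assumption the window ranges from 16 KiB (log size 2, 4 pages) to 1 MiB (log size 8, 256 pages).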

struct f2fs_sb_info {
	struct super_block *sb;			/* pointer to VFS super block */
	struct proc_dir_entry *s_proc;		/* proc entry */
	struct f2fs_super_block *raw_super;	/* raw super block pointer */
	struct rw_semaphore sb_lock;		/* lock for raw super block */
	int valid_super_block;			/* valid super block number */
	unsigned long s_flag;			/* flags for sbi */
	struct mutex writepages;		/* mutex for writepages() */

#ifdef CONFIG_BLK_DEV_ZONED
	unsigned int blocks_per_blkz;		/* F2FS blocks per zone */
	unsigned int log_blocks_per_blkz;	/* log2 F2FS blocks per zone */
#endif

2012-11-28 12:37:31 +08:00
|
|
|
/* for node-related operations */
|
|
|
|
struct f2fs_nm_info *nm_info; /* node manager */
|
|
|
|
struct inode *node_inode; /* cache node blocks */
|
|
|
|
|
|
|
|
/* for segment-related operations */
|
|
|
|
struct f2fs_sm_info *sm_info; /* segment manager */
|
2013-11-19 11:47:22 +08:00
|
|
|
|
|
|
|
/* for bio operations */
|
2017-05-11 02:18:25 +08:00
|
|
|
struct f2fs_bio_info *write_io[NR_PAGE_TYPE]; /* for write bios */
|
2018-05-26 09:00:13 +08:00
|
|
|
/* keep migration IO order for LFS mode */
|
|
|
|
struct rw_semaphore io_order_lock;
|
2016-12-15 02:12:56 +08:00
|
|
|
mempool_t *write_io_dummy; /* Dummy pages */
|
2012-11-28 12:37:31 +08:00
|
|
|
|
|
|
|
/* for checkpoint */
|
|
|
|
struct f2fs_checkpoint *ckpt; /* raw checkpoint pointer */
|
2016-11-25 04:45:15 +08:00
|
|
|
int cur_cp_pack; /* remain current cp pack */
|
2016-09-20 11:04:18 +08:00
|
|
|
spinlock_t cp_lock; /* for flag in ckpt */
|
2012-11-28 12:37:31 +08:00
|
|
|
struct inode *meta_inode; /* cache meta blocks */
|
2020-11-23 13:28:32 +08:00
|
|
|
struct rw_semaphore cp_global_sem; /* checkpoint procedure lock */
|
2016-08-05 02:38:25 +08:00
|
|
|
struct rw_semaphore cp_rwsem; /* blocking FS operations */
|
2014-07-03 18:58:39 +08:00
|
|
|
struct rw_semaphore node_write; /* locking node writes */
|
2017-03-13 20:22:18 +08:00
|
|
|
struct rw_semaphore node_change; /* locking node change */
|
2013-11-07 11:48:25 +08:00
|
|
|
wait_queue_head_t cp_wait;
|
2016-01-09 07:51:50 +08:00
|
|
|
unsigned long last_time[MAX_TIME]; /* to store time in jiffies */
|
|
|
|
long interval_time[MAX_TIME]; /* to store thresholds */
|
f2fs: introduce checkpoint_merge mount option
We've added new mount options, "checkpoint_merge" and "nocheckpoint_merge".
The former creates a kernel daemon that merges concurrent checkpoint
requests as much as possible to eliminate redundant checkpoints. It also
removes the sluggishness caused by a slow checkpoint operation
when the checkpoint is done in the process context of a cgroup with a
low I/O budget and few CPU shares. To improve this further, we set the
default I/O priority of the kernel daemon to "3", one level higher
than other kernel threads. The verification result below
illustrates this.
The basic idea has come from https://opensource.samsung.com.
[Verification]
Android Pixel Device(ARM64, 7GB RAM, 256GB UFS)
Create two I/O cgroups (fg w/ weight 100, bg w/ weight 20)
Set "strict_guarantees" to "1" in BFQ tunables
In "fg" cgroup,
- thread A => trigger 1000 checkpoint operations
"for i in `seq 1 1000`; do touch test_dir1/file; fsync test_dir1;
done"
- thread B => generating async. I/O
"fio --rw=write --numjobs=1 --bs=128k --runtime=3600 --time_based=1
--filename=test_img --name=test"
In "bg" cgroup,
- thread C => trigger repeated checkpoint operations
"echo $$ > /dev/blkio/bg/tasks; while true; do touch test_dir2/file;
fsync test_dir2; done"
We've measured thread A's execution time.
[ w/o patch ]
Elapsed Time: Avg. 68 seconds
[ w/ patch ]
Elapsed Time: Avg. 48 seconds
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
[Jaegeuk Kim: fix the return value in f2fs_start_ckpt_thread, reported by Dan]
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-01-19 08:00:42 +08:00
|
|
|
struct ckpt_req_control cprc_info; /* for checkpoint request control */
|
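The merging idea described in the checkpoint_merge commit message can be sketched in user space as follows. This is a minimal single-threaded sketch, not the kernel implementation: `ckpt_req`, `ckpt_queue`, `queue_request`, and `drain_and_checkpoint` are hypothetical names, and the kernel thread, locking, and wait primitives the real daemon needs are omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request record: one per caller waiting for a checkpoint. */
struct ckpt_req {
	int done;               /* set once a checkpoint covering it finished */
	struct ckpt_req *next;
};

/* Hypothetical queue shared by callers and the (simulated) daemon. */
struct ckpt_queue {
	struct ckpt_req *head;
	unsigned int issued;    /* number of checkpoints actually written */
};

static void queue_request(struct ckpt_queue *q, struct ckpt_req *req)
{
	assert(q != NULL && req != NULL);
	req->done = 0;
	req->next = q->head;
	q->head = req;
}

/*
 * One daemon pass: detach every pending request, write a single
 * checkpoint, then complete all detached requests at once.
 * Returns the number of requests satisfied by that one checkpoint.
 */
static unsigned int drain_and_checkpoint(struct ckpt_queue *q)
{
	struct ckpt_req *req;
	unsigned int merged = 0;

	assert(q != NULL);
	req = q->head;
	if (!req)
		return 0;       /* nothing pending, no checkpoint needed */

	q->head = NULL;
	q->issued++;            /* stands in for the real checkpoint write */

	while (req) {
		struct ckpt_req *next = req->next;

		req->done = 1;
		req->next = NULL;
		merged++;
		req = next;
	}
	return merged;
}
```

If five callers queue a request before the daemon runs, one pass completes all five while `issued` grows by only one, which is the redundancy the commit message aims to remove.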
2012-11-28 12:37:31 +08:00
|
|
|
|
2020-07-24 16:55:28 +08:00
|
|
|
struct inode_management im[MAX_INO_ENTRY]; /* manage inode cache */
|
2014-07-26 06:47:17 +08:00
|
|
|
|
f2fs: fix to avoid broken of dnode block list
The f2fs recovery flow relies on the dnode block linked list: recovery of an
fsynced file depends on the earlier dnodes in the list being persistent, so
during fsync() we should wait until all regular inodes' dnodes are written back
before issuing a flush.
This way, we avoid the dnode block list being broken by out-of-order
IO submission caused by the IO scheduler or driver.
Sheng Yong helped test this patch:
Target:/data (f2fs, -)
64MB / 32768KB / 4KB / 8
1 / PERSIST / Index
Base:
     SEQ-RD(MB/s)  SEQ-WR(MB/s)  RND-RD(IOPS)  RND-WR(IOPS)  Insert(TPS)   Update(TPS)   Delete(TPS)
1    867.82        204.15        41440.03      41370.54      680.8         1025.94       1031.08
2    871.87        205.87        41370.3       40275.2       791.14        1065.84       1101.7
3    866.52        205.69        41795.67      40596.16      694.69        1037.16       1031.48
Avg  868.7366667   205.2366667   41535.33333   40747.3       722.21        1042.98       1054.753333
After:
     SEQ-RD(MB/s)  SEQ-WR(MB/s)  RND-RD(IOPS)  RND-WR(IOPS)  Insert(TPS)   Update(TPS)   Delete(TPS)
1    798.81        202.5         41143         40613.87      602.71        838.08        913.83
2    805.79        206.47        40297.2       41291.46      604.44        840.75        924.27
3    814.83        206.17        41209.57      40453.62      602.85        834.66        927.91
Avg  806.4766667   205.0466667   40883.25667   40786.31667   603.3333333   837.83        922.0033333
Patched/Original:
     0.928332713   0.999074239   0.984300676   1.000957528   0.835398753   0.803303994   0.874141189
It looks like atomic writes suffer a performance regression.
I suspect the culprit is that we force a wait until all dnodes are in the
storage cache before we issue PREFLUSH+FUA.
BTW, commit ("f2fs: don't need to wait for node writes for atomic write")
may cause the following problem: we lose the data of the last transaction
after SPO, even if the atomic write returns no error:
- atomic_open();
- write() P1, P2, P3;
- atomic_commit();
- writeback data: P1, P2, P3;
- writeback node: N1, N2, N3; <--- If N1 and N2 are not written back but N3 (with fsync_mark) is,
SPOR won't find N3 because the node chain is broken, so the last transaction
is lost.
- preflush + fua;
- power-cut
If we don't wait dnode writeback for atomic_write:
     SEQ-RD(MB/s)  SEQ-WR(MB/s)  RND-RD(IOPS)  RND-WR(IOPS)  Insert(TPS)   Update(TPS)   Delete(TPS)
1    779.91        206.03        41621.5       40333.16      716.9         1038.21       1034.85
2    848.51        204.35        40082.44      39486.17      791.83        1119.96       1083.77
3    772.12        206.27        41335.25      41599.65      723.29        1055.07       971.92
Avg  800.18        205.55        41013.06333   40472.99333   744.0066667   1071.08       1030.18
Patched/Original:
     0.92108464    1.001526693   0.987425886   0.993268102   1.030180511   1.026942031   0.976702294
SQLite's performance recovers.
Jaegeuk:
"Practically, I don't see db corruption because of this. We can excuse losing
the last transaction."
Finally, we decided to keep the original semantics of the atomic write
interface: we don't wait for all dnode writeback before preflush+fua submission.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-02 23:03:19 +08:00
|
|
|
spinlock_t fsync_node_lock; /* for node entry lock */
|
|
|
|
struct list_head fsync_node_list; /* node list head */
|
|
|
|
unsigned int fsync_seg_id; /* sequence id */
|
|
|
|
unsigned int fsync_node_num; /* number of node entries */
|
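The power-cut scenario in the commit message above (N3 persisted, N1 and N2 not) can be illustrated with a small model. This is only a sketch under stated assumptions: `recoverable_nodes` and the `persisted` array are hypothetical and simply model "recovery follows the chain and stops at the first missing node"; they are not f2fs APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the dnode chain: nodes were submitted in index
 * order, and persisted[i] says whether node i reached storage before
 * the power-cut. Returns how many nodes recovery can still reach.
 */
static size_t recoverable_nodes(const bool *persisted, size_t count)
{
	size_t i;

	assert(persisted != NULL || count == 0);

	/* Recovery follows the chain and stops at the first missing node. */
	for (i = 0; i < count; i++) {
		if (!persisted[i])
			break;
	}
	return i;
}
```

With `{false, false, true}` the function returns 0: N3 is on disk but unreachable, which is the "last transaction lost" case the message describes.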
|
|
|
|
2014-07-26 06:47:17 +08:00
|
|
|
/* for orphan inode, use 0'th array */
|
2013-12-26 18:24:19 +08:00
|
|
|
unsigned int max_orphans; /* max orphan inodes */
|
2012-11-28 12:37:31 +08:00
|
|
|
|
2015-12-16 13:09:20 +08:00
|
|
|
/* for inode management */
|
|
|
|
struct list_head inode_list[NR_INODE_TYPE]; /* dirty inode list */
|
|
|
|
spinlock_t inode_lock[NR_INODE_TYPE]; /* for dirty inode list lock */
|
2019-05-20 17:36:59 +08:00
|
|
|
struct mutex flush_lock; /* for flush exclusion */
|
2012-11-28 12:37:31 +08:00
|
|
|
|
2015-02-05 17:52:58 +08:00
|
|
|
/* for extent tree cache */
|
|
|
|
struct radix_tree_root extent_tree_root;/* cache extent cache entries */
|
2017-02-23 19:39:59 +08:00
|
|
|
struct mutex extent_tree_lock; /* locking extent radix tree */
|
2015-02-05 17:52:58 +08:00
|
|
|
struct list_head extent_list; /* lru list for shrinker */
|
|
|
|
spinlock_t extent_lock; /* locking extent lru list */
|
2015-12-22 11:20:15 +08:00
|
|
|
atomic_t total_ext_tree; /* extent tree count */
|
2016-01-01 07:02:16 +08:00
|
|
|
struct list_head zombie_list; /* extent zombie tree list */
|
2015-12-22 11:25:50 +08:00
|
|
|
atomic_t total_zombie_tree; /* extent zombie tree count */
|
2015-02-05 17:52:58 +08:00
|
|
|
atomic_t total_ext_node; /* extent info count */
|
|
|
|
|
2014-08-06 22:22:50 +08:00
|
|
|
/* basic filesystem units */
|
2012-11-28 12:37:31 +08:00
|
|
|
unsigned int log_sectors_per_block; /* log2 sectors per block */
|
|
|
|
unsigned int log_blocksize; /* log2 block size */
|
|
|
|
unsigned int blocksize; /* block size */
|
|
|
|
unsigned int root_ino_num; /* root inode number */
|
|
|
|
unsigned int node_ino_num; /* node inode number */
|
|
|
|
unsigned int meta_ino_num; /* meta inode number */
|
|
|
|
unsigned int log_blocks_per_seg; /* log2 blocks per segment */
|
|
|
|
unsigned int blocks_per_seg; /* blocks per segment */
|
|
|
|
unsigned int segs_per_sec; /* segments per section */
|
|
|
|
unsigned int secs_per_zone; /* sections per zone */
|
|
|
|
unsigned int total_sections; /* total section count */
|
|
|
|
unsigned int total_node_count; /* total node block count */
|
|
|
|
unsigned int total_valid_node_count; /* valid node block count */
|
2014-02-27 19:09:05 +08:00
|
|
|
int dir_level; /* directory level */
|
2017-11-22 18:23:38 +08:00
|
|
|
int readdir_ra; /* readahead inode in readdir */
|
2020-12-04 01:52:45 +08:00
|
|
|
u64 max_io_bytes; /* max io bytes to merge IOs */
|
2012-11-28 12:37:31 +08:00
|
|
|
|
|
|
|
block_t user_block_count; /* # of user blocks */
|
|
|
|
block_t total_valid_block_count; /* # of valid blocks */
|
2015-05-01 13:37:50 +08:00
|
|
|
block_t discard_blks; /* discard command candidates */
|
2012-11-28 12:37:31 +08:00
|
|
|
block_t last_valid_block_count; /* for recovery */
|
2017-06-26 16:24:41 +08:00
|
|
|
block_t reserved_blocks; /* configurable reserved blocks */
|
2017-10-27 20:45:05 +08:00
|
|
|
block_t current_reserved_blocks; /* current reserved blocks */
|
2017-06-26 16:24:41 +08:00
|
|
|
|
2018-08-21 10:21:43 +08:00
|
|
|
/* Additional tracking for no checkpoint mode */
|
|
|
|
block_t unusable_block_count; /* # of blocks saved by last cp */
|
|
|
|
|
2017-11-16 16:59:14 +08:00
|
|
|
unsigned int nquota_files; /* # of quota sysfile */
|
2019-05-30 01:58:45 +08:00
|
|
|
struct rw_semaphore quota_sem; /* blocking cp for flags */
|
2017-11-16 16:59:14 +08:00
|
|
|
|
2016-05-14 03:36:58 +08:00
|
|
|
/* # of pages, see count_type */
|
2016-10-21 10:09:57 +08:00
|
|
|
atomic_t nr_pages[NR_COUNT_TYPE];
|
2016-05-17 02:06:50 +08:00
|
|
|
/* # of allocated blocks */
|
|
|
|
struct percpu_counter alloc_valid_block_count;
|
2012-11-28 12:37:31 +08:00
|
|
|
|
2017-03-29 09:07:38 +08:00
|
|
|
/* writeback control */
|
2018-06-04 23:20:36 +08:00
|
|
|
atomic_t wb_sync_req[META]; /* count # of WB_SYNC threads */
|
2017-03-29 09:07:38 +08:00
|
|
|
|
2016-05-17 02:42:32 +08:00
|
|
|
/* valid inode count */
|
|
|
|
struct percpu_counter total_valid_inode_count;
|
|
|
|
|
2012-11-28 12:37:31 +08:00
|
|
|
struct f2fs_mount_info mount_opt; /* mount options */
|
|
|
|
|
|
|
|
/* for cleaning operations */
|
2020-01-14 19:36:50 +08:00
|
|
|
struct rw_semaphore gc_lock; /*
|
|
|
|
* semaphore for GC, avoid
|
|
|
|
* race between GC and GC or CP
|
|
|
|
*/
|
2012-11-28 12:37:31 +08:00
|
|
|
struct f2fs_gc_kthread *gc_thread; /* GC thread */
|
f2fs: support age threshold based garbage collection
There are several issues in the current background GC algorithm:
- The valid block count is one of the key factors in the cost calculation,
so if a segment has few valid blocks, the CB algorithm will still choose it
as a victim even if it is young or is a hot segment, which is not
appropriate.
- GCed data/node blocks go to existing logs regardless of whether the data
already there has the same update frequency, so hot and cold data may be
mixed again.
- The GC allocator mainly uses LFS type segments, so it consumes free
segments more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve the above issues. There are three main
steps:
1. select a source victim:
- set an age threshold, and select candidates based on the threshold:
e.g.
0 means youngest, 100 means oldest; if we set the age threshold to 80,
then dirty segments whose age is in the range [80, 100] are selected as
candidates;
- set a candidate_ratio threshold, and select candidates based on the
ratio, so that we can shrink the candidates to the oldest segments;
- select the target segment with the fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates based on the age threshold;
- set a candidate_radius threshold, and search for candidates whose age is
around the source victim's; the search radius should be less than the
radius threshold.
- select the target segment with the most valid blocks in order to avoid
migrating the current target segment.
3. merge valid blocks from the source victim into the target victim with the
SSR allocator.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* the rest have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease noticeably:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
struct atgc_management am; /* atgc management */
|
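Step 1 of the algorithm described in the commit message above (filter by age threshold, then prefer the fewest valid blocks) can be sketched as follows. This is a simplified illustration, not the f2fs implementation: `seg_summary` and `select_source_victim` are hypothetical names, and the candidate_ratio shrinking step is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-segment summary used only for this sketch. */
struct seg_summary {
	unsigned int age;           /* 0 = youngest, 100 = oldest */
	unsigned int valid_blocks;  /* blocks that would need migration */
};

/*
 * Pick a source victim: consider only segments whose age is at least
 * age_threshold, then take the one with the fewest valid blocks.
 * Returns the index of the chosen segment, or -1 if none qualifies.
 */
static int select_source_victim(const struct seg_summary *segs, size_t count,
				unsigned int age_threshold)
{
	int best = -1;
	size_t i;

	assert(segs != NULL || count == 0);

	for (i = 0; i < count; i++) {
		if (segs[i].age < age_threshold)
			continue;   /* too young to be a candidate */
		if (best < 0 ||
		    segs[i].valid_blocks < segs[best].valid_blocks)
			best = (int)i;
	}
	return best;
}
```

Note that a young segment with very few valid blocks is skipped here, which is the behaviour the message contrasts with the cost-benefit algorithm.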
2013-03-31 12:26:03 +08:00
|
|
|
unsigned int cur_victim_sec; /* current victim section num */
|
2018-05-08 05:22:40 +08:00
|
|
|
unsigned int gc_mode; /* current GC state */
|
f2fs: support subsectional garbage collection
A section is the minimal garbage collection unit of f2fs. On a zoned block
device or an old block-mapping flash device, we can align the GC unit to the
underlying device's erase unit to improve GC efficiency; normally that unit
consists of multiple segments.
Once background or foreground GC triggers, it generates a large number
of IOs, which impacts user IO and also occupies CPU and memory resources
intensively.
So, to reduce the impact of GC on large sections, this patch supports
subsectional GC: in one GC cycle, it migrates only some of the segments
in the victim section. Currently, by default, we use sbi->segs_per_sec as
the migration granularity.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-24 18:37:27 +08:00
|
|
|
unsigned int next_victim_seg[2]; /* next segment in victim section */
|
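The effect of a migration granularity smaller than a section can be sketched with one helper. This is an illustrative sketch only: `segs_this_cycle` and its parameters are hypothetical and merely model "migrate at most `granularity` segments per GC cycle, never past the end of the victim section".

```c
#include <assert.h>

/*
 * Hypothetical helper: given the next segment offset inside the victim
 * section, return how many segments one GC cycle should migrate.
 */
static unsigned int segs_this_cycle(unsigned int segs_per_sec,
				    unsigned int next_off,
				    unsigned int granularity)
{
	unsigned int remaining;

	assert(granularity > 0);
	assert(next_off <= segs_per_sec);

	remaining = segs_per_sec - next_off;
	return remaining < granularity ? remaining : granularity;
}
```

With the default described in the commit message (granularity equal to segs_per_sec), the first cycle covers the whole section; a smaller granularity spreads the same work over several cycles.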
2021-12-09 08:41:51 +08:00
|
|
|
spinlock_t gc_urgent_high_lock;
|
|
|
|
bool gc_urgent_high_limited; /* indicates having limited trial count */
|
|
|
|
unsigned int gc_urgent_high_remaining; /* remaining trial count for GC_URGENT_HIGH */
|
2020-07-02 12:14:14 +08:00
|
|
|
|
f2fs: avoid stucking GC due to atomic write
f2fs doesn't allow abuse of the atomic write class interface, so besides
limiting the total memory used by in-mem pages, we need to limit
atomic-write usage as well when the filesystem is seriously fragmented.
Otherwise we may run into an infinite loop during foreground GC because
the target blocks in the victim segment belong to a file that has been
atomic-opened for a long time.
Now, we detect failures due to atomic write in foreground GC; if
the count exceeds a threshold, we drop all atomic written data in the
cache. I expect this keeps the system running safely and
prevents DoS attacks.
In addition, this patch shows GC skip information in debugfs;
for now it just shows the count of skips caused by atomic write.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-07 20:28:54 +08:00
|
|
|
/* for skip statistic */
|
2020-07-24 16:55:28 +08:00
|
|
|
unsigned int atomic_files; /* # of opened atomic file */
|
2018-05-07 20:28:54 +08:00
|
|
|
unsigned long long skipped_atomic_files[2]; /* FG_GC and BG_GC */
|
2018-07-25 11:11:56 +08:00
|
|
|
unsigned long long skipped_gc_rwsem; /* FG_GC only */
|
2012-11-28 12:37:31 +08:00
|
|
|
|
2017-12-08 08:25:39 +08:00
|
|
|
/* threshold for gc trials on pinned files */
|
|
|
|
u64 gc_pin_file_threshold;
|
2019-10-19 01:06:40 +08:00
|
|
|
struct rw_semaphore pin_sem;
|
2017-12-08 08:25:39 +08:00
|
|
|
|
2014-01-08 12:45:08 +08:00
|
|
|
/* maximum # of trials to find a victim segment for SSR and GC */
|
|
|
|
unsigned int max_victim_search;
|
2018-10-24 18:37:27 +08:00
|
|
|
/* migration granularity of garbage collection, unit: segment */
|
|
|
|
unsigned int migration_granularity;
|
2014-01-08 12:45:08 +08:00
|
|
|
|
2012-11-28 12:37:31 +08:00
|
|
|
/*
|
|
|
|
* for stat information.
|
|
|
|
* one is for the LFS mode, and the other is for the SSR mode.
|
|
|
|
*/
|
2013-05-23 21:57:53 +08:00
|
|
|
#ifdef CONFIG_F2FS_STAT_FS
|
2012-11-28 12:37:31 +08:00
|
|
|
struct f2fs_stat_info *stat_info; /* FS status information */
|
2018-09-29 18:31:27 +08:00
|
|
|
atomic_t meta_count[META_MAX]; /* # of meta blocks */
|
2012-11-28 12:37:31 +08:00
|
|
|
unsigned int segment_count[2]; /* # of allocated segments */
|
|
|
|
unsigned int block_count[2]; /* # of allocated blocks */
|
2014-12-24 01:16:54 +08:00
|
|
|
atomic_t inplace_count; /* # of inplace update */
|
2015-09-30 17:38:48 +08:00
|
|
|
atomic64_t total_hit_ext; /* # of lookup extent cache */
|
|
|
|
atomic64_t read_hit_rbtree; /* # of hit rbtree extent node */
|
|
|
|
atomic64_t read_hit_largest; /* # of hit largest extent node */
|
|
|
|
atomic64_t read_hit_cached; /* # of hit cached extent node */
|
2015-07-15 17:28:53 +08:00
|
|
|
atomic_t inline_xattr; /* # of inline_xattr inodes */
|
2014-12-08 19:08:20 +08:00
|
|
|
atomic_t inline_inode; /* # of inline_data inodes */
|
|
|
|
atomic_t inline_dir; /* # of inline_dentry inodes */
|
f2fs: support data compression
This patch tries to support compression in f2fs.
- A new term, cluster, is defined as the basic unit of compression; a file can
be divided logically into multiple clusters. One cluster includes 4 << n
(n >= 0) logical pages, the compression size equals the cluster size, and each
cluster can be compressed or not.
- In the cluster metadata layout, one special flag indicates whether a cluster
is compressed or normal. For a compressed cluster, the following metadata
maps the cluster to [1, 4 << n - 1] physical blocks, in which f2fs stores
data including the compress header and compressed data.
- In order to eliminate write amplification during overwrite, F2FS only
supports compression on write-once files; data can be compressed only when
all logical blocks in the file are valid and the cluster compress ratio is lower
than the specified threshold.
- To enable compression on regular inode, there are three ways:
* chattr +c file
* chattr +c dir; touch dir/file
* mount w/ -o compress_extension=ext; touch file.ext
Compress metadata layout:
[Dnode Structure]
+-----------------------------------------------+
| cluster 1 | cluster 2 | ......... | cluster N |
+-----------------------------------------------+
. . . .
. . . .
. Compressed Cluster . . Normal Cluster .
+----------+---------+---------+---------+ +---------+---------+---------+---------+
|compr flag| block 1 | block 2 | block 3 | | block 1 | block 2 | block 3 | block 4 |
+----------+---------+---------+---------+ +---------+---------+---------+---------+
. .
. .
. .
+-------------+-------------+----------+----------------------------+
| data length | data chksum | reserved | compressed data |
+-------------+-------------+----------+----------------------------+
Changelog:
20190326:
- fix error handling of read_end_io().
- remove unneeded comments in f2fs_encrypt_one_page().
20190327:
- fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
- don't jump into loop directly to avoid uninitialized variables.
- add TODO tag in error path of f2fs_write_cache_pages().
20190328:
- fix wrong merge condition in f2fs_read_multi_pages().
- check compressed file in f2fs_post_read_required().
20190401
- allow overwrite on non-compressed cluster.
- check cluster meta before writing compressed data.
20190402
- don't preallocate blocks for compressed file.
- add lz4 compress algorithm
- process multiple post read works in one workqueue
Now f2fs supports processing post read work in multiple workqueue,
it shows low performance due to schedule overhead of multiple
workqueue executing orderly.
20190921
- compress: support buffered overwrite
C: compress cluster flag
V: valid block address
N: NEW_ADDR
One cluster contains 4 blocks
before overwrite after overwrite
- VVVV -> CVNN
- CVNN -> VVVV
- CVNN -> CVNN
- CVNN -> CVVV
- CVVV -> CVNN
- CVVV -> CVVV
20191029
- add kconfig F2FS_FS_COMPRESSION to isolate compression related
codes, add kconfig F2FS_FS_{LZO,LZ4} to cover backend algorithm.
note that: will remove lzo backend if Jaegeuk agreed that too.
- update codes according to Eric's comments.
20191101
- apply fixes from Jaegeuk
20191113
- apply fixes from Jaegeuk
- split workqueue for fsverity
20191216
- apply fixes from Jaegeuk
20200117
- fix to avoid NULL pointer dereference
[Jaegeuk Kim]
- add tracepoint for f2fs_{,de}compress_pages()
- fix many bugs and add some compression stats
- fix overwrite/mmap bugs
- address 32bit build error, reported by Geert.
- bug fixes when handling errors and i_compressed_blocks
Reported-by: <noreply@ellerman.id.au>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-11-01 18:07:14 +08:00
|
|
|
atomic_t compr_inode; /* # of compressed inodes */
|
2020-08-31 10:09:49 +08:00
|
|
|
atomic64_t compr_blocks; /* # of compressed blocks */
|
2017-03-22 17:23:45 +08:00
|
|
|
atomic_t vw_cnt; /* # of volatile writes */
|
2016-12-29 05:55:09 +08:00
|
|
|
atomic_t max_aw_cnt; /* max # of atomic writes */
|
2017-03-22 17:23:45 +08:00
|
|
|
atomic_t max_vw_cnt; /* max # of volatile writes */
|
2018-09-29 18:31:28 +08:00
|
|
|
unsigned int io_skip_bggc; /* skip background gc for in-flight IO */
|
|
|
|
unsigned int other_skip_bggc; /* skip background gc for other reasons */
|
2015-12-17 17:14:44 +08:00
|
|
|
unsigned int ndirty_inode[NR_INODE_TYPE]; /* # of dirty inodes */
|
2013-05-23 21:57:53 +08:00
|
|
|
#endif
|
2012-11-28 12:37:31 +08:00
|
|
|
spinlock_t stat_lock; /* lock for stat operations */
|
2013-08-04 22:09:40 +08:00
|
|
|
|
2020-04-03 00:32:35 +08:00
|
|
|
/* to attach REQ_META|REQ_FUA flags */
|
|
|
|
unsigned int data_io_flag;
|
2020-06-05 02:49:43 +08:00
|
|
|
unsigned int node_io_flag;
|
2017-08-02 23:21:48 +08:00
|
|
|
|
2013-08-04 22:09:40 +08:00
|
|
|
/* For sysfs support */
|
2020-12-09 16:43:27 +08:00
|
|
|
struct kobject s_kobj; /* /sys/fs/f2fs/<devname> */
|
2013-08-04 22:09:40 +08:00
|
|
|
struct completion s_kobj_unregister;
|
2015-06-20 03:01:21 +08:00
|
|
|
|
2020-12-09 16:43:27 +08:00
|
|
|
struct kobject s_stat_kobj; /* /sys/fs/f2fs/<devname>/stat */
|
|
|
|
struct completion s_stat_kobj_unregister;
|
|
|
|
|
2021-06-04 03:31:08 +08:00
|
|
|
struct kobject s_feature_list_kobj; /* /sys/fs/f2fs/<devname>/feature_list */
|
|
|
|
struct completion s_feature_list_kobj_unregister;
|
|
|
|
|
2015-06-20 03:01:21 +08:00
|
|
|
/* For shrinker support */
|
|
|
|
struct list_head s_list;
|
2021-09-01 14:39:20 +08:00
|
|
|
struct mutex umount_mutex;
|
|
|
|
unsigned int shrinker_run_no;
|
|
|
|
|
|
|
|
/* For multi devices */
|
2016-10-07 10:02:05 +08:00
|
|
|
int s_ndevs; /* number of devices */
|
|
|
|
struct f2fs_dev_info *devs; /* for device list */
|
2017-09-29 13:59:39 +08:00
|
|
|
unsigned int dirty_device; /* for checkpoint data flush */
|
|
|
|
spinlock_t dev_lock; /* protect dirty_device */
|
2021-09-01 14:39:20 +08:00
|
|
|
bool aligned_blksize; /* all devices have the same logical blksize */
|
2016-01-27 09:57:30 +08:00
|
|
|
|
|
|
|
/* For write statistics */
|
|
|
|
u64 sectors_written_start;
|
|
|
|
u64 kbytes_written;
|
2016-03-03 04:04:24 +08:00
|
|
|
|
|
|
|
/* Reference to checksum algorithm driver via cryptoapi */
|
|
|
|
struct crypto_shash *s_chksum_driver;
|
2016-09-23 21:30:09 +08:00
|
|
|
|
2017-07-31 20:19:09 +08:00
|
|
|
/* Precomputed FS UUID checksum for seeding other checksums */
|
|
|
|
__u32 s_chksum_seed;
|
2019-11-01 18:07:14 +08:00
|
|
|
|
|
|
|
struct workqueue_struct *post_read_wq; /* post read workqueue */
|
2020-02-25 18:17:10 +08:00
|
|
|
|
|
|
|
struct kmem_cache *inline_xattr_slab; /* inline xattr entry */
|
|
|
|
unsigned int inline_xattr_slab_size; /* default inline xattr slab size */
|
2020-09-14 17:05:13 +08:00
|
|
|
|
2021-07-10 13:53:57 +08:00
|
|
|
/* For reclaimed segs statistics per each GC mode */
|
|
|
|
unsigned int gc_segment_mode; /* GC state for reclaimed segments */
|
|
|
|
unsigned int gc_reclaimed_segs[MAX_GC_MODE]; /* Reclaimed segs for each mode */
|
|
|
|
|
2021-08-03 12:22:45 +08:00
|
|
|
unsigned long seq_file_ra_mul; /* multiplier for ra_pages of seq. files in fadvise */
|
|
|
|
|
2021-09-30 02:12:03 +08:00
|
|
|
int max_fragment_chunk; /* max chunk size for block fragmentation mode */
|
|
|
|
int max_fragment_hole; /* max hole size for block fragmentation mode */
|
|
|
|
|
2020-09-14 17:05:13 +08:00
|
|
|
#ifdef CONFIG_F2FS_FS_COMPRESSION
|
|
|
|
struct kmem_cache *page_array_slab; /* page array entry */
|
|
|
|
unsigned int page_array_slab_size; /* default page array slab size */
|
2021-03-15 16:12:33 +08:00
|
|
|
|
|
|
|
/* For runtime compression statistics */
|
|
|
|
u64 compr_written_block;
|
|
|
|
u64 compr_saved_block;
|
|
|
|
u32 compr_new_inode;
|
2021-05-20 19:51:50 +08:00
|
|
|
|
|
|
|
/* For compressed block cache */
|
|
|
|
struct inode *compress_inode; /* cache compressed blocks */
|
|
|
|
unsigned int compress_percent; /* cache page percentage */
|
|
|
|
unsigned int compress_watermark; /* cache page watermark */
|
|
|
|
atomic_t compress_page_hit; /* cache hit count */
|
2020-09-14 17:05:13 +08:00
|
|
|
#endif
|
2021-08-20 11:52:28 +08:00
|
|
|
|
|
|
|
#ifdef CONFIG_F2FS_IOSTAT
|
|
|
|
/* For app/fs IO statistics */
|
|
|
|
spinlock_t iostat_lock;
|
|
|
|
unsigned long long rw_iostat[NR_IO_TYPE];
|
|
|
|
unsigned long long prev_rw_iostat[NR_IO_TYPE];
|
|
|
|
bool iostat_enable;
|
|
|
|
unsigned long iostat_next_period;
|
|
|
|
unsigned int iostat_period_ms;
|
f2fs: introduce periodic iostat io latency traces
Whenever we notice some sluggish issues on our machines, we are always
curious about how well all types of I/O in the f2fs filesystem are
handled. But, it's hard to get this kind of real data. First of all,
we need to reproduce the issue while turning on the profiling tool like
blktrace, but the issue doesn't happen again easily. Second, with the
intervention of any tools, the overall timing of the issue will be
slightly changed, which sometimes makes it hard for us to figure out.
So, I added the feature printing out IO latency statistics tracepoint
events, which are minimal things to understand filesystem's I/O related
behaviors, into F2FS_IOSTAT kernel config. With "iostat_enable" sysfs
node on, we can get this statistics info in a periodic way and it
would cause the least overhead.
[samples]
f2fs_ckpt-254:1-507 [003] .... 2842.439683: f2fs_iostat_latency:
dev = (254,11), iotype [peak lat.(ms)/avg lat.(ms)/count],
rd_data [136/1/801], rd_node [136/1/1704], rd_meta [4/2/4],
wr_sync_data [164/16/3331], wr_sync_node [152/3/648],
wr_sync_meta [160/2/4243], wr_async_data [24/13/15],
wr_async_node [0/0/0], wr_async_meta [0/0/0]
f2fs_ckpt-254:1-507 [002] .... 2845.450514: f2fs_iostat_latency:
dev = (254,11), iotype [peak lat.(ms)/avg lat.(ms)/count],
rd_data [60/3/456], rd_node [60/3/1258], rd_meta [0/0/1],
wr_sync_data [120/12/2285], wr_sync_node [88/5/428],
wr_sync_meta [52/6/2990], wr_async_data [4/1/3],
wr_async_node [0/0/0], wr_async_meta [0/0/0]
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-08-21 06:29:09 +08:00
|
|
|
|
|
|
|
/* For io latency related statistics info in one iostat period */
|
|
|
|
spinlock_t iostat_lat_lock;
|
|
|
|
struct iostat_lat_info *iostat_io_lat;
|
2021-08-20 11:52:28 +08:00
|
|
|
#endif
|
2012-11-28 12:37:31 +08:00
|
|
|
};
|
|
|
|
|
2016-09-23 21:30:09 +08:00
|
|
|
#ifdef CONFIG_F2FS_FAULT_INJECTION
|
2019-11-01 17:53:23 +08:00
|
|
|
#define f2fs_show_injection_info(sbi, type) \
|
|
|
|
printk_ratelimited("%sF2FS-fs (%s) : inject %s in %s of %pS\n", \
|
|
|
|
KERN_INFO, sbi->sb->s_id, \
|
|
|
|
f2fs_fault_name[type], \
|
2017-02-25 11:08:28 +08:00
|
|
|
__func__, __builtin_return_address(0))
|
2016-09-23 21:30:09 +08:00
|
|
|
static inline bool time_to_inject(struct f2fs_sb_info *sbi, int type)
|
|
|
|
{
|
2018-03-08 14:22:56 +08:00
|
|
|
struct f2fs_fault_info *ffi = &F2FS_OPTION(sbi).fault_info;
|
2016-09-23 21:30:09 +08:00
|
|
|
|
|
|
|
if (!ffi->inject_rate)
|
|
|
|
return false;
|
|
|
|
|
|
|
|
if (!IS_FAULT_SET(ffi, type))
|
|
|
|
return false;
|
|
|
|
|
|
|
|
atomic_inc(&ffi->inject_ops);
|
|
|
|
if (atomic_read(&ffi->inject_ops) >= ffi->inject_rate) {
|
|
|
|
atomic_set(&ffi->inject_ops, 0);
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
return false;
|
|
|
|
}
|
2018-08-14 05:38:06 +08:00
|
|
|
#else
|
2019-11-01 17:53:23 +08:00
|
|
|
#define f2fs_show_injection_info(sbi, type) do { } while (0)
|
2018-08-14 05:38:06 +08:00
|
|
|
static inline bool time_to_inject(struct f2fs_sb_info *sbi, int type)
|
|
|
|
{
|
|
|
|
return false;
|
|
|
|
}
|
2016-09-23 21:30:09 +08:00
|
|
|
#endif
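The rate-based trigger in time_to_inject() can be tried outside the kernel. The following is a minimal userspace sketch, not the kernel code: `struct sk_fault_info`, `sk_time_to_inject()` and the bitmask test standing in for IS_FAULT_SET() are hypothetical names and assumptions, and C11 atomics replace the kernel's atomic_t helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct f2fs_fault_info. */
struct sk_fault_info {
	atomic_uint inject_ops;		/* calls counted since the last injection */
	unsigned int inject_rate;	/* fire once every inject_rate calls; 0 disables */
	unsigned int inject_type;	/* bitmask of enabled fault types (assumed layout) */
};

/* Same shape as time_to_inject(): count eligible calls, fire on every Nth. */
static inline bool sk_time_to_inject(struct sk_fault_info *ffi, int type)
{
	if (!ffi->inject_rate)
		return false;

	/* Assumed equivalent of IS_FAULT_SET(ffi, type). */
	if (!(ffi->inject_type & (1u << type)))
		return false;

	atomic_fetch_add(&ffi->inject_ops, 1);
	if (atomic_load(&ffi->inject_ops) >= ffi->inject_rate) {
		atomic_store(&ffi->inject_ops, 0);
		return true;
	}
	return false;
}
```

With inject_rate set to 3, every third eligible call returns true and the counter restarts. As in the original, the increment and the comparison are separate steps, so under concurrency the period is approximate rather than exact.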
|
|
|
|
|
2019-03-16 08:13:06 +08:00
|
|
|
/*
|
|
|
|
* Test if the mounted volume is a multi-device volume.
|
|
|
|
* - For a single regular disk volume, sbi->s_ndevs is 0.
|
|
|
|
* - For a single zoned disk volume, sbi->s_ndevs is 1.
|
|
|
|
* - For a multi-device volume, sbi->s_ndevs is always 2 or more.
|
|
|
|
*/
|
|
|
|
static inline bool f2fs_is_multi_device(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
return sbi->s_ndevs > 1;
|
|
|
|
}
|
|
|
|
|
2016-01-09 07:51:50 +08:00
|
|
|
static inline void f2fs_update_time(struct f2fs_sb_info *sbi, int type)
|
|
|
|
{
|
2018-09-19 16:48:47 +08:00
|
|
|
unsigned long now = jiffies;
|
|
|
|
|
|
|
|
sbi->last_time[type] = now;
|
|
|
|
|
|
|
|
/* DISCARD_TIME and GC_TIME are based on REQ_TIME */
|
|
|
|
if (type == REQ_TIME) {
|
|
|
|
sbi->last_time[DISCARD_TIME] = now;
|
|
|
|
sbi->last_time[GC_TIME] = now;
|
|
|
|
}
|
2016-01-09 07:51:50 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static inline bool f2fs_time_over(struct f2fs_sb_info *sbi, int type)
|
|
|
|
{
|
2017-10-19 17:52:47 +08:00
|
|
|
unsigned long interval = sbi->interval_time[type] * HZ;
|
2016-01-09 07:51:50 +08:00
|
|
|
|
|
|
|
return time_after(jiffies, sbi->last_time[type] + interval);
|
|
|
|
}
|
|
|
|
|
2018-09-19 16:48:47 +08:00
|
|
|
static inline unsigned int f2fs_time_to_wait(struct f2fs_sb_info *sbi,
|
|
|
|
int type)
|
|
|
|
{
|
|
|
|
unsigned long interval = sbi->interval_time[type] * HZ;
|
|
|
|
unsigned int wait_ms = 0;
|
|
|
|
long delta;
|
|
|
|
|
|
|
|
delta = (sbi->last_time[type] + interval) - jiffies;
|
|
|
|
if (delta > 0)
|
|
|
|
wait_ms = jiffies_to_msecs(delta);
|
|
|
|
|
|
|
|
return wait_ms;
|
|
|
|
}
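The three timer helpers above rely on wraparound-safe tick arithmetic. The sketch below reproduces that idea in userspace under stated assumptions: `sk_jiffies` is a fake tick counter, `SK_HZ` is an assumed tick rate, and the `sk_*` names and the two-entry timer enum are hypothetical, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

#define SK_HZ 100UL	/* assumed tick rate; must divide 1000 in this sketch */

enum { SK_REQ_TIME, SK_GC_TIME, SK_MAX_TIME };	/* hypothetical subset of timer types */

static unsigned long sk_jiffies;	/* stand-in for the kernel tick counter */

struct sk_timers {
	unsigned long last_time[SK_MAX_TIME];		/* tick of the last event */
	unsigned long interval_time[SK_MAX_TIME];	/* interval in seconds */
};

/* "a is after b", safe across counter wraparound (same idea as time_after()). */
static inline bool sk_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

static inline void sk_update_time(struct sk_timers *t, int type)
{
	unsigned long now = sk_jiffies;

	t->last_time[type] = now;
	/* As in the original, a request also restarts the dependent GC timer. */
	if (type == SK_REQ_TIME)
		t->last_time[SK_GC_TIME] = now;
}

static inline bool sk_time_over(const struct sk_timers *t, int type)
{
	unsigned long interval = t->interval_time[type] * SK_HZ;

	return sk_time_after(sk_jiffies, t->last_time[type] + interval);
}

/* Milliseconds left until the interval expires, or 0 if it already has. */
static inline unsigned int sk_time_to_wait(const struct sk_timers *t, int type)
{
	unsigned long interval = t->interval_time[type] * SK_HZ;
	long delta = (long)((t->last_time[type] + interval) - sk_jiffies);

	return delta > 0 ? (unsigned int)(delta * (long)(1000 / SK_HZ)) : 0;
}
```

Because the comparison is done on the signed difference, a deadline that lands just past the point where the counter wraps to zero is still handled correctly.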
|
|
|
|
|
2012-11-28 12:37:31 +08:00
|
|
|
/*
|
|
|
|
* Inline functions
|
|
|
|
*/
|
2017-11-30 19:28:21 +08:00
|
|
|
static inline u32 __f2fs_crc32(struct f2fs_sb_info *sbi, u32 crc,
|
2017-07-31 20:19:09 +08:00
|
|
|
const void *address, unsigned int length)
|
|
|
|
{
|
|
|
|
struct {
|
|
|
|
struct shash_desc shash;
|
|
|
|
char ctx[4];
|
|
|
|
} desc;
|
|
|
|
int err;
|
|
|
|
|
|
|
|
BUG_ON(crypto_shash_descsize(sbi->s_chksum_driver) != sizeof(desc.ctx));
|
|
|
|
|
|
|
|
desc.shash.tfm = sbi->s_chksum_driver;
|
|
|
|
*(u32 *)desc.ctx = crc;
|
|
|
|
|
|
|
|
err = crypto_shash_update(&desc.shash, address, length);
|
|
|
|
BUG_ON(err);
|
|
|
|
|
|
|
|
return *(u32 *)desc.ctx;
|
|
|
|
}
|
|
|
|
|
2017-11-30 19:28:21 +08:00
|
|
|
static inline u32 f2fs_crc32(struct f2fs_sb_info *sbi, const void *address,
|
|
|
|
unsigned int length)
|
|
|
|
{
|
|
|
|
return __f2fs_crc32(sbi, F2FS_SUPER_MAGIC, address, length);
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline bool f2fs_crc_valid(struct f2fs_sb_info *sbi, __u32 blk_crc,
|
|
|
|
void *buf, size_t buf_size)
|
|
|
|
{
|
|
|
|
return f2fs_crc32(sbi, buf, buf_size) == blk_crc;
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline u32 f2fs_chksum(struct f2fs_sb_info *sbi, u32 crc,
|
|
|
|
const void *address, unsigned int length)
|
|
|
|
{
|
|
|
|
return __f2fs_crc32(sbi, crc, address, length);
|
|
|
|
}
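The checksum helpers above all reduce to one seeded update: f2fs_crc32() starts from F2FS_SUPER_MAGIC, while f2fs_chksum() continues from a caller-supplied value. The userspace sketch below illustrates that chaining. It assumes the driver behind s_chksum_driver behaves like a plain reflected CRC-32 update with no pre- or post-inversion; the `sk_*` names are hypothetical, and the seed constant is the F2FS_SUPER_MAGIC value as recalled from magic.h, so treat it as an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_SEED 0xF2F52010u	/* assumed value of F2FS_SUPER_MAGIC */

/* Bitwise reflected CRC-32 update (polynomial 0xEDB88320), no inversion. */
static inline uint32_t sk_crc32_le(uint32_t crc, const void *buf, size_t len)
{
	const unsigned char *p = buf;

	while (len--) {
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Counterpart of f2fs_crc32(): fixed seed. */
static inline uint32_t sk_crc(const void *buf, size_t len)
{
	return sk_crc32_le(SK_SEED, buf, len);
}

/* Counterpart of f2fs_chksum(): caller-supplied seed, so calls can be chained. */
static inline uint32_t sk_chksum(uint32_t crc, const void *buf, size_t len)
{
	return sk_crc32_le(crc, buf, len);
}

/* Counterpart of f2fs_crc_valid(): recompute and compare. */
static inline bool sk_crc_valid(uint32_t blk_crc, const void *buf, size_t len)
{
	return sk_crc(buf, len) == blk_crc;
}
```

Feeding a buffer in two pieces through sk_chksum(), starting from SK_SEED, gives the same result as one sk_crc() call over the whole buffer. That property is what lets a precomputed seed such as s_chksum_seed be reused for later checksums.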
|
|
|
|
|
2012-11-28 12:37:31 +08:00
|
|
|
static inline struct f2fs_inode_info *F2FS_I(struct inode *inode)
|
|
|
|
{
|
|
|
|
return container_of(inode, struct f2fs_inode_info, vfs_inode);
|
|
|
|
}
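F2FS_I() works because the VFS inode is embedded inside the filesystem-private structure, so subtracting the member offset recovers the enclosing object. Below is a minimal userspace sketch of that pattern; the struct and macro names are hypothetical stand-ins, and the macro omits the type checking that the kernel's container_of() performs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the generic VFS inode. */
struct sk_inode {
	unsigned long i_ino;
};

/* Hypothetical stand-in for f2fs_inode_info: the generic inode is embedded. */
struct sk_inode_info {
	unsigned int fs_flags;		/* filesystem-private state */
	struct sk_inode vfs_inode;	/* embedded by value, not a pointer */
};

/* Simplified container_of(): step back from the member to its container. */
#define sk_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static inline struct sk_inode_info *SK_I(struct sk_inode *inode)
{
	return sk_container_of(inode, struct sk_inode_info, vfs_inode);
}
```

Given a pointer to the embedded `vfs_inode`, SK_I() returns the address of the surrounding `struct sk_inode_info` without any lookup or extra pointer field.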
|
|
|
|
|
|
|
|
static inline struct f2fs_sb_info *F2FS_SB(struct super_block *sb)
|
|
|
|
{
|
|
|
|
return sb->s_fs_info;
|
|
|
|
}
|
|
|
|
|
2014-09-03 06:31:18 +08:00
|
|
|
static inline struct f2fs_sb_info *F2FS_I_SB(struct inode *inode)
|
|
|
|
{
|
|
|
|
return F2FS_SB(inode->i_sb);
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline struct f2fs_sb_info *F2FS_M_SB(struct address_space *mapping)
|
|
|
|
{
|
|
|
|
return F2FS_I_SB(mapping->host);
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline struct f2fs_sb_info *F2FS_P_SB(struct page *page)
|
|
|
|
{
|
2019-07-02 10:15:29 +08:00
|
|
|
return F2FS_M_SB(page_file_mapping(page));
|
2014-09-03 06:31:18 +08:00
|
|
|
}
|
|
|
|
|
2012-11-28 12:37:31 +08:00
|
|
|
static inline struct f2fs_super_block *F2FS_RAW_SUPER(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
return (struct f2fs_super_block *)(sbi->raw_super);
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline struct f2fs_checkpoint *F2FS_CKPT(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
return (struct f2fs_checkpoint *)(sbi->ckpt);
|
|
|
|
}
|
|
|
|
|
2013-07-15 17:57:38 +08:00
|
|
|
static inline struct f2fs_node *F2FS_NODE(struct page *page)
|
|
|
|
{
|
|
|
|
return (struct f2fs_node *)page_address(page);
|
|
|
|
}
|
|
|
|
|
2013-12-26 15:30:41 +08:00
|
|
|
static inline struct f2fs_inode *F2FS_INODE(struct page *page)
|
|
|
|
{
|
|
|
|
return &((struct f2fs_node *)page_address(page))->i;
|
|
|
|
}
|
|
|
|
|
2012-11-28 12:37:31 +08:00
|
|
|
static inline struct f2fs_nm_info *NM_I(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
return (struct f2fs_nm_info *)(sbi->nm_info);
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline struct f2fs_sm_info *SM_I(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
return (struct f2fs_sm_info *)(sbi->sm_info);
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline struct sit_info *SIT_I(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
return (struct sit_info *)(SM_I(sbi)->sit_info);
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline struct free_segmap_info *FREE_I(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
return (struct free_segmap_info *)(SM_I(sbi)->free_info);
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline struct dirty_seglist_info *DIRTY_I(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
return (struct dirty_seglist_info *)(SM_I(sbi)->dirty_info);
|
|
|
|
}
|
|
|
|
|
2014-01-20 18:37:04 +08:00
|
|
|
static inline struct address_space *META_MAPPING(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
return sbi->meta_inode->i_mapping;
|
|
|
|
}
|
|
|
|
|
2014-01-21 17:51:16 +08:00
|
|
|
static inline struct address_space *NODE_MAPPING(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
return sbi->node_inode->i_mapping;
|
|
|
|
}
|
|
|
|
|
2015-01-28 17:48:42 +08:00
|
|
|
static inline bool is_sbi_flag_set(struct f2fs_sb_info *sbi, unsigned int type)
|
|
|
|
{
|
2016-09-20 10:29:47 +08:00
|
|
|
return test_bit(type, &sbi->s_flag);
|
2015-01-28 17:48:42 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static inline void set_sbi_flag(struct f2fs_sb_info *sbi, unsigned int type)
|
2012-11-28 12:37:31 +08:00
|
|
|
{
|
2016-09-20 10:29:47 +08:00
|
|
|
set_bit(type, &sbi->s_flag);
|
2012-11-28 12:37:31 +08:00
|
|
|
}
|
|
|
|
|
2015-01-28 17:48:42 +08:00
|
|
|
static inline void clear_sbi_flag(struct f2fs_sb_info *sbi, unsigned int type)
|
2012-11-28 12:37:31 +08:00
|
|
|
{
|
2016-09-20 10:29:47 +08:00
|
|
|
clear_bit(type, &sbi->s_flag);
|
2012-11-28 12:37:31 +08:00
|
|
|
}
|
|
|
|
|
2013-08-09 14:03:21 +08:00
|
|
|
static inline unsigned long long cur_cp_version(struct f2fs_checkpoint *cp)
|
|
|
|
{
|
|
|
|
return le64_to_cpu(cp->checkpoint_ver);
|
|
|
|
}
|
|
|
|
|
2017-10-07 00:14:28 +08:00
|
|
|
static inline unsigned long f2fs_qf_ino(struct super_block *sb, int type)
|
|
|
|
{
|
|
|
|
if (type < F2FS_MAX_QUOTAS)
|
|
|
|
return le32_to_cpu(F2FS_SB(sb)->raw_super->qf_ino[type]);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2017-02-25 19:53:39 +08:00
|
|
|
static inline __u64 cur_cp_crc(struct f2fs_checkpoint *cp)
|
|
|
|
{
|
|
|
|
size_t crc_offset = le32_to_cpu(cp->checksum_offset);
|
|
|
|
return le32_to_cpu(*((__le32 *)((unsigned char *)cp + crc_offset)));
|
|
|
|
}
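The read performed by cur_cp_crc() above can be sketched in user space roughly as follows. This is only an illustration, not the kernel code: the names demo_load_le32, demo_store_le32, and demo_cp_crc are hypothetical, the layout (a little-endian offset field in the first four bytes of the block) is an assumption made for the example, and the bounds check is an extra precaution that the helper above does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a 32-bit little-endian value byte by byte (independent of host endianness). */
static uint32_t demo_load_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Encode a 32-bit value as little-endian bytes. */
static void demo_store_le32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v & 0xff);
	p[1] = (unsigned char)((v >> 8) & 0xff);
	p[2] = (unsigned char)((v >> 16) & 0xff);
	p[3] = (unsigned char)((v >> 24) & 0xff);
}

/*
 * Hypothetical layout: bytes 0..3 of the block hold the checksum offset.
 * On success, store the CRC found at that offset in *crc and return 0.
 * Return -1 if the offset would read past the end of the block.
 */
static int demo_cp_crc(const unsigned char *block, size_t size, uint32_t *crc)
{
	uint32_t off;

	if (size < 4)
		return -1;
	off = demo_load_le32(block);
	if (off > size - 4)
		return -1;
	*crc = demo_load_le32(block + off);
	return 0;
}

/* Usage: a 16-byte block whose CRC lives at byte offset 12. */
static void demo_cp_crc_selfcheck(void)
{
	unsigned char blk[16] = {0};
	uint32_t crc = 0;

	demo_store_le32(blk, 12);
	demo_store_le32(blk + 12, 0xdeadbeefu);
	assert(demo_cp_crc(blk, sizeof(blk), &crc) == 0);
	assert(crc == 0xdeadbeefu);
}
```

Decoding byte by byte avoids both alignment and host-endianness concerns, which is the job le32_to_cpu() does for the kernel helper.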
|
|
|
|
|
2016-09-20 11:04:18 +08:00
|
|
|
static inline bool __is_set_ckpt_flags(struct f2fs_checkpoint *cp, unsigned int f)
|
2012-11-28 15:12:41 +08:00
|
|
|
{
|
|
|
|
unsigned int ckpt_flags = le32_to_cpu(cp->ckpt_flags);
|
2016-09-20 11:04:18 +08:00
|
|
|
|
2012-11-28 15:12:41 +08:00
|
|
|
return ckpt_flags & f;
|
|
|
|
}
|
|
|
|
|
2016-09-20 11:04:18 +08:00
|
|
|
static inline bool is_set_ckpt_flags(struct f2fs_sb_info *sbi, unsigned int f)
|
2012-11-28 15:12:41 +08:00
|
|
|
{
|
2016-09-20 11:04:18 +08:00
|
|
|
return __is_set_ckpt_flags(F2FS_CKPT(sbi), f);
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline void __set_ckpt_flags(struct f2fs_checkpoint *cp, unsigned int f)
|
|
|
|
{
|
|
|
|
unsigned int ckpt_flags;
|
|
|
|
|
|
|
|
ckpt_flags = le32_to_cpu(cp->ckpt_flags);
|
2012-11-28 15:12:41 +08:00
|
|
|
ckpt_flags |= f;
|
|
|
|
cp->ckpt_flags = cpu_to_le32(ckpt_flags);
|
|
|
|
}
|
|
|
|
|
2016-09-20 11:04:18 +08:00
|
|
|
static inline void set_ckpt_flags(struct f2fs_sb_info *sbi, unsigned int f)
|
2012-11-28 15:12:41 +08:00
|
|
|
{
|
f2fs: use spin_{,un}lock_irq{save,restore}
generic/361 reports the warning below. It occurs because, once a task has
entered the critical region of sbi.cp_lock, a deadlock results if
write_end_io -> f2fs_stop_checkpoint is invoked from an IRQ that fires on
the same CPU.
So this patch switches to spin_{,un}lock_irq{save,restore} so that the
critical region runs with IRQs disabled, which avoids the potential deadlock.
irq event stamp: 83391573
loop: Write error at byte offset 438729728, length 1024.
hardirqs last enabled at (83391573): [<c1809752>] restore_all+0xf/0x65
hardirqs last disabled at (83391572): [<c1809eac>] reschedule_interrupt+0x30/0x3c
loop: Write error at byte offset 438860288, length 1536.
softirqs last enabled at (83389244): [<c180cc4e>] __do_softirq+0x1ae/0x476
softirqs last disabled at (83389237): [<c101ca7c>] do_softirq_own_stack+0x2c/0x40
loop: Write error at byte offset 438990848, length 2048.
================================
WARNING: inconsistent lock state
4.12.0-rc2+ #30 Tainted: G O
--------------------------------
inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
xfs_io/7959 [HC1[1]:SC0[0]:HE0:SE1] takes:
(&(&sbi->cp_lock)->rlock){?.+...}, at: [<f96f96cc>] f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
{HARDIRQ-ON-W} state was registered at:
__lock_acquire+0x527/0x7b0
lock_acquire+0xae/0x220
_raw_spin_lock+0x42/0x50
do_checkpoint+0x165/0x9e0 [f2fs]
write_checkpoint+0x33f/0x740 [f2fs]
__f2fs_sync_fs+0x92/0x1f0 [f2fs]
f2fs_sync_fs+0x12/0x20 [f2fs]
sync_filesystem+0x67/0x80
generic_shutdown_super+0x27/0x100
kill_block_super+0x22/0x50
kill_f2fs_super+0x3a/0x40 [f2fs]
deactivate_locked_super+0x3d/0x70
deactivate_super+0x40/0x60
cleanup_mnt+0x39/0x70
__cleanup_mnt+0x10/0x20
task_work_run+0x69/0x80
exit_to_usermode_loop+0x57/0x85
do_fast_syscall_32+0x18c/0x1b0
entry_SYSENTER_32+0x4c/0x7b
irq event stamp: 1957420
hardirqs last enabled at (1957419): [<c1808f37>] _raw_spin_unlock_irq+0x27/0x50
hardirqs last disabled at (1957420): [<c1809f9c>] call_function_single_interrupt+0x30/0x3c
softirqs last enabled at (1953784): [<c180cc4e>] __do_softirq+0x1ae/0x476
softirqs last disabled at (1953773): [<c101ca7c>] do_softirq_own_stack+0x2c/0x40
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&(&sbi->cp_lock)->rlock);
<Interrupt>
lock(&(&sbi->cp_lock)->rlock);
*** DEADLOCK ***
2 locks held by xfs_io/7959:
#0: (sb_writers#13){.+.+.+}, at: [<c11fd7ca>] vfs_write+0x16a/0x190
#1: (&sb->s_type->i_mutex_key#16){+.+.+.}, at: [<f96e33f5>] f2fs_file_write_iter+0x25/0x140 [f2fs]
stack backtrace:
CPU: 2 PID: 7959 Comm: xfs_io Tainted: G O 4.12.0-rc2+ #30
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Call Trace:
dump_stack+0x5f/0x92
print_usage_bug+0x1d3/0x1dd
? check_usage_backwards+0xe0/0xe0
mark_lock+0x23d/0x280
__lock_acquire+0x699/0x7b0
? __this_cpu_preempt_check+0xf/0x20
? trace_hardirqs_off_caller+0x91/0xe0
lock_acquire+0xae/0x220
? f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
_raw_spin_lock+0x42/0x50
? f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
f2fs_write_end_io+0x147/0x150 [f2fs]
bio_endio+0x7a/0x1e0
blk_update_request+0xad/0x410
blk_mq_end_request+0x16/0x60
lo_complete_rq+0x3c/0x70
__blk_mq_complete_request_remote+0x11/0x20
flush_smp_call_function_queue+0x6d/0x120
? debug_smp_processor_id+0x12/0x20
generic_smp_call_function_single_interrupt+0x12/0x30
smp_call_function_single_interrupt+0x25/0x40
call_function_single_interrupt+0x37/0x3c
EIP: _raw_spin_unlock_irq+0x2d/0x50
EFLAGS: 00000296 CPU: 2
EAX: 00000001 EBX: d2ccc51c ECX: 00000001 EDX: c1aacebd
ESI: 00000000 EDI: 00000000 EBP: c96c9d1c ESP: c96c9d18
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
? inherit_task_group.isra.98.part.99+0x6b/0xb0
__add_to_page_cache_locked+0x1d4/0x290
add_to_page_cache_lru+0x38/0xb0
pagecache_get_page+0x8e/0x200
f2fs_write_begin+0x96/0xf00 [f2fs]
? trace_hardirqs_on_caller+0xdd/0x1c0
? current_time+0x17/0x50
? trace_hardirqs_on+0xb/0x10
generic_perform_write+0xa9/0x170
__generic_file_write_iter+0x1a2/0x1f0
? f2fs_preallocate_blocks+0x137/0x160 [f2fs]
f2fs_file_write_iter+0x6e/0x140 [f2fs]
? __lock_acquire+0x429/0x7b0
__vfs_write+0xc1/0x140
vfs_write+0x9b/0x190
SyS_pwrite64+0x63/0xa0
do_fast_syscall_32+0xa1/0x1b0
entry_SYSENTER_32+0x4c/0x7b
EIP: 0xb7786c61
EFLAGS: 00000293 CPU: 2
EAX: ffffffda EBX: 00000003 ECX: 08416000 EDX: 00001000
ESI: 18b24000 EDI: 00000000 EBP: 00000003 ESP: bf9b36b0
DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
Fixes: aaec2b1d1879 ("f2fs: introduce cp_lock to protect updating of ckpt_flags")
Cc: stable@vger.kernel.org
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-07-07 14:10:15 +08:00
|
|
|
unsigned long flags;
|
|
|
|
|
|
|
|
spin_lock_irqsave(&sbi->cp_lock, flags);
|
2016-09-20 11:04:18 +08:00
|
|
|
__set_ckpt_flags(F2FS_CKPT(sbi), f);
|
2017-07-07 14:10:15 +08:00
|
|
|
spin_unlock_irqrestore(&sbi->cp_lock, flags);
|
2016-09-20 11:04:18 +08:00
|
|
|
}
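The reason set_ckpt_flags() takes cp_lock with spin_lock_irqsave() can be illustrated with a user-space analogy. This is a loose sketch under stated assumptions, not kernel code: a POSIX signal stands in for the hardware interrupt, sigprocmask() stands in for saving and restoring the IRQ state, an atomic_flag stands in for sbi->cp_lock, and all demo_* names are hypothetical. The handler only tries the lock once instead of spinning, so the demonstration terminates where a real spinlock would hang.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>

static atomic_flag demo_cp_lock = ATOMIC_FLAG_INIT;	/* stand-in for sbi->cp_lock */
static volatile sig_atomic_t demo_contended;

/* Stand-in for the end_io path that wants the same lock from "IRQ" context. */
static void demo_irq_handler(int signo)
{
	(void)signo;
	if (atomic_flag_test_and_set(&demo_cp_lock)) {
		/* Lock is owned by the code we interrupted: a spinlock would spin forever. */
		demo_contended = 1;
		return;
	}
	atomic_flag_clear(&demo_cp_lock);
}

/*
 * Take the lock, let an "interrupt" arrive inside the critical section,
 * then release. Returns 1 if the handler found the lock already held.
 */
static int demo_locked_update(int mask_irq)
{
	struct sigaction sa;
	sigset_t mask, saved;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = demo_irq_handler;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGUSR1, &sa, NULL);

	demo_contended = 0;
	sigemptyset(&mask);
	sigaddset(&mask, SIGUSR1);
	/* mask_irq != 0 plays the role of irqsave; otherwise make sure delivery is enabled. */
	sigprocmask(mask_irq ? SIG_BLOCK : SIG_UNBLOCK, &mask, &saved);

	while (atomic_flag_test_and_set(&demo_cp_lock))
		;				/* acquire */
	raise(SIGUSR1);				/* "interrupt" during the critical section */
	atomic_flag_clear(&demo_cp_lock);	/* release */

	/* Plays the role of irqrestore: a pending signal is delivered here. */
	sigprocmask(SIG_SETMASK, &saved, NULL);
	return demo_contended;
}

static void demo_irqsave_selfcheck(void)
{
	assert(demo_locked_update(0) == 1);	/* handler ran while the lock was held */
	assert(demo_locked_update(1) == 0);	/* handler deferred until after release */
}
```

With delivery masked, the handler runs only after the lock has been released, which is the property the irqsave/irqrestore pair provides for cp_lock.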
|
|
|
|
|
|
|
|
static inline void __clear_ckpt_flags(struct f2fs_checkpoint *cp, unsigned int f)
|
|
|
|
{
|
|
|
|
unsigned int ckpt_flags;
|
|
|
|
|
|
|
|
ckpt_flags = le32_to_cpu(cp->ckpt_flags);
|
2012-11-28 15:12:41 +08:00
|
|
|
ckpt_flags &= (~f);
|
|
|
|
cp->ckpt_flags = cpu_to_le32(ckpt_flags);
|
|
|
|
}
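The three low-level flag helpers above (__is_set_ckpt_flags, __set_ckpt_flags, __clear_ckpt_flags) share one read-modify-write pattern on a little-endian field. A minimal user-space sketch of that pattern follows; struct demo_checkpoint, the demo_* functions, and the flag values used in the self-check are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the on-disk checkpoint: flags stored little-endian. */
struct demo_checkpoint {
	unsigned char ckpt_flags[4];
};

static uint32_t demo_le32_to_cpu(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void demo_cpu_to_le32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v & 0xff);
	p[1] = (unsigned char)((v >> 8) & 0xff);
	p[2] = (unsigned char)((v >> 16) & 0xff);
	p[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Test: convert to CPU order, then mask. */
static bool demo_is_set_ckpt_flags(const struct demo_checkpoint *cp, uint32_t f)
{
	return (demo_le32_to_cpu(cp->ckpt_flags) & f) != 0;
}

/* Set: convert to CPU order, OR in the bits, convert back. */
static void demo_set_ckpt_flags(struct demo_checkpoint *cp, uint32_t f)
{
	uint32_t flags = demo_le32_to_cpu(cp->ckpt_flags);

	flags |= f;
	demo_cpu_to_le32(cp->ckpt_flags, flags);
}

/* Clear: convert to CPU order, mask out the bits, convert back. */
static void demo_clear_ckpt_flags(struct demo_checkpoint *cp, uint32_t f)
{
	uint32_t flags = demo_le32_to_cpu(cp->ckpt_flags);

	flags &= ~f;
	demo_cpu_to_le32(cp->ckpt_flags, flags);
}

static void demo_ckpt_flags_selfcheck(void)
{
	struct demo_checkpoint cp = { {0, 0, 0, 0} };

	demo_set_ckpt_flags(&cp, 0x00000104u);		/* arbitrary example bits */
	assert(cp.ckpt_flags[0] == 0x04 && cp.ckpt_flags[1] == 0x01);
	demo_clear_ckpt_flags(&cp, 0x00000004u);
	assert(!demo_is_set_ckpt_flags(&cp, 0x00000004u));
	assert(demo_is_set_ckpt_flags(&cp, 0x00000100u));
}
```

Because the update is a separate load, modify, and store, concurrent callers need external serialization, which is why the sbi-level wrappers take cp_lock around these helpers.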
|
|
|
|
|
2016-09-20 11:04:18 +08:00
|
|
|
static inline void clear_ckpt_flags(struct f2fs_sb_info *sbi, unsigned int f)
|
|
|
|
{
|
2017-07-07 14:10:15 +08:00
|
|
|
unsigned long flags;
|
|
|
|
|
|
|
|
spin_lock_irqsave(&sbi->cp_lock, flags);
|
2016-09-20 11:04:18 +08:00
|
|
|
__clear_ckpt_flags(F2FS_CKPT(sbi), f);
|
2017-07-07 14:10:15 +08:00
|
|
|
spin_unlock_irqrestore(&sbi->cp_lock, flags);
|
2016-09-20 11:04:18 +08:00
|
|
|
}
|
|
|
|
|
f2fs: use rw_sem instead of fs_lock(locks mutex)
The fs_locks are used to block other operations (e.g., recovery) while a checkpoint is in progress.
Every other operation (besides checkpoint) must acquire an fs_lock, which causes a
serious problem: when too many threads acquire fs_locks concurrently, they block each
other, which can hurt performance.
That is not the behavior we want.
Although some optimization patches have improved the usage of fs_lock,
the thorough solution is to replace fs_lock with a *rw_sem*.
The checkpoint routine takes the semaphore for writing and other operations take it for
reading, so checkpoint still blocks other operations (e.g., recovery), while those operations
no longer disturb each other. This avoids the problem described above completely.
Because of a weakness of rw_sem, this change may introduce a potential problem:
the checkpoint thread might be starved if other threads are intensively taking
the read semaphore for I/O (pointed out by Xu Jin).
To avoid this, a wait_list is introduced: newly arriving read-semaphore operations
are placed on the wait_list while the checkpoint thread is waiting for the write semaphore,
and are woken up when the checkpoint thread releases the write semaphore.
Thanks to Kim for the earlier review and testing; performance tests of this patch
from others would be very welcome.
V2:
-fix the potential starvation problem.
-use more suitable func name suggested by Xu Jin.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
[Jaegeuk Kim: adjust minor coding standard]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-09-27 18:08:30 +08:00
|
|
|
static inline void f2fs_lock_op(struct f2fs_sb_info *sbi)
|
f2fs: introduce a new global lock scheme
In the previous version, f2fs used global locks according to usage type,
such as directory operations, block allocation, block write, and so on.
See the following lock types in f2fs.h.
enum lock_type {
RENAME, /* for renaming operations */
DENTRY_OPS, /* for directory operations */
DATA_WRITE, /* for data write */
DATA_NEW, /* for data allocation */
DATA_TRUNC, /* for data truncate */
NODE_NEW, /* for node allocation */
NODE_TRUNC, /* for node truncate */
NODE_WRITE, /* for node write */
NR_LOCK_TYPE,
};
In that case, we lose performance in a multi-threaded environment,
since operations of each type must be conducted one at a time.
To address the problem, let's share the locks globally with a mutex
array regardless of type.
So, let users grab a mutex and perform their jobs in parallel as much as
possible.
For this, I propose a new global lock scheme as follows.
0. Data structure
- f2fs_sb_info -> mutex_lock[NR_GLOBAL_LOCKS]
- f2fs_sb_info -> node_write
1. mutex_lock_op(sbi)
- try to get an available lock from the array.
- returns the index of the acquired lock variable.
2. mutex_unlock_op(sbi, index of the lock)
- unlock the given index of the lock.
3. mutex_lock_all(sbi)
- grab all the locks in the array before the checkpoint.
4. mutex_unlock_all(sbi)
- release all the locks in the array after checkpoint.
5. block_operations()
- call mutex_lock_all()
- sync_dirty_dir_inodes()
- grab node_write
- sync_node_pages()
Note that,
the pairs of mutex_lock_op()/mutex_unlock_op() and
mutex_lock_all()/mutex_unlock_all() should be used together.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-22 15:21:29 +08:00
|
|
|
{
|
2016-08-05 02:38:25 +08:00
|
|
|
down_read(&sbi->cp_rwsem);
|
2012-11-22 15:21:29 +08:00
|
|
|
}
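The mutex-array scheme described in the "introduce a new global lock scheme" commit message (steps 1 to 4) can be sketched in user space with pthreads as below. This is a hedged illustration, not the historical kernel implementation: struct demo_sb_info, DEMO_NR_GLOBAL_LOCKS, and the demo_* functions are hypothetical names, and the round-robin fallback used when every slot is busy is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>

#define DEMO_NR_GLOBAL_LOCKS 4

struct demo_sb_info {
	pthread_mutex_t mutex_lock[DEMO_NR_GLOBAL_LOCKS];
	atomic_uint next_lock;	/* round-robin fallback slot (assumption) */
};

static void demo_sb_init(struct demo_sb_info *sbi)
{
	for (int i = 0; i < DEMO_NR_GLOBAL_LOCKS; i++)
		pthread_mutex_init(&sbi->mutex_lock[i], NULL);
	atomic_init(&sbi->next_lock, 0);
}

/* 1. Try to get an available lock from the array; return its index. */
static int demo_mutex_lock_op(struct demo_sb_info *sbi)
{
	int idx;

	for (int i = 0; i < DEMO_NR_GLOBAL_LOCKS; i++)
		if (pthread_mutex_trylock(&sbi->mutex_lock[i]) == 0)
			return i;
	/* Every slot is busy: block on one slot chosen round-robin. */
	idx = (int)(atomic_fetch_add(&sbi->next_lock, 1) % DEMO_NR_GLOBAL_LOCKS);
	pthread_mutex_lock(&sbi->mutex_lock[idx]);
	return idx;
}

/* 2. Unlock the slot previously returned by demo_mutex_lock_op(). */
static void demo_mutex_unlock_op(struct demo_sb_info *sbi, int idx)
{
	pthread_mutex_unlock(&sbi->mutex_lock[idx]);
}

/* 3. Grab every lock, in ascending order, before the checkpoint. */
static void demo_mutex_lock_all(struct demo_sb_info *sbi)
{
	for (int i = 0; i < DEMO_NR_GLOBAL_LOCKS; i++)
		pthread_mutex_lock(&sbi->mutex_lock[i]);
}

/* 4. Release every lock after the checkpoint. */
static void demo_mutex_unlock_all(struct demo_sb_info *sbi)
{
	for (int i = 0; i < DEMO_NR_GLOBAL_LOCKS; i++)
		pthread_mutex_unlock(&sbi->mutex_lock[i]);
}

static void demo_lock_scheme_selfcheck(void)
{
	struct demo_sb_info sbi;
	int a, b;

	demo_sb_init(&sbi);
	a = demo_mutex_lock_op(&sbi);
	b = demo_mutex_lock_op(&sbi);
	assert(a == 0 && b == 1);		/* distinct slots can be held in parallel */
	demo_mutex_unlock_op(&sbi, a);
	assert(demo_mutex_lock_op(&sbi) == 0);	/* the freed slot is reused */
	demo_mutex_unlock_op(&sbi, 0);
	demo_mutex_unlock_op(&sbi, b);

	demo_mutex_lock_all(&sbi);
	assert(pthread_mutex_trylock(&sbi.mutex_lock[2]) == EBUSY);
	demo_mutex_unlock_all(&sbi);
}
```

The self-check takes two slots from a single thread purely to exercise the code; in the scheme as described, each operation holds one slot while the checkpoint path holds all of them.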
|
|
|
|
|
2017-05-13 04:51:34 +08:00
|
|
|
static inline int f2fs_trylock_op(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
2021-12-12 17:17:51 +08:00
|
|
|
if (time_to_inject(sbi, FAULT_LOCK_OP)) {
|
|
|
|
f2fs_show_injection_info(sbi, FAULT_LOCK_OP);
|
|
|
|
return 0;
|
|
|
|
}
|
2017-05-13 04:51:34 +08:00
|
|
|
return down_read_trylock(&sbi->cp_rwsem);
|
|
|
|
}
|
|
|
|
|
2013-09-27 18:08:30 +08:00
|
|
|
static inline void f2fs_unlock_op(struct f2fs_sb_info *sbi)
|
2012-11-28 12:37:31 +08:00
|
|
|
{
|
2016-08-05 02:38:25 +08:00
|
|
|
up_read(&sbi->cp_rwsem);
|
2012-11-28 12:37:31 +08:00
|
|
|
}
|
|
|
|
|
2013-09-27 18:08:30 +08:00
|
|
|
static inline void f2fs_lock_all(struct f2fs_sb_info *sbi)
|
2012-11-28 12:37:31 +08:00
|
|
|
{
|
2016-08-05 02:38:25 +08:00
|
|
|
down_write(&sbi->cp_rwsem);
|
f2fs: introduce a new global lock scheme
In the previous version, f2fs uses global locks according to the usage types,
such as directory operations, block allocation, block write, and so on.
Reference the following lock types in f2fs.h.
enum lock_type {
RENAME, /* for renaming operations */
DENTRY_OPS, /* for directory operations */
DATA_WRITE, /* for data write */
DATA_NEW, /* for data allocation */
DATA_TRUNC, /* for data truncate */
NODE_NEW, /* for node allocation */
NODE_TRUNC, /* for node truncate */
NODE_WRITE, /* for node write */
NR_LOCK_TYPE,
};
In that case, we lose performance in a multi-threaded environment,
since each type of operation must be conducted one at a time.
To address the problem, let's share the locks globally with a mutex
array regardless of type.
So, let users grab a mutex and perform their jobs in parallel as much as
possible.
For this, I propose a new global lock scheme as follows.
0. Data structure
- f2fs_sb_info -> mutex_lock[NR_GLOBAL_LOCKS]
- f2fs_sb_info -> node_write
1. mutex_lock_op(sbi)
- try to get an available lock from the array.
- returns the index of the acquired lock variable.
2. mutex_unlock_op(sbi, index of the lock)
- unlock the given index of the lock.
3. mutex_lock_all(sbi)
- grab all the locks in the array before the checkpoint.
4. mutex_unlock_all(sbi)
- release all the locks in the array after checkpoint.
5. block_operations()
- call mutex_lock_all()
- sync_dirty_dir_inodes()
- grab node_write
- sync_node_pages()
Note that,
the pairs of mutex_lock_op()/mutex_unlock_op() and
mutex_lock_all()/mutex_unlock_all() should be used together.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-22 15:21:29 +08:00
|
|
|
}
|
|
|
|
|
2013-09-27 18:08:30 +08:00
|
|
|
static inline void f2fs_unlock_all(struct f2fs_sb_info *sbi)
|
2012-11-22 15:21:29 +08:00
|
|
|
{
|
2016-08-05 02:38:25 +08:00
|
|
|
up_write(&sbi->cp_rwsem);
|
2012-11-28 12:37:31 +08:00
|
|
|
}
|
|
|
|
|
2015-01-30 03:45:33 +08:00
|
|
|
static inline int __get_cp_reason(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
int reason = CP_SYNC;
|
|
|
|
|
|
|
|
if (test_opt(sbi, FASTBOOT))
|
|
|
|
reason = CP_FASTBOOT;
|
|
|
|
if (is_sbi_flag_set(sbi, SBI_IS_CLOSE))
|
|
|
|
reason = CP_UMOUNT;
|
|
|
|
return reason;
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline bool __remain_node_summaries(int reason)
|
|
|
|
{
|
2017-04-27 20:40:39 +08:00
|
|
|
return (reason & (CP_UMOUNT | CP_FASTBOOT));
|
2015-01-30 03:45:33 +08:00
|
|
|
}
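`__get_cp_reason()` and `__remain_node_summaries()` treat the checkpoint reason as a bitmask. The sketch below restates that logic in plain C. The flag values are assumptions chosen for illustration (consult the real `CP_*` definitions in f2fs.h), and the `model_*` names are hypothetical.

```c
#include <stdbool.h>

/* Illustrative flag values only; assumed for this sketch. */
enum {
	MODEL_CP_UMOUNT   = 0x1,
	MODEL_CP_FASTBOOT = 0x2,
	MODEL_CP_SYNC     = 0x4,
};

/* Mirrors __get_cp_reason(): a later check overrides an earlier one. */
static inline int model_get_cp_reason(bool fastboot_opt, bool sb_is_closing)
{
	int reason = MODEL_CP_SYNC;

	if (fastboot_opt)
		reason = MODEL_CP_FASTBOOT;
	if (sb_is_closing)
		reason = MODEL_CP_UMOUNT;
	return reason;
}

/* Mirrors __remain_node_summaries(): true for umount or fastboot reasons. */
static inline bool model_remain_node_summaries(int reason)
{
	return (reason & (MODEL_CP_UMOUNT | MODEL_CP_FASTBOOT)) != 0;
}
```

Because the unmount check runs last, an unmounting filesystem reports the unmount reason even when the fastboot option is set.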
|
|
|
|
|
|
|
|
static inline bool __exist_node_summaries(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
2016-09-20 11:04:18 +08:00
|
|
|
return (is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG) ||
|
|
|
|
is_set_ckpt_flags(sbi, CP_FASTBOOT_FLAG));
|
2015-01-30 03:45:33 +08:00
|
|
|
}
|
|
|
|
|
2012-11-28 12:37:31 +08:00
|
|
|
/*
|
|
|
|
* Check whether the inode has blocks or not
|
|
|
|
*/
|
|
|
|
static inline int F2FS_HAS_BLOCKS(struct inode *inode)
|
|
|
|
{
|
2017-06-14 23:00:56 +08:00
|
|
|
block_t xattr_block = F2FS_I(inode)->i_xattr_nid ? 1 : 0;
|
|
|
|
|
f2fs: don't count inode block in in-memory inode.i_blocks
Previously, we counted all blocks consumed by an inode, including the inode block,
xattr block, index blocks, and data blocks, in i_blocks. Other generic
filesystems do not count the inode block in i_blocks, so
userspace applications or the quota system may see an incorrect block
count from the i_blocks value in the inode.
This patch changes inode.i_blocks to count all blocks except the
inode block; for on-disk i_blocks, we keep counting the inode block for
backward compatibility.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-07-06 01:11:31 +08:00
|
|
|
return (inode->i_blocks >> F2FS_LOG_SECTORS_PER_BLOCK) > xattr_block;
|
2012-11-28 12:37:31 +08:00
|
|
|
}
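`F2FS_HAS_BLOCKS()` converts `i_blocks` (counted in 512-byte sectors) to filesystem blocks with a right shift and compares the result with the number of xattr blocks. A small sketch of that arithmetic follows, assuming a shift of 3 (4 KiB blocks over 512-byte sectors). The constant and the `model_has_blocks()` name are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed for this sketch: 4 KiB block / 512 B sector = 8 sectors = 1 << 3. */
#define MODEL_LOG_SECTORS_PER_BLOCK 3

/* True when the inode owns at least one block besides its xattr block. */
static inline bool model_has_blocks(uint64_t i_blocks_sectors, bool has_xattr_nid)
{
	uint64_t xattr_block = has_xattr_nid ? 1 : 0;

	return (i_blocks_sectors >> MODEL_LOG_SECTORS_PER_BLOCK) > xattr_block;
}
```

With these assumptions, an inode holding 8 sectors and no xattr node has one block, while the same 8 sectors with an xattr node present count as no data blocks.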
|
|
|
|
|
2014-03-17 16:35:06 +08:00
|
|
|
static inline bool f2fs_has_xattr_block(unsigned int ofs)
|
|
|
|
{
|
|
|
|
return ofs == XATTR_NODE_OFFSET;
|
|
|
|
}
|
|
|
|
|
2018-01-06 08:02:36 +08:00
|
|
|
static inline bool __allow_reserved_blocks(struct f2fs_sb_info *sbi,
|
2018-04-21 14:44:59 +08:00
|
|
|
struct inode *inode, bool cap)
|
2018-01-05 13:36:09 +08:00
|
|
|
{
|
2018-01-06 08:02:36 +08:00
|
|
|
if (!inode)
|
|
|
|
return true;
|
2018-01-05 13:36:09 +08:00
|
|
|
if (!test_opt(sbi, RESERVE_ROOT))
|
|
|
|
return false;
|
2018-01-06 08:02:36 +08:00
|
|
|
if (IS_NOQUOTA(inode))
|
|
|
|
return true;
|
2018-03-08 14:22:56 +08:00
|
|
|
if (uid_eq(F2FS_OPTION(sbi).s_resuid, current_fsuid()))
|
2018-01-05 13:36:09 +08:00
|
|
|
return true;
|
2018-03-08 14:22:56 +08:00
|
|
|
if (!gid_eq(F2FS_OPTION(sbi).s_resgid, GLOBAL_ROOT_GID) &&
|
|
|
|
in_group_p(F2FS_OPTION(sbi).s_resgid))
|
2018-01-05 13:36:09 +08:00
|
|
|
return true;
|
2018-04-21 14:44:59 +08:00
|
|
|
if (cap && capable(CAP_SYS_RESOURCE))
|
2018-03-09 12:47:33 +08:00
|
|
|
return true;
|
2018-01-05 13:36:09 +08:00
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
2017-07-09 00:13:07 +08:00
|
|
|
static inline void f2fs_i_blocks_write(struct inode *, block_t, bool, bool);
|
|
|
|
static inline int inc_valid_block_count(struct f2fs_sb_info *sbi,
|
2016-05-09 19:56:30 +08:00
|
|
|
struct inode *inode, blkcnt_t *count)
|
2012-11-28 12:37:31 +08:00
|
|
|
{
|
2017-07-09 00:13:07 +08:00
|
|
|
blkcnt_t diff = 0, release = 0;
|
2017-06-26 16:24:41 +08:00
|
|
|
block_t avail_user_block_count;
|
2017-07-09 00:13:07 +08:00
|
|
|
int ret;
|
|
|
|
|
|
|
|
ret = dquot_reserve_block(inode, *count);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
2012-11-28 12:37:31 +08:00
|
|
|
|
2017-02-25 11:08:28 +08:00
|
|
|
if (time_to_inject(sbi, FAULT_BLOCK)) {
|
2019-11-01 17:53:23 +08:00
|
|
|
f2fs_show_injection_info(sbi, FAULT_BLOCK);
|
2017-07-09 00:13:07 +08:00
|
|
|
release = *count;
|
2019-08-23 17:58:34 +08:00
|
|
|
goto release_quota;
|
2017-02-25 11:08:28 +08:00
|
|
|
}
|
2018-08-14 05:38:06 +08:00
|
|
|
|
2016-07-20 10:20:11 +08:00
|
|
|
/*
|
|
|
|
* let's increase this prior to the actual block count change in order
|
|
|
|
* for f2fs_sync_file to avoid data races when deciding checkpoint.
|
|
|
|
*/
|
|
|
|
percpu_counter_add(&sbi->alloc_valid_block_count, (*count));
|
|
|
|
|
2016-07-01 10:02:06 +08:00
|
|
|
spin_lock(&sbi->stat_lock);
|
|
|
|
sbi->total_valid_block_count += (block_t)(*count);
|
2017-10-27 20:45:05 +08:00
|
|
|
avail_user_block_count = sbi->user_block_count -
|
|
|
|
sbi->current_reserved_blocks;
|
2017-12-28 07:05:52 +08:00
|
|
|
|
2018-04-21 14:44:59 +08:00
|
|
|
if (!__allow_reserved_blocks(sbi, inode, true))
|
2018-03-08 14:22:56 +08:00
|
|
|
avail_user_block_count -= F2FS_OPTION(sbi).root_reserved_blocks;
|
f2fs: fix to reserve space for IO align feature
https://bugzilla.kernel.org/show_bug.cgi?id=204137
With the script below, we hit a panic during new segment allocation:
DISK=bingo.img
MOUNT_DIR=/mnt/f2fs
dd if=/dev/zero of=$DISK bs=1M count=105
mkfs.f2fs -a 1 -o 19 -t 1 -z 1 -f -q $DISK
mount -t f2fs $DISK $MOUNT_DIR -o "noinline_dentry,flush_merge,noextent_cache,mode=lfs,io_bits=7,fsync_mode=strict"
for (( i = 0; i < 4096; i++ )); do
name=`head /dev/urandom | tr -dc A-Za-z0-9 | head -c 10`
mkdir $MOUNT_DIR/$name
done
umount $MOUNT_DIR
rm $DISK
--- Core dump ---
Call Trace:
allocate_segment_by_default+0x9d/0x100 [f2fs]
f2fs_allocate_data_block+0x3c0/0x5c0 [f2fs]
do_write_page+0x62/0x110 [f2fs]
f2fs_outplace_write_data+0x43/0xc0 [f2fs]
f2fs_do_write_data_page+0x386/0x560 [f2fs]
__write_data_page+0x706/0x850 [f2fs]
f2fs_write_cache_pages+0x267/0x6a0 [f2fs]
f2fs_write_data_pages+0x19c/0x2e0 [f2fs]
do_writepages+0x1c/0x70
__filemap_fdatawrite_range+0xaa/0xe0
filemap_fdatawrite+0x1f/0x30
f2fs_sync_dirty_inodes+0x74/0x1f0 [f2fs]
block_operations+0xdc/0x350 [f2fs]
f2fs_write_checkpoint+0x104/0x1150 [f2fs]
f2fs_sync_fs+0xa2/0x120 [f2fs]
f2fs_balance_fs_bg+0x33c/0x390 [f2fs]
f2fs_write_node_pages+0x4c/0x1f0 [f2fs]
do_writepages+0x1c/0x70
__writeback_single_inode+0x45/0x320
writeback_sb_inodes+0x273/0x5c0
wb_writeback+0xff/0x2e0
wb_workfn+0xa1/0x370
process_one_work+0x138/0x350
worker_thread+0x4d/0x3d0
kthread+0x109/0x140
ret_from_fork+0x25/0x30
The root cause is that, with the IO alignment feature enabled, in the worst
case we need F2FS_IO_SIZE() free blocks for a single 4k write,
because the IO alignment feature fills in dummy pages to keep IO
aligned.
So we can easily run out of free segments during writeback of a non-inline
directory's data, even during foreground GC.
To fix this issue, I propose reserving additional free
space for the IO alignment feature to handle the worst-case free space usage
ratio during FGGC.
Fixes: 0a595ebaaa6b ("f2fs: support IO alignment for DATA and NODE writes")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-12-11 21:27:36 +08:00
|
|
|
|
|
|
|
if (F2FS_IO_ALIGNED(sbi))
|
|
|
|
avail_user_block_count -= sbi->blocks_per_seg *
|
|
|
|
SM_I(sbi)->additional_reserved_segments;
|
|
|
|
|
2019-05-30 08:49:05 +08:00
|
|
|
if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED))) {
|
|
|
|
if (avail_user_block_count > sbi->unusable_block_count)
|
|
|
|
avail_user_block_count -= sbi->unusable_block_count;
|
|
|
|
else
|
|
|
|
avail_user_block_count = 0;
|
|
|
|
}
|
2017-06-26 16:24:41 +08:00
|
|
|
if (unlikely(sbi->total_valid_block_count > avail_user_block_count)) {
|
|
|
|
diff = sbi->total_valid_block_count - avail_user_block_count;
|
2017-12-28 07:05:52 +08:00
|
|
|
if (diff > *count)
|
|
|
|
diff = *count;
|
2016-07-20 10:20:11 +08:00
|
|
|
*count -= diff;
|
2017-07-09 00:13:07 +08:00
|
|
|
release = diff;
|
2017-12-28 07:05:52 +08:00
|
|
|
sbi->total_valid_block_count -= diff;
|
2016-05-09 19:56:30 +08:00
|
|
|
if (!*count) {
|
|
|
|
spin_unlock(&sbi->stat_lock);
|
2017-07-09 00:13:07 +08:00
|
|
|
goto enospc;
|
2016-05-09 19:56:30 +08:00
|
|
|
}
|
2012-11-28 12:37:31 +08:00
|
|
|
}
|
|
|
|
spin_unlock(&sbi->stat_lock);
|
2016-05-17 02:06:50 +08:00
|
|
|
|
2018-07-10 11:32:42 +08:00
|
|
|
if (unlikely(release)) {
|
|
|
|
percpu_counter_sub(&sbi->alloc_valid_block_count, release);
|
2017-07-09 00:13:07 +08:00
|
|
|
dquot_release_reservation_block(inode, release);
|
2018-07-10 11:32:42 +08:00
|
|
|
}
|
2017-07-09 00:13:07 +08:00
|
|
|
f2fs_i_blocks_write(inode, *count, true, true);
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
enospc:
|
2018-07-10 11:32:42 +08:00
|
|
|
percpu_counter_sub(&sbi->alloc_valid_block_count, release);
|
2019-08-23 17:58:34 +08:00
|
|
|
release_quota:
|
2017-07-09 00:13:07 +08:00
|
|
|
dquot_release_reservation_block(inode, release);
|
|
|
|
return -ENOSPC;
|
2012-11-28 12:37:31 +08:00
|
|
|
}
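The core of `inc_valid_block_count()` is a clamp: the request is added optimistically, and any excess over the available user blocks is handed back, failing only when nothing can be granted. The sketch below isolates that arithmetic. It deliberately leaves out quota reservation, fault injection, locking, the root and IO-alignment reserves, and the checkpoint-disabled case. `struct model_sbi` and `model_inc_valid_block_count()` are hypothetical names.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical subset of the counters kept in f2fs_sb_info. */
struct model_sbi {
	uint32_t total_valid_block_count;
	uint32_t user_block_count;
	uint32_t current_reserved_blocks;
};

/*
 * Returns 0 with *count set to the number of blocks actually granted,
 * or -ENOSPC when no block could be granted at all.
 */
static inline int model_inc_valid_block_count(struct model_sbi *sbi,
					      uint64_t *count)
{
	uint32_t avail = sbi->user_block_count - sbi->current_reserved_blocks;
	uint64_t diff;

	sbi->total_valid_block_count += (uint32_t)*count;
	if (sbi->total_valid_block_count > avail) {
		/* Give back the part of the request that does not fit. */
		diff = sbi->total_valid_block_count - avail;
		if (diff > *count)
			diff = *count;
		*count -= diff;
		sbi->total_valid_block_count -= (uint32_t)diff;
		if (!*count)
			return -ENOSPC;
	}
	return 0;
}
```

For example, with 90 of 100 user blocks in use, a request for 20 blocks is trimmed to 10; a further request then fails with `-ENOSPC` and leaves the total unchanged.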
|
|
|
|
|
2019-06-18 17:48:42 +08:00
|
|
|
__printf(2, 3)
|
|
|
|
void f2fs_printk(struct f2fs_sb_info *sbi, const char *fmt, ...);
|
|
|
|
|
|
|
|
#define f2fs_err(sbi, fmt, ...) \
|
|
|
|
f2fs_printk(sbi, KERN_ERR fmt, ##__VA_ARGS__)
|
|
|
|
#define f2fs_warn(sbi, fmt, ...) \
|
|
|
|
f2fs_printk(sbi, KERN_WARNING fmt, ##__VA_ARGS__)
|
|
|
|
#define f2fs_notice(sbi, fmt, ...) \
|
|
|
|
f2fs_printk(sbi, KERN_NOTICE fmt, ##__VA_ARGS__)
|
|
|
|
#define f2fs_info(sbi, fmt, ...) \
|
|
|
|
f2fs_printk(sbi, KERN_INFO fmt, ##__VA_ARGS__)
|
|
|
|
#define f2fs_debug(sbi, fmt, ...) \
|
|
|
|
f2fs_printk(sbi, KERN_DEBUG fmt, ##__VA_ARGS__)
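The logging macros above rely on two compiler features: string-literal concatenation to prepend the log level to `fmt`, and the GNU `##__VA_ARGS__` extension so that a call without extra arguments still compiles. A userspace sketch of the same pattern follows. The `"<3>"` and `"<4>"` prefixes, the buffer-based `model_printk()`, and the `model_*` macros are assumptions for illustration and are not the kernel's definitions.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical level prefixes; the kernel's KERN_* strings differ. */
#define MODEL_ERR  "<3>"
#define MODEL_WARN "<4>"

/* printf-style checking on arguments 3 and onward, like __printf(2, 3) above. */
__attribute__((format(printf, 3, 4)))
static int model_printk(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	return n;
}

/* The level literal is glued onto fmt at compile time. */
#define model_err(buf, len, fmt, ...) \
	model_printk(buf, len, MODEL_ERR fmt, ##__VA_ARGS__)
#define model_warn(buf, len, fmt, ...) \
	model_printk(buf, len, MODEL_WARN fmt, ##__VA_ARGS__)
```

Because the prefix is concatenated with `fmt`, the format must be a string literal at every call site; passing a runtime `char *` as `fmt` would not compile.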
|
|
|
|
|
2013-11-19 18:03:27 +08:00
|
|
|
static inline void dec_valid_block_count(struct f2fs_sb_info *sbi,
|
2012-11-28 12:37:31 +08:00
|
|
|
struct inode *inode,
|
2017-06-14 23:00:56 +08:00
|
|
|
block_t count)
|
2012-11-28 12:37:31 +08:00
|
|
|
{
|
2017-06-14 23:00:56 +08:00
|
|
|
blkcnt_t sectors = count << F2FS_LOG_SECTORS_PER_BLOCK;
|
|
|
|
|
2012-11-28 12:37:31 +08:00
|
|
|
spin_lock(&sbi->stat_lock);
|
2014-09-03 06:52:58 +08:00
|
|
|
f2fs_bug_on(sbi, sbi->total_valid_block_count < (block_t) count);
|
2012-11-28 12:37:31 +08:00
|
|
|
sbi->total_valid_block_count -= (block_t)count;
|
2017-10-27 20:45:05 +08:00
|
|
|
if (sbi->reserved_blocks &&
|
|
|
|
sbi->current_reserved_blocks < sbi->reserved_blocks)
|
|
|
|
sbi->current_reserved_blocks = min(sbi->reserved_blocks,
|
|
|
|
sbi->current_reserved_blocks + count);
|
2012-11-28 12:37:31 +08:00
|
|
|
spin_unlock(&sbi->stat_lock);
|
2019-04-15 15:28:30 +08:00
|
|
|
if (unlikely(inode->i_blocks < sectors)) {
|
2019-06-18 17:48:42 +08:00
|
|
|
f2fs_warn(sbi, "Inconsistent i_blocks, ino:%lu, iblocks:%llu, sectors:%llu",
|
|
|
|
inode->i_ino,
|
|
|
|
(unsigned long long)inode->i_blocks,
|
|
|
|
(unsigned long long)sectors);
|
2019-04-15 15:28:30 +08:00
|
|
|
set_sbi_flag(sbi, SBI_NEED_FSCK);
|
|
|
|
return;
|
|
|
|
}
|
2017-07-09 00:13:07 +08:00
|
|
|
f2fs_i_blocks_write(inode, count, false, true);
|
2012-11-28 12:37:31 +08:00
|
|
|
}
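When blocks are freed, `dec_valid_block_count()` tops the current reserve back up toward its configured target, never exceeding it. The sketch below isolates that `min()` step. `model_refill_reserved()` is a hypothetical helper, and the unsigned 32-bit types are an assumption standing in for `block_t`.

```c
#include <stdint.h>

/*
 * Returns the new value of the current reserve after `freed` blocks are
 * released. `reserved` is the configured target; 0 disables the reserve.
 */
static inline uint32_t model_refill_reserved(uint32_t reserved,
					     uint32_t current, uint32_t freed)
{
	if (reserved && current < reserved) {
		uint32_t want = current + freed;

		current = want < reserved ? want : reserved;
	}
	return current;
}
```

With a target of 100 and a current reserve of 98, freeing 5 blocks raises the reserve only to 100; the remaining 3 blocks become generally available.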
|
|
|
|
|
|
|
|
static inline void inc_page_count(struct f2fs_sb_info *sbi, int count_type)
|
|
|
|
{
|
2016-10-21 10:09:57 +08:00
|
|
|
atomic_inc(&sbi->nr_pages[count_type]);
|
2016-08-18 17:46:14 +08:00
|
|
|
|
2019-01-10 16:40:12 +08:00
|
|
|
if (count_type == F2FS_DIRTY_DENTS ||
|
|
|
|
count_type == F2FS_DIRTY_NODES ||
|
|
|
|
count_type == F2FS_DIRTY_META ||
|
|
|
|
count_type == F2FS_DIRTY_QDATA ||
|
|
|
|
count_type == F2FS_DIRTY_IMETA)
|
|
|
|
set_sbi_flag(sbi, SBI_IS_DIRTY);
|
2012-11-28 12:37:31 +08:00
|
|
|
}
|
|
|
|
|
2014-09-13 06:53:45 +08:00
|
|
|
static inline void inode_inc_dirty_pages(struct inode *inode)
|
2012-11-28 12:37:31 +08:00
|
|
|
{
|
2016-12-03 07:11:32 +08:00
|
|
|
atomic_inc(&F2FS_I(inode)->dirty_pages);
|
2015-12-16 13:09:20 +08:00
|
|
|
inc_page_count(F2FS_I_SB(inode), S_ISDIR(inode->i_mode) ?
|
|
|
|
F2FS_DIRTY_DENTS : F2FS_DIRTY_DATA);
|
2017-11-14 09:46:38 +08:00
|
|
|
if (IS_NOQUOTA(inode))
|
|
|
|
inc_page_count(F2FS_I_SB(inode), F2FS_DIRTY_QDATA);
|
2012-11-28 12:37:31 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static inline void dec_page_count(struct f2fs_sb_info *sbi, int count_type)
|
|
|
|
{
|
2016-10-21 10:09:57 +08:00
|
|
|
atomic_dec(&sbi->nr_pages[count_type]);
|
2012-11-28 12:37:31 +08:00
|
|
|
}
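The counter helpers above can be modelled outside the kernel. What follows is a minimal userspace sketch, assuming C11 `<stdatomic.h>` in place of the kernel's `atomic_t`; the names `struct page_counters`, `counters_inc`, `counters_dec`, `counters_get` and the `DIRTY_*` enum are hypothetical stand-ins, not part of f2fs.h. It illustrates that every increment bumps the per-type counter, while only dentry, node, meta, quota-data and inode-meta pages mark the filesystem dirty.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the F2FS_DIRTY_* count types. */
enum count_type {
	DIRTY_DENTS,
	DIRTY_DATA,
	DIRTY_QDATA,
	DIRTY_NODES,
	DIRTY_META,
	DIRTY_IMETA,
	NR_COUNT_TYPE
};

/* Models sbi->nr_pages[] plus the SBI_IS_DIRTY flag. */
struct page_counters {
	atomic_int nr_pages[NR_COUNT_TYPE];
	atomic_bool is_dirty;
};

static void counters_inc(struct page_counters *c, enum count_type t)
{
	assert((int)t >= 0 && (int)t < NR_COUNT_TYPE);
	atomic_fetch_add(&c->nr_pages[t], 1);

	/* Plain dirty data pages do not mark the filesystem dirty. */
	if (t == DIRTY_DENTS || t == DIRTY_NODES || t == DIRTY_META ||
	    t == DIRTY_QDATA || t == DIRTY_IMETA)
		atomic_store(&c->is_dirty, true);
}

static void counters_dec(struct page_counters *c, enum count_type t)
{
	assert((int)t >= 0 && (int)t < NR_COUNT_TYPE);
	atomic_fetch_sub(&c->nr_pages[t], 1);
}

static int counters_get(struct page_counters *c, enum count_type t)
{
	assert((int)t >= 0 && (int)t < NR_COUNT_TYPE);
	return atomic_load(&c->nr_pages[t]);
}
```

Note that the decrement path never clears the dirty flag in this sketch, which mirrors `dec_page_count()` above: it only touches the counter.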
|
|
|
|
|
2014-09-13 06:53:45 +08:00
|
|
|
static inline void inode_dec_dirty_pages(struct inode *inode)
|
2012-11-28 12:37:31 +08:00
|
|
|
{
|
2015-06-29 18:14:10 +08:00
|
|
|
if (!S_ISDIR(inode->i_mode) && !S_ISREG(inode->i_mode) &&
|
|
|
|
!S_ISLNK(inode->i_mode))
|
2014-02-07 09:00:06 +08:00
|
|
|
return;
|
|
|
|
|
2016-12-03 07:11:32 +08:00
|
|
|
atomic_dec(&F2FS_I(inode)->dirty_pages);
|
2015-12-16 13:09:20 +08:00
|
|
|
dec_page_count(F2FS_I_SB(inode), S_ISDIR(inode->i_mode) ?
|
|
|
|
F2FS_DIRTY_DENTS : F2FS_DIRTY_DATA);
|
2017-11-14 09:46:38 +08:00
|
|
|
if (IS_NOQUOTA(inode))
|
|
|
|
dec_page_count(F2FS_I_SB(inode), F2FS_DIRTY_QDATA);
|
2012-11-28 12:37:31 +08:00
|
|
|
}
|
|
|
|
|
2016-05-14 03:36:58 +08:00
|
|
|
static inline s64 get_pages(struct f2fs_sb_info *sbi, int count_type)
|
2012-11-28 12:37:31 +08:00
|
|
|
{
|
2016-10-21 10:09:57 +08:00
|
|
|
return atomic_read(&sbi->nr_pages[count_type]);
|
2012-11-28 12:37:31 +08:00
|
|
|
}
|
|
|
|
|
2016-12-03 07:11:32 +08:00
|
|
|
static inline int get_dirty_pages(struct inode *inode)
|
2014-03-18 11:33:06 +08:00
|
|
|
{
|
2016-12-03 07:11:32 +08:00
|
|
|
return atomic_read(&F2FS_I(inode)->dirty_pages);
|
2014-03-18 11:33:06 +08:00
|
|
|
}
|
|
|
|
|
2013-02-02 22:52:59 +08:00
|
|
|
static inline int get_blocktype_secs(struct f2fs_sb_info *sbi, int block_type)
|
|
|
|
{
|
2015-12-01 11:56:52 +08:00
|
|
|
unsigned int pages_per_sec = sbi->segs_per_sec * sbi->blocks_per_seg;
|
2016-05-14 03:36:58 +08:00
|
|
|
unsigned int segs = (get_pages(sbi, block_type) + pages_per_sec - 1) >>
|
|
|
|
sbi->log_blocks_per_seg;
|
|
|
|
|
|
|
|
return segs / sbi->segs_per_sec;
|
2013-02-02 22:52:59 +08:00
|
|
|
}
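The arithmetic in `get_blocktype_secs()` amounts to a ceiling division of the page count by the number of pages in one section. The following is a minimal userspace sketch under that reading; `struct geometry` and `blocktype_secs` are hypothetical names introduced only for illustration, and the block count per segment is assumed to be a power of two given by `log_blocks_per_seg`.

```c
#include <assert.h>

/* Hypothetical subset of the geometry fields used by the helper. */
struct geometry {
	unsigned int segs_per_sec;        /* segments per section */
	unsigned int log_blocks_per_seg;  /* log2(blocks per segment) */
};

/* Number of sections needed to hold `pages` pages, rounded up. */
static unsigned int blocktype_secs(const struct geometry *g, long long pages)
{
	assert(g->segs_per_sec > 0);
	assert(pages >= 0);

	unsigned int blocks_per_seg = 1U << g->log_blocks_per_seg;
	unsigned int pages_per_sec = g->segs_per_sec * blocks_per_seg;

	/* Round up to a whole section, then convert blocks to segments. */
	unsigned int segs = (unsigned int)((pages + pages_per_sec - 1) >>
					   g->log_blocks_per_seg);

	/* Convert segments to sections. */
	return segs / g->segs_per_sec;
}
```

With 4 segments per section and 512 blocks per segment, a section holds 2048 pages, so 1 to 2048 pages map to one section and 2049 pages map to two.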
|
|
|
|
|
2012-11-28 12:37:31 +08:00
|
|
|
static inline block_t valid_user_blocks(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
2014-02-24 12:00:13 +08:00
|
|
|
return sbi->total_valid_block_count;
|
2012-11-28 12:37:31 +08:00
|
|
|
}
|
|
|
|
|
2016-08-18 21:01:18 +08:00
|
|
|
static inline block_t discard_blocks(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
return sbi->discard_blks;
|
|
|
|
}
|
|
|
|
|
2012-11-28 12:37:31 +08:00
|
|
|
static inline unsigned long __bitmap_size(struct f2fs_sb_info *sbi, int flag)
|
|
|
|
{
|
|
|
|
struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
|
|
|
|
|
|
|
|
/* return NAT or SIT bitmap */
|
|
|
|
if (flag == NAT_BITMAP)
|
|
|
|
return le32_to_cpu(ckpt->nat_ver_bitmap_bytesize);
|
|
|
|
else if (flag == SIT_BITMAP)
|
|
|
|
return le32_to_cpu(ckpt->sit_ver_bitmap_bytesize);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2015-02-26 07:57:20 +08:00
|
|
|
static inline block_t __cp_payload(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
return le32_to_cpu(F2FS_RAW_SUPER(sbi)->cp_payload);
|
|
|
|
}
|
|
|
|
|
2012-11-28 12:37:31 +08:00
|
|
|
static inline void *__bitmap_ptr(struct f2fs_sb_info *sbi, int flag)
|
|
|
|
{
|
|
|
|
struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
|
2021-02-25 03:03:13 +08:00
|
|
|
void *tmp_ptr = &ckpt->sit_nat_version_bitmap;
|
2014-05-12 11:27:43 +08:00
|
|
|
int offset;
|
|
|
|
|
2018-01-25 19:40:08 +08:00
|
|
|
if (is_set_ckpt_flags(sbi, CP_LARGE_NAT_BITMAP_FLAG)) {
|
|
|
|
offset = (flag == SIT_BITMAP) ?
|
|
|
|
le32_to_cpu(ckpt->nat_ver_bitmap_bytesize) : 0;
|
2019-04-22 17:33:53 +08:00
|
|
|
/*
|
|
|
|
* if large_nat_bitmap feature is enabled, leave checksum
|
|
|
|
* protection for all nat/sit bitmaps.
|
|
|
|
*/
|
2021-02-25 03:03:13 +08:00
|
|
|
return tmp_ptr + offset + sizeof(__le32);
|
2018-01-25 19:40:08 +08:00
|
|
|
}
|
|
|
|
|
2015-02-26 07:57:20 +08:00
|
|
|
if (__cp_payload(sbi) > 0) {
|
2014-05-12 11:27:43 +08:00
|
|
|
if (flag == NAT_BITMAP)
|
|
|
|
return &ckpt->sit_nat_version_bitmap;
|
|
|
|
else
|
2014-07-31 08:25:54 +08:00
|
|
|
return (unsigned char *)ckpt + F2FS_BLKSIZE;
|
2014-05-12 11:27:43 +08:00
|
|
|
} else {
|
|
|
|
offset = (flag == NAT_BITMAP) ?
|
2012-11-28 15:12:41 +08:00
|
|
|
le32_to_cpu(ckpt->sit_ver_bitmap_bytesize) : 0;
|
2021-02-25 03:03:13 +08:00
|
|
|
return tmp_ptr + offset;
|
2014-05-12 11:27:43 +08:00
|
|
|
}
|
2012-11-28 12:37:31 +08:00
|
|
|
}
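`__bitmap_ptr()` chooses between three layouts for the NAT and SIT version bitmaps. The sketch below restates that choice as byte offsets from the start of the checkpoint block rather than as pointers. It is a userspace model only: `struct cp_layout`, `bitmap_offset` and the field `bitmap_field_off` (the offset of `sit_nat_version_bitmap` inside the checkpoint, passed in as a parameter rather than assumed) are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum bitmap_kind { NAT_BITMAP, SIT_BITMAP };

/* Hypothetical summary of the checkpoint fields the helper reads. */
struct cp_layout {
	size_t bitmap_field_off;  /* offset of the version bitmap area */
	size_t blksize;           /* filesystem block size in bytes */
	uint32_t nat_bytes;       /* nat_ver_bitmap_bytesize */
	uint32_t sit_bytes;       /* sit_ver_bitmap_bytesize */
	uint32_t cp_payload;      /* extra checkpoint payload blocks */
	bool large_nat_bitmap;    /* CP_LARGE_NAT_BITMAP_FLAG is set */
};

/* Byte offset of the requested bitmap from the checkpoint start. */
static size_t bitmap_offset(const struct cp_layout *l, enum bitmap_kind kind)
{
	assert(kind == NAT_BITMAP || kind == SIT_BITMAP);

	if (l->large_nat_bitmap) {
		/* NAT first, SIT after it, both behind a 4-byte checksum. */
		size_t off = (kind == SIT_BITMAP) ? l->nat_bytes : 0;
		return l->bitmap_field_off + off + sizeof(uint32_t);
	}

	if (l->cp_payload > 0) {
		/* NAT stays in the first block, SIT starts the next block. */
		return (kind == NAT_BITMAP) ? l->bitmap_field_off : l->blksize;
	}

	/* Default layout: SIT first, NAT directly after it. */
	return l->bitmap_field_off +
	       ((kind == NAT_BITMAP) ? l->sit_bytes : 0);
}
```

Returning offsets instead of pointers keeps the sketch free of any assumption about how the checkpoint is held in memory.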
|
|
|
|
|
|
|
|
static inline block_t __start_cp_addr(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
2016-11-25 04:45:15 +08:00
|
|
|
block_t start_addr = le32_to_cpu(F2FS_RAW_SUPER(sbi)->cp_blkaddr);
|
2012-11-28 12:37:31 +08:00
|
|
|
|
2016-11-25 04:45:15 +08:00
|
|
|
if (sbi->cur_cp_pack == 2)
|
2012-11-28 12:37:31 +08:00
|
|
|
start_addr += sbi->blocks_per_seg;
|
2016-11-25 04:45:15 +08:00
|
|
|
return start_addr;
|
|
|
|
}
|
2012-11-28 12:37:31 +08:00
|
|
|
|
2016-11-25 04:45:15 +08:00
|
|
|
static inline block_t __start_cp_next_addr(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
block_t start_addr = le32_to_cpu(F2FS_RAW_SUPER(sbi)->cp_blkaddr);
|
2012-11-28 12:37:31 +08:00
|
|
|
|
2016-11-25 04:45:15 +08:00
|
|
|
if (sbi->cur_cp_pack == 1)
|
|
|
|
start_addr += sbi->blocks_per_seg;
|
2012-11-28 12:37:31 +08:00
|
|
|
return start_addr;
|
|
|
|
}
|
|
|
|
|
2016-11-25 04:45:15 +08:00
|
|
|
static inline void __set_cp_next_pack(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
sbi->cur_cp_pack = (sbi->cur_cp_pack == 1) ? 2 : 1;
|
|
|
|
}
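The three helpers above implement a two-slot rotation: the current checkpoint pack lives in one of two consecutive segments, the next pack is written to the other, and the slot index is flipped afterwards. A minimal userspace sketch of that rotation, assuming hypothetical names `struct cp_state`, `start_cp_addr`, `start_cp_next_addr` and `set_cp_next_pack`:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t block_t;

/* Hypothetical subset of the superblock info used by the helpers. */
struct cp_state {
	block_t cp_blkaddr;           /* first block of checkpoint pack 1 */
	unsigned int blocks_per_seg;  /* pack 2 starts one segment later */
	int cur_cp_pack;              /* 1 or 2 */
};

/* Start address of the pack that is currently valid. */
static block_t start_cp_addr(const struct cp_state *s)
{
	assert(s->cur_cp_pack == 1 || s->cur_cp_pack == 2);
	return s->cp_blkaddr + (s->cur_cp_pack == 2 ? s->blocks_per_seg : 0);
}

/* Start address of the pack that the next checkpoint will use. */
static block_t start_cp_next_addr(const struct cp_state *s)
{
	assert(s->cur_cp_pack == 1 || s->cur_cp_pack == 2);
	return s->cp_blkaddr + (s->cur_cp_pack == 1 ? s->blocks_per_seg : 0);
}

/* Flip the current pack after a checkpoint has been written. */
static void set_cp_next_pack(struct cp_state *s)
{
	s->cur_cp_pack = (s->cur_cp_pack == 1) ? 2 : 1;
}
```

After one call to `set_cp_next_pack()`, the values returned by the two address helpers swap, which is what lets a new checkpoint be written without overwriting the last valid one.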
|
|
|
|
|
2012-11-28 12:37:31 +08:00
|
|
|
static inline block_t __start_sum_addr(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
return le32_to_cpu(F2FS_CKPT(sbi)->cp_pack_start_sum);
|
|
|
|
}
|
|
|
|
|
2017-07-09 00:13:07 +08:00
|
|
|
static inline int inc_valid_node_count(struct f2fs_sb_info *sbi,
|
f2fs: don't count inode block in in-memory inode.i_blocks
Previously, we count all inode consumed blocks including inode block,
xattr block, index block, data block into i_blocks, for other generic
filesystems, they won't count inode block into i_blocks, so for
userspace applications or quota system, they may detect incorrect block
count according to i_blocks value in inode.
This patch changes to count all blocks into inode.i_blocks excluding
inode block, for on-disk i_blocks, we keep counting inode block for
backward compatibility.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-07-06 01:11:31 +08:00
|
|
|
struct inode *inode, bool is_inode)
|
2012-11-28 12:37:31 +08:00
|
|
|
{
|
|
|
|
block_t valid_block_count;
|
2019-05-30 08:49:05 +08:00
|
|
|
unsigned int valid_node_count, user_block_count;
|
f2fs: guarantee journalled quota data by checkpoint
For journalled quota mode, let checkpoint to flush dquot dirty data
and quota file data to guarantee persistence of all quota sysfile in
last checkpoint, by this way, we can avoid corrupting quota sysfile
when encountering SPO.
The implementation is as below:
1. add a global state SBI_QUOTA_NEED_FLUSH to indicate that there is
cached dquot metadata changes in quota subsystem, and later checkpoint
should:
a) flush dquot metadata into quota file.
b) flush quota file to storage to keep file usage be consistent.
2. add a global state SBI_QUOTA_NEED_REPAIR to indicate that quota
operation failed due to -EIO or -ENOSPC, so later,
a) checkpoint will skip syncing dquot metadata.
b) CP_QUOTA_NEED_FSCK_FLAG will be set in last cp pack to give a
hint for fsck repairing.
3. add a global state SBI_QUOTA_SKIP_FLUSH, in checkpoint, if quota
data updating is very heavy, it may cause hungtask in block_operation().
To avoid this, if our retry time exceed threshold, let's just skip
flushing and retry in next checkpoint().
Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid warnings and set fsck flag]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-20 20:05:00 +08:00
|
|
|
int err;
|
2017-07-09 00:13:07 +08:00
|
|
|
|
2018-09-20 20:05:00 +08:00
|
|
|
if (is_inode) {
|
|
|
|
if (inode) {
|
|
|
|
err = dquot_alloc_inode(inode);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
} else {
|
|
|
|
err = dquot_reserve_block(inode, 1);
|
|
|
|
if (err)
|
|
|
|
return err;
|
2017-07-09 00:13:07 +08:00
|
|
|
}
|
2012-11-28 12:37:31 +08:00
|
|
|
|
2017-11-13 17:32:40 +08:00
|
|
|
if (time_to_inject(sbi, FAULT_BLOCK)) {
|
2019-11-01 17:53:23 +08:00
|
|
|
f2fs_show_injection_info(sbi, FAULT_BLOCK);
|
2017-11-13 17:32:40 +08:00
|
|
|
goto enospc;
|
|
|
|
}
|
|
|
|
|
2012-11-28 12:37:31 +08:00
|
|
|
spin_lock(&sbi->stat_lock);
|
|
|
|
|
2017-12-28 07:05:52 +08:00
|
|
|
valid_block_count = sbi->total_valid_block_count +
|
|
|
|
sbi->current_reserved_blocks + 1;
|
|
|
|
|
2018-04-21 14:44:59 +08:00
|
|
|
if (!__allow_reserved_blocks(sbi, inode, false))
|
2018-03-08 14:22:56 +08:00
|
|
|
valid_block_count += F2FS_OPTION(sbi).root_reserved_blocks;
|
f2fs: fix to reserve space for IO align feature
https://bugzilla.kernel.org/show_bug.cgi?id=204137
With below script, we will hit panic during new segment allocation:
DISK=bingo.img
MOUNT_DIR=/mnt/f2fs
dd if=/dev/zero of=$DISK bs=1M count=105
mkfs.f2fs -a 1 -o 19 -t 1 -z 1 -f -q $DISK
mount -t f2fs $DISK $MOUNT_DIR -o "noinline_dentry,flush_merge,noextent_cache,mode=lfs,io_bits=7,fsync_mode=strict"
for (( i = 0; i < 4096; i++ )); do
name=`head /dev/urandom | tr -dc A-Za-z0-9 | head -c 10`
mkdir $MOUNT_DIR/$name
done
umount $MOUNT_DIR
rm $DISK
--- Core dump ---
Call Trace:
allocate_segment_by_default+0x9d/0x100 [f2fs]
f2fs_allocate_data_block+0x3c0/0x5c0 [f2fs]
do_write_page+0x62/0x110 [f2fs]
f2fs_outplace_write_data+0x43/0xc0 [f2fs]
f2fs_do_write_data_page+0x386/0x560 [f2fs]
__write_data_page+0x706/0x850 [f2fs]
f2fs_write_cache_pages+0x267/0x6a0 [f2fs]
f2fs_write_data_pages+0x19c/0x2e0 [f2fs]
do_writepages+0x1c/0x70
__filemap_fdatawrite_range+0xaa/0xe0
filemap_fdatawrite+0x1f/0x30
f2fs_sync_dirty_inodes+0x74/0x1f0 [f2fs]
block_operations+0xdc/0x350 [f2fs]
f2fs_write_checkpoint+0x104/0x1150 [f2fs]
f2fs_sync_fs+0xa2/0x120 [f2fs]
f2fs_balance_fs_bg+0x33c/0x390 [f2fs]
f2fs_write_node_pages+0x4c/0x1f0 [f2fs]
do_writepages+0x1c/0x70
__writeback_single_inode+0x45/0x320
writeback_sb_inodes+0x273/0x5c0
wb_writeback+0xff/0x2e0
wb_workfn+0xa1/0x370
process_one_work+0x138/0x350
worker_thread+0x4d/0x3d0
kthread+0x109/0x140
ret_from_fork+0x25/0x30
The root cause is that, with the IO alignment feature enabled, a single 4k
write may need F2FS_IO_SIZE() blocks of free space in the worst case,
because the feature fills in dummy pages to keep the IO aligned.
As a result, we can easily run out of free segments during writeback of a
non-inline directory's data, even while foreground GC (FGGC) is running.
To fix this issue, reserve additional free space for the IO alignment
feature so that the worst-case free space usage ratio during FGGC is
covered.
Fixes: 0a595ebaaa6b ("f2fs: support IO alignment for DATA and NODE writes")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-12-11 21:27:36 +08:00
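The effect of this reservation on the space check in the lines that follow can be sketched in isolation. This is a minimal illustration, not the kernel code: struct space_counters and node_block_fits() are hypothetical names, and the fields only mirror the counters referenced in the surrounding source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the counters kept in f2fs_sb_info. */
struct space_counters {
	uint64_t total_valid_block_count;
	uint64_t current_reserved_blocks;
	uint64_t user_block_count;
	uint32_t blocks_per_seg;
	uint32_t additional_reserved_segments;
	bool io_aligned;	/* stands in for F2FS_IO_ALIGNED(sbi) */
};

/* Returns true when one more node block still fits in the user-visible space. */
static bool node_block_fits(const struct space_counters *c)
{
	uint64_t needed = c->total_valid_block_count +
			  c->current_reserved_blocks + 1;

	/* With IO alignment, whole extra segments are held back. */
	if (c->io_aligned)
		needed += (uint64_t)c->blocks_per_seg *
			  c->additional_reserved_segments;

	return needed <= c->user_block_count;
}
```

With 1000 valid blocks out of 1500 and one reserved segment of 512 blocks, the same request fits when IO alignment is off and is refused when it is on.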
|
|
|
|
|
|
|
if (F2FS_IO_ALIGNED(sbi))
|
|
|
|
valid_block_count += sbi->blocks_per_seg *
|
|
|
|
SM_I(sbi)->additional_reserved_segments;
|
|
|
|
|
2019-05-30 08:49:05 +08:00
|
|
|
user_block_count = sbi->user_block_count;
|
2018-08-21 10:21:43 +08:00
|
|
|
if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED)))
|
2019-05-30 08:49:05 +08:00
|
|
|
user_block_count -= sbi->unusable_block_count;
|
2017-12-28 07:05:52 +08:00
|
|
|
|
2019-05-30 08:49:05 +08:00
|
|
|
if (unlikely(valid_block_count > user_block_count)) {
|
f2fs: add superblock and major in-memory structure
This adds the following major in-memory structures in f2fs.
- f2fs_sb_info:
contains f2fs-specific information, two special inode pointers for node and
meta address spaces, and orphan inode management.
- f2fs_inode_info:
contains vfs_inode and other fs-specific information.
- f2fs_nm_info:
contains node manager information such as NAT entry cache, free nid list,
and NAT page management.
- f2fs_node_info:
represents a node as node id, inode number, block address, and its version.
- f2fs_sm_info:
contains segment manager information such as SIT entry cache, free segment
map, current active logs, dirty segment management, and segment utilization.
The specific structures are sit_info, free_segmap_info, dirty_seglist_info,
curseg_info.
In addition, add F2FS_SUPER_MAGIC in magic.h.
Signed-off-by: Chul Lee <chur.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-28 12:37:31 +08:00
|
|
|
spin_unlock(&sbi->stat_lock);
|
2017-07-09 00:13:07 +08:00
|
|
|
goto enospc;
|
f2fs: add superblock and major in-memory structure
This adds the following major in-memory structures in f2fs.
- f2fs_sb_info:
contains f2fs-specific information, two special inode pointers for node and
meta address spaces, and orphan inode management.
- f2fs_inode_info:
contains vfs_inode and other fs-specific information.
- f2fs_nm_info:
contains node manager information such as NAT entry cache, free nid list,
and NAT page management.
- f2fs_node_info:
represents a node as node id, inode number, block address, and its version.
- f2fs_sm_info:
contains segment manager information such as SIT entry cache, free segment
map, current active logs, dirty segment management, and segment utilization.
The specific structures are sit_info, free_segmap_info, dirty_seglist_info,
curseg_info.
In addition, add F2FS_SUPER_MAGIC in magic.h.
Signed-off-by: Chul Lee <chur.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-28 12:37:31 +08:00
|
|
|
}
|
|
|
|
|
2013-11-19 18:03:38 +08:00
|
|
|
valid_node_count = sbi->total_valid_node_count + 1;
|
2013-12-05 17:15:22 +08:00
|
|
|
if (unlikely(valid_node_count > sbi->total_node_count)) {
|
f2fs: add superblock and major in-memory structure
This adds the following major in-memory structures in f2fs.
- f2fs_sb_info:
contains f2fs-specific information, two special inode pointers for node and
meta address spaces, and orphan inode management.
- f2fs_inode_info:
contains vfs_inode and other fs-specific information.
- f2fs_nm_info:
contains node manager information such as NAT entry cache, free nid list,
and NAT page management.
- f2fs_node_info:
represents a node as node id, inode number, block address, and its version.
- f2fs_sm_info:
contains segment manager information such as SIT entry cache, free segment
map, current active logs, dirty segment management, and segment utilization.
The specific structures are sit_info, free_segmap_info, dirty_seglist_info,
curseg_info.
In addition, add F2FS_SUPER_MAGIC in magic.h.
Signed-off-by: Chul Lee <chur.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-28 12:37:31 +08:00
|
|
|
spin_unlock(&sbi->stat_lock);
|
2017-07-09 00:13:07 +08:00
|
|
|
goto enospc;
|
f2fs: add superblock and major in-memory structure
This adds the following major in-memory structures in f2fs.
- f2fs_sb_info:
contains f2fs-specific information, two special inode pointers for node and
meta address spaces, and orphan inode management.
- f2fs_inode_info:
contains vfs_inode and other fs-specific information.
- f2fs_nm_info:
contains node manager information such as NAT entry cache, free nid list,
and NAT page management.
- f2fs_node_info:
represents a node as node id, inode number, block address, and its version.
- f2fs_sm_info:
contains segment manager information such as SIT entry cache, free segment
map, current active logs, dirty segment management, and segment utilization.
The specific structures are sit_info, free_segmap_info, dirty_seglist_info,
curseg_info.
In addition, add F2FS_SUPER_MAGIC in magic.h.
Signed-off-by: Chul Lee <chur.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-28 12:37:31 +08:00
|
|
|
}
|
|
|
|
|
2013-11-19 18:03:38 +08:00
|
|
|
sbi->total_valid_node_count++;
|
|
|
|
sbi->total_valid_block_count++;
|
f2fs: add superblock and major in-memory structure
This adds the following major in-memory structures in f2fs.
- f2fs_sb_info:
contains f2fs-specific information, two special inode pointers for node and
meta address spaces, and orphan inode management.
- f2fs_inode_info:
contains vfs_inode and other fs-specific information.
- f2fs_nm_info:
contains node manager information such as NAT entry cache, free nid list,
and NAT page management.
- f2fs_node_info:
represents a node as node id, inode number, block address, and its version.
- f2fs_sm_info:
contains segment manager information such as SIT entry cache, free segment
map, current active logs, dirty segment management, and segment utilization.
The specific structures are sit_info, free_segmap_info, dirty_seglist_info,
curseg_info.
In addition, add F2FS_SUPER_MAGIC in magic.h.
Signed-off-by: Chul Lee <chur.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-28 12:37:31 +08:00
|
|
|
spin_unlock(&sbi->stat_lock);
|
|
|
|
|
f2fs: don't count inode block in in-memory inode.i_blocks
Previously, we count all inode consumed blocks including inode block,
xattr block, index block, data block into i_blocks, for other generic
filesystems, they won't count inode block into i_blocks, so for
userspace applications or quota system, they may detect incorrect block
count according to i_blocks value in inode.
This patch changes to count all blocks into inode.i_blocks excluding
inode block, for on-disk i_blocks, we keep counting inode block for
backward compatibility.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-07-06 01:11:31 +08:00
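The split between the in-memory and on-disk counts described in this commit message can be sketched as a pair of conversions. This is only an illustration under stated assumptions: the helper names are hypothetical, the in-memory count is assumed to be kept in 512-byte sectors, and the block size is assumed to be 4 KiB.

```c
#include <stdint.h>

/* Assumption: 4 KiB blocks and 512-byte sectors, i.e. 8 sectors per block. */
#define SECTORS_PER_BLOCK 8u

/*
 * In-memory count (sectors, inode block excluded) to on-disk count
 * (blocks, inode block included for backward compatibility).
 */
static uint64_t disk_blocks_from_mem(uint64_t mem_sectors)
{
	return mem_sectors / SECTORS_PER_BLOCK + 1;
}

/* Reverse conversion; expects disk_blocks >= 1 because of the inode block. */
static uint64_t mem_sectors_from_disk(uint64_t disk_blocks)
{
	return (disk_blocks - 1) * SECTORS_PER_BLOCK;
}
```

An inode that owns two data blocks would thus report 16 sectors in memory while the on-disk field stores 3 blocks.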
|
|
|
if (inode) {
|
|
|
|
if (is_inode)
|
|
|
|
f2fs_mark_inode_dirty_sync(inode, true);
|
|
|
|
else
|
2017-07-09 00:13:07 +08:00
|
|
|
f2fs_i_blocks_write(inode, 1, true, true);
|
f2fs: don't count inode block in in-memory inode.i_blocks
Previously, we count all inode consumed blocks including inode block,
xattr block, index block, data block into i_blocks, for other generic
filesystems, they won't count inode block into i_blocks, so for
userspace applications or quota system, they may detect incorrect block
count according to i_blocks value in inode.
This patch changes to count all blocks into inode.i_blocks excluding
inode block, for on-disk i_blocks, we keep counting inode block for
backward compatibility.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-07-06 01:11:31 +08:00
|
|
|
}
|
2013-11-19 18:03:38 +08:00
|
|
|
|
2016-05-17 02:06:50 +08:00
|
|
|
percpu_counter_inc(&sbi->alloc_valid_block_count);
|
2017-07-09 00:13:07 +08:00
|
|
|
return 0;
|
|
|
|
|
|
|
|
enospc:
|
f2fs: guarantee journalled quota data by checkpoint
For journalled quota mode, let checkpoint to flush dquot dirty data
and quota file data to guarantee persistence of all quota sysfile in
last checkpoint, by this way, we can avoid corrupting quota sysfile
when encountering SPO.
The implementation is as below:
1. add a global state SBI_QUOTA_NEED_FLUSH to indicate that there is
cached dquot metadata changes in quota subsystem, and later checkpoint
should:
a) flush dquot metadata into quota file.
b) flush quota file to storage to keep file usage be consistent.
2. add a global state SBI_QUOTA_NEED_REPAIR to indicate that quota
operation failed due to -EIO or -ENOSPC, so later,
a) checkpoint will skip syncing dquot metadata.
b) CP_QUOTA_NEED_FSCK_FLAG will be set in last cp pack to give a
hint for fsck repairing.
3. add a global state SBI_QUOTA_SKIP_FLUSH, in checkpoint, if quota
data updating is very heavy, it may cause hungtask in block_operation().
To avoid this, if our retry time exceed threshold, let's just skip
flushing and retry in next checkpoint().
Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid warnings and set fsck flag]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-20 20:05:00 +08:00
|
|
|
if (is_inode) {
|
|
|
|
if (inode)
|
|
|
|
dquot_free_inode(inode);
|
|
|
|
} else {
|
2017-07-09 00:13:07 +08:00
|
|
|
dquot_release_reservation_block(inode, 1);
|
f2fs: guarantee journalled quota data by checkpoint
For journalled quota mode, let checkpoint to flush dquot dirty data
and quota file data to guarantee persistence of all quota sysfile in
last checkpoint, by this way, we can avoid corrupting quota sysfile
when encountering SPO.
The implementation is as below:
1. add a global state SBI_QUOTA_NEED_FLUSH to indicate that there is
cached dquot metadata changes in quota subsystem, and later checkpoint
should:
a) flush dquot metadata into quota file.
b) flush quota file to storage to keep file usage be consistent.
2. add a global state SBI_QUOTA_NEED_REPAIR to indicate that quota
operation failed due to -EIO or -ENOSPC, so later,
a) checkpoint will skip syncing dquot metadata.
b) CP_QUOTA_NEED_FSCK_FLAG will be set in last cp pack to give a
hint for fsck repairing.
3. add a global state SBI_QUOTA_SKIP_FLUSH, in checkpoint, if quota
data updating is very heavy, it may cause hungtask in block_operation().
To avoid this, if our retry time exceed threshold, let's just skip
flushing and retry in next checkpoint().
Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid warnings and set fsck flag]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-20 20:05:00 +08:00
|
|
|
}
|
2017-07-09 00:13:07 +08:00
|
|
|
return -ENOSPC;
|
f2fs: add superblock and major in-memory structure
This adds the following major in-memory structures in f2fs.
- f2fs_sb_info:
contains f2fs-specific information, two special inode pointers for node and
meta address spaces, and orphan inode management.
- f2fs_inode_info:
contains vfs_inode and other fs-specific information.
- f2fs_nm_info:
contains node manager information such as NAT entry cache, free nid list,
and NAT page management.
- f2fs_node_info:
represents a node as node id, inode number, block address, and its version.
- f2fs_sm_info:
contains segment manager information such as SIT entry cache, free segment
map, current active logs, dirty segment management, and segment utilization.
The specific structures are sit_info, free_segmap_info, dirty_seglist_info,
curseg_info.
In addition, add F2FS_SUPER_MAGIC in magic.h.
Signed-off-by: Chul Lee <chur.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-28 12:37:31 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static inline void dec_valid_node_count(struct f2fs_sb_info *sbi,
|
f2fs: don't count inode block in in-memory inode.i_blocks
Previously, we count all inode consumed blocks including inode block,
xattr block, index block, data block into i_blocks, for other generic
filesystems, they won't count inode block into i_blocks, so for
userspace applications or quota system, they may detect incorrect block
count according to i_blocks value in inode.
This patch changes to count all blocks into inode.i_blocks excluding
inode block, for on-disk i_blocks, we keep counting inode block for
backward compatibility.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-07-06 01:11:31 +08:00
|
|
|
struct inode *inode, bool is_inode)
|
f2fs: add superblock and major in-memory structure
This adds the following major in-memory structures in f2fs.
- f2fs_sb_info:
contains f2fs-specific information, two special inode pointers for node and
meta address spaces, and orphan inode management.
- f2fs_inode_info:
contains vfs_inode and other fs-specific information.
- f2fs_nm_info:
contains node manager information such as NAT entry cache, free nid list,
and NAT page management.
- f2fs_node_info:
represents a node as node id, inode number, block address, and its version.
- f2fs_sm_info:
contains segment manager information such as SIT entry cache, free segment
map, current active logs, dirty segment management, and segment utilization.
The specific structures are sit_info, free_segmap_info, dirty_seglist_info,
curseg_info.
In addition, add F2FS_SUPER_MAGIC in magic.h.
Signed-off-by: Chul Lee <chur.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-28 12:37:31 +08:00
|
|
|
{
|
|
|
|
spin_lock(&sbi->stat_lock);
|
|
|
|
|
2014-09-03 06:52:58 +08:00
|
|
|
f2fs_bug_on(sbi, !sbi->total_valid_block_count);
|
|
|
|
f2fs_bug_on(sbi, !sbi->total_valid_node_count);
|
f2fs: add superblock and major in-memory structure
This adds the following major in-memory structures in f2fs.
- f2fs_sb_info:
contains f2fs-specific information, two special inode pointers for node and
meta address spaces, and orphan inode management.
- f2fs_inode_info:
contains vfs_inode and other fs-specific information.
- f2fs_nm_info:
contains node manager information such as NAT entry cache, free nid list,
and NAT page management.
- f2fs_node_info:
represents a node as node id, inode number, block address, and its version.
- f2fs_sm_info:
contains segment manager information such as SIT entry cache, free segment
map, current active logs, dirty segment management, and segment utilization.
The specific structures are sit_info, free_segmap_info, dirty_seglist_info,
curseg_info.
In addition, add F2FS_SUPER_MAGIC in magic.h.
Signed-off-by: Chul Lee <chur.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-28 12:37:31 +08:00
|
|
|
|
2013-11-19 18:03:38 +08:00
|
|
|
sbi->total_valid_node_count--;
|
|
|
|
sbi->total_valid_block_count--;
|
2017-10-27 20:45:05 +08:00
|
|
|
if (sbi->reserved_blocks &&
|
|
|
|
sbi->current_reserved_blocks < sbi->reserved_blocks)
|
|
|
|
sbi->current_reserved_blocks++;
|
f2fs: add superblock and major in-memory structure
This adds the following major in-memory structures in f2fs.
- f2fs_sb_info:
contains f2fs-specific information, two special inode pointers for node and
meta address spaces, and orphan inode management.
- f2fs_inode_info:
contains vfs_inode and other fs-specific information.
- f2fs_nm_info:
contains node manager information such as NAT entry cache, free nid list,
and NAT page management.
- f2fs_node_info:
represents a node as node id, inode number, block address, and its version.
- f2fs_sm_info:
contains segment manager information such as SIT entry cache, free segment
map, current active logs, dirty segment management, and segment utilization.
The specific structures are sit_info, free_segmap_info, dirty_seglist_info,
curseg_info.
In addition, add F2FS_SUPER_MAGIC in magic.h.
Signed-off-by: Chul Lee <chur.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-28 12:37:31 +08:00
|
|
|
|
|
|
|
spin_unlock(&sbi->stat_lock);
|
2017-07-09 00:13:07 +08:00
|
|
|
|
2019-04-15 15:28:31 +08:00
|
|
|
if (is_inode) {
|
f2fs: guarantee journalled quota data by checkpoint
For journalled quota mode, let checkpoint to flush dquot dirty data
and quota file data to guarantee persistence of all quota sysfile in
last checkpoint, by this way, we can avoid corrupting quota sysfile
when encountering SPO.
The implementation is as below:
1. add a global state SBI_QUOTA_NEED_FLUSH to indicate that there is
cached dquot metadata changes in quota subsystem, and later checkpoint
should:
a) flush dquot metadata into quota file.
b) flush quota file to storage to keep file usage be consistent.
2. add a global state SBI_QUOTA_NEED_REPAIR to indicate that quota
operation failed due to -EIO or -ENOSPC, so later,
a) checkpoint will skip syncing dquot metadata.
b) CP_QUOTA_NEED_FSCK_FLAG will be set in last cp pack to give a
hint for fsck repairing.
3. add a global state SBI_QUOTA_SKIP_FLUSH, in checkpoint, if quota
data updating is very heavy, it may cause hungtask in block_operation().
To avoid this, if our retry time exceed threshold, let's just skip
flushing and retry in next checkpoint().
Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid warnings and set fsck flag]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-20 20:05:00 +08:00
|
|
|
dquot_free_inode(inode);
|
2019-04-15 15:28:31 +08:00
|
|
|
} else {
|
|
|
|
if (unlikely(inode->i_blocks == 0)) {
|
2020-02-24 19:20:19 +08:00
|
|
|
f2fs_warn(sbi, "dec_valid_node_count: inconsistent i_blocks, ino:%lu, iblocks:%llu",
|
2019-06-18 17:48:42 +08:00
|
|
|
inode->i_ino,
|
|
|
|
(unsigned long long)inode->i_blocks);
|
2019-04-15 15:28:31 +08:00
|
|
|
set_sbi_flag(sbi, SBI_NEED_FSCK);
|
|
|
|
return;
|
|
|
|
}
|
2017-07-09 00:13:07 +08:00
|
|
|
f2fs_i_blocks_write(inode, 1, false, true);
|
2019-04-15 15:28:31 +08:00
|
|
|
}
|
f2fs: add superblock and major in-memory structure
This adds the following major in-memory structures in f2fs.
- f2fs_sb_info:
contains f2fs-specific information, two special inode pointers for node and
meta address spaces, and orphan inode management.
- f2fs_inode_info:
contains vfs_inode and other fs-specific information.
- f2fs_nm_info:
contains node manager information such as NAT entry cache, free nid list,
and NAT page management.
- f2fs_node_info:
represents a node as node id, inode number, block address, and its version.
- f2fs_sm_info:
contains segment manager information such as SIT entry cache, free segment
map, current active logs, dirty segment management, and segment utilization.
The specific structures are sit_info, free_segmap_info, dirty_seglist_info,
curseg_info.
In addition, add F2FS_SUPER_MAGIC in magic.h.
Signed-off-by: Chul Lee <chur.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-28 12:37:31 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static inline unsigned int valid_node_count(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
2014-02-24 12:00:13 +08:00
|
|
|
return sbi->total_valid_node_count;
|
f2fs: add superblock and major in-memory structure
This adds the following major in-memory structures in f2fs.
- f2fs_sb_info:
contains f2fs-specific information, two special inode pointers for node and
meta address spaces, and orphan inode management.
- f2fs_inode_info:
contains vfs_inode and other fs-specific information.
- f2fs_nm_info:
contains node manager information such as NAT entry cache, free nid list,
and NAT page management.
- f2fs_node_info:
represents a node as node id, inode number, block address, and its version.
- f2fs_sm_info:
contains segment manager information such as SIT entry cache, free segment
map, current active logs, dirty segment management, and segment utilization.
The specific structures are sit_info, free_segmap_info, dirty_seglist_info,
curseg_info.
In addition, add F2FS_SUPER_MAGIC in magic.h.
Signed-off-by: Chul Lee <chur.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-28 12:37:31 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static inline void inc_valid_inode_count(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
2016-05-17 02:42:32 +08:00
|
|
|
percpu_counter_inc(&sbi->total_valid_inode_count);
|
f2fs: add superblock and major in-memory structure
This adds the following major in-memory structures in f2fs.
- f2fs_sb_info:
contains f2fs-specific information, two special inode pointers for node and
meta address spaces, and orphan inode management.
- f2fs_inode_info:
contains vfs_inode and other fs-specific information.
- f2fs_nm_info:
contains node manager information such as NAT entry cache, free nid list,
and NAT page management.
- f2fs_node_info:
represents a node as node id, inode number, block address, and its version.
- f2fs_sm_info:
contains segment manager information such as SIT entry cache, free segment
map, current active logs, dirty segment management, and segment utilization.
The specific structures are sit_info, free_segmap_info, dirty_seglist_info,
curseg_info.
In addition, add F2FS_SUPER_MAGIC in magic.h.
Signed-off-by: Chul Lee <chur.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-28 12:37:31 +08:00
|
|
|
}
|
|
|
|
|
2013-11-26 15:36:20 +08:00
|
|
|
static inline void dec_valid_inode_count(struct f2fs_sb_info *sbi)
|
f2fs: add superblock and major in-memory structure
This adds the following major in-memory structures in f2fs.
- f2fs_sb_info:
contains f2fs-specific information, two special inode pointers for node and
meta address spaces, and orphan inode management.
- f2fs_inode_info:
contains vfs_inode and other fs-specific information.
- f2fs_nm_info:
contains node manager information such as NAT entry cache, free nid list,
and NAT page management.
- f2fs_node_info:
represents a node as node id, inode number, block address, and its version.
- f2fs_sm_info:
contains segment manager information such as SIT entry cache, free segment
map, current active logs, dirty segment management, and segment utilization.
The specific structures are sit_info, free_segmap_info, dirty_seglist_info,
curseg_info.
In addition, add F2FS_SUPER_MAGIC in magic.h.
Signed-off-by: Chul Lee <chur.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-28 12:37:31 +08:00
|
|
|
{
|
2016-05-17 02:42:32 +08:00
|
|
|
percpu_counter_dec(&sbi->total_valid_inode_count);
|
f2fs: add superblock and major in-memory structure
This adds the following major in-memory structures in f2fs.
- f2fs_sb_info:
contains f2fs-specific information, two special inode pointers for node and
meta address spaces, and orphan inode management.
- f2fs_inode_info:
contains vfs_inode and other fs-specific information.
- f2fs_nm_info:
contains node manager information such as NAT entry cache, free nid list,
and NAT page management.
- f2fs_node_info:
represents a node as node id, inode number, block address, and its version.
- f2fs_sm_info:
contains segment manager information such as SIT entry cache, free segment
map, current active logs, dirty segment management, and segment utilization.
The specific structures are sit_info, free_segmap_info, dirty_seglist_info,
curseg_info.
In addition, add F2FS_SUPER_MAGIC in magic.h.
Signed-off-by: Chul Lee <chur.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-28 12:37:31 +08:00
|
|
|
}
|
|
|
|
|
2016-05-17 02:42:32 +08:00
|
|
|
static inline s64 valid_inode_count(struct f2fs_sb_info *sbi)
|
f2fs: add superblock and major in-memory structure
This adds the following major in-memory structures in f2fs.
- f2fs_sb_info:
contains f2fs-specific information, two special inode pointers for node and
meta address spaces, and orphan inode management.
- f2fs_inode_info:
contains vfs_inode and other fs-specific information.
- f2fs_nm_info:
contains node manager information such as NAT entry cache, free nid list,
and NAT page management.
- f2fs_node_info:
represents a node as node id, inode number, block address, and its version.
- f2fs_sm_info:
contains segment manager information such as SIT entry cache, free segment
map, current active logs, dirty segment management, and segment utilization.
The specific structures are sit_info, free_segmap_info, dirty_seglist_info,
curseg_info.
In addition, add F2FS_SUPER_MAGIC in magic.h.
Signed-off-by: Chul Lee <chur.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-28 12:37:31 +08:00
|
|
|
{
|
2016-05-17 02:42:32 +08:00
|
|
|
return percpu_counter_sum_positive(&sbi->total_valid_inode_count);
|
f2fs: add superblock and major in-memory structure
This adds the following major in-memory structures in f2fs.
- f2fs_sb_info:
contains f2fs-specific information, two special inode pointers for node and
meta address spaces, and orphan inode management.
- f2fs_inode_info:
contains vfs_inode and other fs-specific information.
- f2fs_nm_info:
contains node manager information such as NAT entry cache, free nid list,
and NAT page management.
- f2fs_node_info:
represents a node as node id, inode number, block address, and its version.
- f2fs_sm_info:
contains segment manager information such as SIT entry cache, free segment
map, current active logs, dirty segment management, and segment utilization.
The specific structures are sit_info, free_segmap_info, dirty_seglist_info,
curseg_info.
In addition, add F2FS_SUPER_MAGIC in magic.h.
Signed-off-by: Chul Lee <chur.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-28 12:37:31 +08:00
|
|
|
}
|
|
|
|
|
2015-10-10 06:11:38 +08:00
|
|
|
static inline struct page *f2fs_grab_cache_page(struct address_space *mapping,
|
|
|
|
pgoff_t index, bool for_write)
|
|
|
|
{
|
2018-07-27 18:15:14 +08:00
|
|
|
struct page *page;
|
2017-01-31 02:55:18 +08:00
|
|
|
|
2018-08-14 05:38:06 +08:00
|
|
|
if (IS_ENABLED(CONFIG_F2FS_FAULT_INJECTION)) {
|
|
|
|
if (!for_write)
|
|
|
|
page = find_get_page_flags(mapping, index,
|
|
|
|
FGP_LOCK | FGP_ACCESSED);
|
|
|
|
else
|
|
|
|
page = find_lock_page(mapping, index);
|
|
|
|
if (page)
|
|
|
|
return page;
|
2016-04-30 07:17:09 +08:00
|
|
|
|
2018-08-14 05:38:06 +08:00
|
|
|
if (time_to_inject(F2FS_M_SB(mapping), FAULT_PAGE_ALLOC)) {
|
2019-11-01 17:53:23 +08:00
|
|
|
f2fs_show_injection_info(F2FS_M_SB(mapping),
|
|
|
|
FAULT_PAGE_ALLOC);
|
2018-08-14 05:38:06 +08:00
|
|
|
return NULL;
|
|
|
|
}
|
2017-02-25 11:08:28 +08:00
|
|
|
}
|
2018-08-14 05:38:06 +08:00
|
|
|
|
2015-10-10 06:11:38 +08:00
|
|
|
if (!for_write)
|
|
|
|
return grab_cache_page(mapping, index);
|
|
|
|
return grab_cache_page_write_begin(mapping, index, AOP_FLAG_NOFS);
|
|
|
|
}
|
|
|
|
|
2017-10-28 16:52:30 +08:00
|
|
|
static inline struct page *f2fs_pagecache_get_page(
|
|
|
|
struct address_space *mapping, pgoff_t index,
|
|
|
|
int fgp_flags, gfp_t gfp_mask)
|
|
|
|
{
|
|
|
|
if (time_to_inject(F2FS_M_SB(mapping), FAULT_PAGE_GET)) {
|
2019-11-01 17:53:23 +08:00
|
|
|
f2fs_show_injection_info(F2FS_M_SB(mapping), FAULT_PAGE_GET);
|
2017-10-28 16:52:30 +08:00
|
|
|
return NULL;
|
|
|
|
}
|
2018-08-14 05:38:06 +08:00
|
|
|
|
2017-10-28 16:52:30 +08:00
|
|
|
return pagecache_get_page(mapping, index, fgp_flags, gfp_mask);
|
|
|
|
}
|
|
|
|
|
2015-10-08 03:28:41 +08:00
|
|
|
static inline void f2fs_copy_page(struct page *src, struct page *dst)
|
|
|
|
{
|
|
|
|
char *src_kaddr = kmap(src);
|
|
|
|
char *dst_kaddr = kmap(dst);
|
|
|
|
|
|
|
|
memcpy(dst_kaddr, src_kaddr, PAGE_SIZE);
|
|
|
|
kunmap(dst);
|
|
|
|
kunmap(src);
|
|
|
|
}
|
|
|
|
|
f2fs: add superblock and major in-memory structure
This adds the following major in-memory structures in f2fs.
- f2fs_sb_info:
contains f2fs-specific information, two special inode pointers for node and
meta address spaces, and orphan inode management.
- f2fs_inode_info:
contains vfs_inode and other fs-specific information.
- f2fs_nm_info:
contains node manager information such as NAT entry cache, free nid list,
and NAT page management.
- f2fs_node_info:
represents a node as node id, inode number, block address, and its version.
- f2fs_sm_info:
contains segment manager information such as SIT entry cache, free segment
map, current active logs, dirty segment management, and segment utilization.
The specific structures are sit_info, free_segmap_info, dirty_seglist_info,
curseg_info.
In addition, add F2FS_SUPER_MAGIC in magic.h.
Signed-off-by: Chul Lee <chur.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-28 12:37:31 +08:00
|
|
|
static inline void f2fs_put_page(struct page *page, int unlock)
|
|
|
|
{
|
2013-11-28 11:55:13 +08:00
|
|
|
if (!page)
|
f2fs: add superblock and major in-memory structure
This adds the following major in-memory structures in f2fs.
- f2fs_sb_info:
contains f2fs-specific information, two special inode pointers for node and
meta address spaces, and orphan inode management.
- f2fs_inode_info:
contains vfs_inode and other fs-specific information.
- f2fs_nm_info:
contains node manager information such as NAT entry cache, free nid list,
and NAT page management.
- f2fs_node_info:
represents a node as node id, inode number, block address, and its version.
- f2fs_sm_info:
contains segment manager information such as SIT entry cache, free segment
map, current active logs, dirty segment management, and segment utilization.
The specific structures are sit_info, free_segmap_info, dirty_seglist_info,
curseg_info.
In addition, add F2FS_SUPER_MAGIC in magic.h.
Signed-off-by: Chul Lee <chur.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-28 12:37:31 +08:00
|
|
|
return;
|
|
|
|
|
|
|
|
if (unlock) {
|
2014-09-03 06:52:58 +08:00
|
|
|
f2fs_bug_on(F2FS_P_SB(page), !PageLocked(page));
|
f2fs: add superblock and major in-memory structure
This adds the following major in-memory structures in f2fs.
- f2fs_sb_info:
contains f2fs-specific information, two special inode pointers for node and
meta address spaces, and orphan inode management.
- f2fs_inode_info:
contains vfs_inode and other fs-specific information.
- f2fs_nm_info:
contains node manager information such as NAT entry cache, free nid list,
and NAT page management.
- f2fs_node_info:
represents a node as node id, inode number, block address, and its version.
- f2fs_sm_info:
contains segment manager information such as SIT entry cache, free segment
map, current active logs, dirty segment management, and segment utilization.
The specific structures are sit_info, free_segmap_info, dirty_seglist_info,
curseg_info.
In addition, add F2FS_SUPER_MAGIC in magic.h.
Signed-off-by: Chul Lee <chur.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-28 12:37:31 +08:00
|
|
|
unlock_page(page);
|
|
|
|
}
|
mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
ago with the promise that one day it would be possible to implement the
page cache with bigger chunks than PAGE_SIZE.
This promise never materialized, and it is unlikely that it ever will.
We have many places where PAGE_CACHE_SIZE is assumed to be equal to
PAGE_SIZE. And it's a constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is revert of changes to
PAGE_CACHE_ALIGN definition: we are going to drop it later.
There are a few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-01 20:29:47 +08:00
|
|
|
put_page(page);
|
2012-11-28 12:37:31 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static inline void f2fs_put_dnode(struct dnode_of_data *dn)
{
	if (dn->node_page)
		f2fs_put_page(dn->node_page, 1);
	if (dn->inode_page && dn->node_page != dn->inode_page)
		f2fs_put_page(dn->inode_page, 0);
	dn->node_page = NULL;
	dn->inode_page = NULL;
}
|
|
|
|
|
|
|
|
static inline struct kmem_cache *f2fs_kmem_cache_create(const char *name,
|
2014-03-07 18:43:28 +08:00
|
|
|
size_t size)
|
2012-11-28 12:37:31 +08:00
|
|
|
{
|
2014-03-07 18:43:28 +08:00
|
|
|
return kmem_cache_create(name, size, 0, SLAB_RECLAIM_ACCOUNT, NULL);
|
2012-11-28 12:37:31 +08:00
|
|
|
}
|
|
|
|
|
2021-08-09 08:24:48 +08:00
|
|
|
static inline void *f2fs_kmem_cache_alloc_nofail(struct kmem_cache *cachep,
|
2013-10-22 14:52:26 +08:00
|
|
|
gfp_t flags)
|
|
|
|
{
|
|
|
|
void *entry;
|
|
|
|
|
2015-08-20 23:51:56 +08:00
|
|
|
entry = kmem_cache_alloc(cachep, flags);
|
|
|
|
if (!entry)
|
|
|
|
entry = kmem_cache_alloc(cachep, flags | __GFP_NOFAIL);
|
2013-10-22 14:52:26 +08:00
|
|
|
return entry;
|
|
|
|
}
|
|
|
|
|
2021-08-09 08:24:48 +08:00
|
|
|
static inline void *f2fs_kmem_cache_alloc(struct kmem_cache *cachep,
|
|
|
|
gfp_t flags, bool nofail, struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
if (nofail)
|
|
|
|
return f2fs_kmem_cache_alloc_nofail(cachep, flags);
|
|
|
|
|
|
|
|
if (time_to_inject(sbi, FAULT_SLAB_ALLOC)) {
|
|
|
|
f2fs_show_injection_info(sbi, FAULT_SLAB_ALLOC);
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
return kmem_cache_alloc(cachep, flags);
|
|
|
|
}
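The allocation wrapper above combines two ideas: a caller that sets `nofail` bypasses fault injection and gets a retrying allocation, while every other caller may see an injected `NULL`. The following userspace sketch illustrates that control flow only. All `demo_` names are hypothetical, `malloc` stands in for the slab allocator, and the "fail every Nth call" rule is an assumption made for illustration, not the kernel's actual injection policy.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical injection state: fail every Nth eligible call; 0 disables. */
static unsigned int demo_inject_every;
static unsigned int demo_calls;

/* Assumed stand-in for time_to_inject(): true on every Nth call. */
static bool demo_time_to_inject(void)
{
	if (!demo_inject_every)
		return false;
	return (++demo_calls % demo_inject_every) == 0;
}

/* Mirrors the wrapper's shape: nofail callers are never injected. */
static void *demo_alloc(size_t size, bool nofail)
{
	void *p;

	if (!nofail && demo_time_to_inject())
		return NULL;		/* simulated allocation failure */

	p = malloc(size);
	while (!p && nofail)		/* crude stand-in for __GFP_NOFAIL */
		p = malloc(size);
	return p;
}
```

Setting `demo_inject_every = 2` makes every second ordinary allocation return `NULL`, which is a cheap way to exercise error paths in callers, while `demo_alloc(size, true)` is unaffected.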
|
|
|
|
|
2020-11-25 10:57:36 +08:00
|
|
|
static inline bool is_inflight_io(struct f2fs_sb_info *sbi, int type)
|
2018-10-17 01:20:53 +08:00
|
|
|
{
|
|
|
|
if (get_pages(sbi, F2FS_RD_DATA) || get_pages(sbi, F2FS_RD_NODE) ||
|
|
|
|
get_pages(sbi, F2FS_RD_META) || get_pages(sbi, F2FS_WB_DATA) ||
|
2018-11-12 00:55:44 +08:00
|
|
|
get_pages(sbi, F2FS_WB_CP_DATA) ||
|
|
|
|
get_pages(sbi, F2FS_DIO_READ) ||
|
2019-01-26 04:05:25 +08:00
|
|
|
get_pages(sbi, F2FS_DIO_WRITE))
|
2020-11-25 10:57:36 +08:00
|
|
|
return true;
|
2019-01-26 04:05:25 +08:00
|
|
|
|
2019-06-06 17:38:13 +08:00
|
|
|
if (type != DISCARD_TIME && SM_I(sbi) && SM_I(sbi)->dcc_info &&
|
2019-01-26 04:05:25 +08:00
|
|
|
atomic_read(&SM_I(sbi)->dcc_info->queued_discard))
|
2020-11-25 10:57:36 +08:00
|
|
|
return true;
|
2019-01-26 04:05:25 +08:00
|
|
|
|
|
|
|
if (SM_I(sbi) && SM_I(sbi)->fcc_info &&
|
|
|
|
atomic_read(&SM_I(sbi)->fcc_info->queued_flush))
|
2020-11-25 10:57:36 +08:00
|
|
|
return true;
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline bool is_idle(struct f2fs_sb_info *sbi, int type)
|
|
|
|
{
|
|
|
|
if (sbi->gc_mode == GC_URGENT_HIGH)
|
|
|
|
return true;
|
|
|
|
|
|
|
|
if (is_inflight_io(sbi, type))
|
2019-01-26 04:05:25 +08:00
|
|
|
return false;
|
|
|
|
|
2020-07-02 12:14:14 +08:00
|
|
|
if (sbi->gc_mode == GC_URGENT_LOW &&
|
|
|
|
(type == DISCARD_TIME || type == GC_TIME))
|
|
|
|
return true;
|
|
|
|
|
2018-10-17 01:20:53 +08:00
|
|
|
return f2fs_time_over(sbi, type);
|
|
|
|
}
|
|
|
|
|
2014-12-06 02:39:49 +08:00
|
|
|
static inline void f2fs_radix_tree_insert(struct radix_tree_root *root,
				unsigned long index, void *item)
{
	while (radix_tree_insert(root, index, item))
		cond_resched();
}
|
|
|
|
|
2012-11-28 12:37:31 +08:00
|
|
|
#define RAW_IS_INODE(p) ((p)->footer.nid == (p)->footer.ino)
|
|
|
|
|
|
|
|
static inline bool IS_INODE(struct page *page)
|
|
|
|
{
|
2013-07-15 17:57:38 +08:00
|
|
|
struct f2fs_node *p = F2FS_NODE(page);
|
2017-01-31 02:55:18 +08:00
|
|
|
|
2012-11-28 12:37:31 +08:00
|
|
|
return RAW_IS_INODE(p);
|
|
|
|
}
|
|
|
|
|
2017-07-19 00:19:06 +08:00
|
|
|
static inline int offset_in_addr(struct f2fs_inode *i)
|
|
|
|
{
|
|
|
|
return (i->i_inline & F2FS_EXTRA_ATTR) ?
|
|
|
|
(le16_to_cpu(i->i_extra_isize) / sizeof(__le32)) : 0;
|
|
|
|
}
|
|
|
|
|
2012-11-28 12:37:31 +08:00
|
|
|
static inline __le32 *blkaddr_in_node(struct f2fs_node *node)
|
|
|
|
{
|
|
|
|
return RAW_IS_INODE(node) ? node->i.i_addr : node->dn.addr;
|
|
|
|
}
|
|
|
|
|
2017-07-19 00:19:06 +08:00
|
|
|
static inline int f2fs_has_extra_attr(struct inode *inode);
|
2020-02-14 17:44:10 +08:00
|
|
|
static inline block_t data_blkaddr(struct inode *inode,
|
2017-07-19 00:19:06 +08:00
|
|
|
struct page *node_page, unsigned int offset)
|
2012-11-28 12:37:31 +08:00
|
|
|
{
|
|
|
|
struct f2fs_node *raw_node;
|
|
|
|
__le32 *addr_array;
|
2017-07-19 00:19:06 +08:00
|
|
|
int base = 0;
|
|
|
|
bool is_inode = IS_INODE(node_page);
|
2017-01-31 02:55:18 +08:00
|
|
|
|
2013-07-15 17:57:38 +08:00
|
|
|
raw_node = F2FS_NODE(node_page);
|
2017-07-19 00:19:06 +08:00
|
|
|
|
2017-11-28 20:17:41 +08:00
|
|
|
if (is_inode) {
|
|
|
|
if (!inode)
|
2020-02-27 19:30:05 +08:00
|
|
|
/* from GC path only */
|
2017-07-19 00:19:06 +08:00
|
|
|
base = offset_in_addr(&raw_node->i);
|
2017-11-28 20:17:41 +08:00
|
|
|
else if (f2fs_has_extra_attr(inode))
|
|
|
|
base = get_extra_isize(inode);
|
2017-07-19 00:19:06 +08:00
|
|
|
}
|
|
|
|
|
2012-11-28 12:37:31 +08:00
|
|
|
addr_array = blkaddr_in_node(raw_node);
|
2017-07-19 00:19:06 +08:00
|
|
|
return le32_to_cpu(addr_array[base + offset]);
|
2012-11-28 12:37:31 +08:00
|
|
|
}
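The lookup above indexes the address array at `base + offset`, where `base` skips the slots taken by extra inode attributes. The sketch below reproduces that arithmetic in userspace under simplifying assumptions: `struct demo_inode` is a hypothetical, much reduced stand-in for the on-disk inode, fields are kept in host byte order (so the `le16_to_cpu`/`le32_to_cpu` conversions are omitted), and `DEMO_EXTRA_ATTR` is an illustrative flag value.

```c
#include <stdint.h>

#define DEMO_EXTRA_ATTR 0x20u	/* illustrative stand-in for F2FS_EXTRA_ATTR */

/* Hypothetical, simplified inode: extra attributes use the first slots. */
struct demo_inode {
	uint8_t  i_inline;	/* inline flags */
	uint16_t i_extra_isize;	/* size of extra attributes, in bytes */
	uint32_t i_addr[16];	/* extra attribute slots, then block addresses */
};

/* Number of 32-bit slots at the start of i_addr taken by extra attributes. */
static int demo_offset_in_addr(const struct demo_inode *i)
{
	return (i->i_inline & DEMO_EXTRA_ATTR) ?
		(int)(i->i_extra_isize / sizeof(uint32_t)) : 0;
}

/* Block address for a logical offset, skipping the extra attribute slots. */
static uint32_t demo_data_blkaddr(const struct demo_inode *i,
				  unsigned int offset)
{
	return i->i_addr[demo_offset_in_addr(i) + offset];
}
```

With 8 bytes of extra attributes the base is 2, so logical offset 0 reads slot 2; with the flag cleared the same slot is reached only at logical offset 2.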
|
|
|
|
|
2020-02-14 17:44:10 +08:00
|
|
|
static inline block_t f2fs_data_blkaddr(struct dnode_of_data *dn)
|
|
|
|
{
|
|
|
|
return data_blkaddr(dn->inode, dn->node_page, dn->ofs_in_node);
|
|
|
|
}
|
|
|
|
|
2012-11-28 12:37:31 +08:00
|
|
|
static inline int f2fs_test_bit(unsigned int nr, char *addr)
{
	int mask;

	addr += (nr >> 3);
	mask = 1 << (7 - (nr & 0x07));
	return mask & *addr;
}
|
|
|
|
|
2015-05-01 13:37:50 +08:00
|
|
|
static inline void f2fs_set_bit(unsigned int nr, char *addr)
{
	int mask;

	addr += (nr >> 3);
	mask = 1 << (7 - (nr & 0x07));
	*addr |= mask;
}

static inline void f2fs_clear_bit(unsigned int nr, char *addr)
{
	int mask;

	addr += (nr >> 3);
	mask = 1 << (7 - (nr & 0x07));
	*addr &= ~mask;
}
|
|
|
|
|
2014-10-20 17:45:51 +08:00
|
|
|
static inline int f2fs_test_and_set_bit(unsigned int nr, char *addr)
|
2012-11-28 12:37:31 +08:00
|
|
|
{
|
|
|
|
int mask;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
addr += (nr >> 3);
|
|
|
|
mask = 1 << (7 - (nr & 0x07));
|
|
|
|
ret = mask & *addr;
|
|
|
|
*addr |= mask;
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2014-10-20 17:45:51 +08:00
|
|
|
static inline int f2fs_test_and_clear_bit(unsigned int nr, char *addr)
|
2012-11-28 12:37:31 +08:00
|
|
|
{
|
|
|
|
int mask;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
addr += (nr >> 3);
|
|
|
|
mask = 1 << (7 - (nr & 0x07));
|
|
|
|
ret = mask & *addr;
|
|
|
|
*addr &= ~mask;
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2014-10-20 17:45:50 +08:00
|
|
|
static inline void f2fs_change_bit(unsigned int nr, char *addr)
|
|
|
|
{
|
|
|
|
int mask;
|
|
|
|
|
|
|
|
addr += (nr >> 3);
|
|
|
|
mask = 1 << (7 - (nr & 0x07));
|
|
|
|
*addr ^= mask;
|
|
|
|
}
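All of the bitmap helpers above share one addressing rule: bit `nr` lives in byte `nr >> 3`, and within that byte bits are numbered from the most significant end, giving the mask `1 << (7 - (nr & 0x07))`. A minimal userspace sketch of that rule follows; the `demo_` names are hypothetical, and `unsigned char` is used instead of `char` to keep the arithmetic free of sign surprises.

```c
/* Hypothetical userspace copies of the MSB-first bit helpers. */

/* Returns 1 if bit nr is set, 0 otherwise. */
static int demo_test_bit(unsigned int nr, const unsigned char *addr)
{
	return (addr[nr >> 3] & (1 << (7 - (nr & 0x07)))) != 0;
}

/* Sets bit nr, counting from the most significant bit of each byte. */
static void demo_set_bit(unsigned int nr, unsigned char *addr)
{
	addr[nr >> 3] |= (unsigned char)(1 << (7 - (nr & 0x07)));
}
```

Under this numbering, setting bit 0 yields `0x80` in the first byte, and setting bit 9 yields `0x40` in the second byte. This differs from the LSB-first convention, where bit 0 would be `0x01`.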
|
|
|
|
|
2018-04-03 15:08:17 +08:00
|
|
|
/*
|
2019-06-05 13:59:04 +08:00
|
|
|
* On-disk inode flags (f2fs_inode::i_flags)
|
2018-04-03 15:08:17 +08:00
|
|
|
*/
|
f2fs: support data compression
This patch tries to support compression in f2fs.
- A new term, cluster, is defined as the basic unit of compression; a file can
be divided logically into multiple clusters. One cluster includes 4 << n
(n >= 0) logical pages, the compression size is also the cluster size, and
each cluster can be compressed or not.
- In the cluster metadata layout, one special flag indicates whether a cluster
is compressed or normal. For a compressed cluster, the following metadata
maps the cluster to [1, 4 << n - 1] physical blocks, where f2fs stores
data including the compress header and the compressed data.
- In order to eliminate write amplification during overwrite, F2FS only
supports compression on write-once files; data can be compressed only when
all logical blocks in the file are valid and the cluster compression ratio is
lower than the specified threshold.
- To enable compression on regular inode, there are three ways:
* chattr +c file
* chattr +c dir; touch dir/file
* mount w/ -o compress_extension=ext; touch file.ext
Compress metadata layout:
[Dnode Structure]
+-----------------------------------------------+
| cluster 1 | cluster 2 | ......... | cluster N |
+-----------------------------------------------+
. . . .
. . . .
. Compressed Cluster . . Normal Cluster .
+----------+---------+---------+---------+ +---------+---------+---------+---------+
|compr flag| block 1 | block 2 | block 3 | | block 1 | block 2 | block 3 | block 4 |
+----------+---------+---------+---------+ +---------+---------+---------+---------+
. .
. .
. .
+-------------+-------------+----------+----------------------------+
| data length | data chksum | reserved | compressed data |
+-------------+-------------+----------+----------------------------+
Changelog:
20190326:
- fix error handling of read_end_io().
- remove unneeded comments in f2fs_encrypt_one_page().
20190327:
- fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
- don't jump into loop directly to avoid uninitialized variables.
- add TODO tag in error path of f2fs_write_cache_pages().
20190328:
- fix wrong merge condition in f2fs_read_multi_pages().
- check compressed file in f2fs_post_read_required().
20190401
- allow overwrite on non-compressed cluster.
- check cluster meta before writing compressed data.
20190402
- don't preallocate blocks for compressed file.
- add lz4 compress algorithm
- process multiple post read works in one workqueue
Now f2fs supports processing post read work in multiple workqueue,
it shows low performance due to schedule overhead of multiple
workqueue executing orderly.
20190921
- compress: support buffered overwrite
C: compress cluster flag
V: valid block address
N: NEW_ADDR
One cluster contains 4 blocks
before overwrite after overwrite
- VVVV -> CVNN
- CVNN -> VVVV
- CVNN -> CVNN
- CVNN -> CVVV
- CVVV -> CVNN
- CVVV -> CVVV
20191029
- add kconfig F2FS_FS_COMPRESSION to isolate compression related
codes, add kconfig F2FS_FS_{LZO,LZ4} to cover backend algorithm.
note that: will remove lzo backend if Jaegeuk agreed that too.
- update codes according to Eric's comments.
20191101
- apply fixes from Jaegeuk
20191113
- apply fixes from Jaegeuk
- split workqueue for fsverity
20191216
- apply fixes from Jaegeuk
20200117
- fix to avoid NULL pointer dereference
[Jaegeuk Kim]
- add tracepoint for f2fs_{,de}compress_pages()
- fix many bugs and add some compression stats
- fix overwrite/mmap bugs
- address 32bit build error, reported by Geert.
- bug fixes when handling errors and i_compressed_blocks
Reported-by: <noreply@ellerman.id.au>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-11-01 18:07:14 +08:00
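The commit message above defines a cluster as 4 << n logical pages. As a rough illustration of what that implies for addressing, the sketch below maps a logical page index to its cluster and to its position inside the cluster. The `demo_` helpers and the idea of passing `n` directly are assumptions for illustration; they are not taken from the f2fs sources.

```c
/* Pages per cluster for a given n, following the "4 << n" rule quoted above. */
static unsigned int demo_cluster_pages(unsigned int n)
{
	return 4u << n;
}

/* Index of the cluster that holds logical page page_idx. */
static unsigned long demo_cluster_index(unsigned long page_idx, unsigned int n)
{
	return page_idx / demo_cluster_pages(n);
}

/* Position of the page inside its cluster, counted from 0. */
static unsigned int demo_page_in_cluster(unsigned long page_idx,
					 unsigned int n)
{
	return (unsigned int)(page_idx % demo_cluster_pages(n));
}
```

With n = 0 a cluster spans 4 pages, so logical page 9 falls in cluster 2 at position 1; with n = 2 a cluster spans 16 pages and page 15 is still in cluster 0.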
|
|
|
#define F2FS_COMPR_FL 0x00000004 /* Compress file */
|
2018-04-03 15:08:17 +08:00
|
|
|
#define F2FS_SYNC_FL			0x00000008 /* Synchronous updates */
#define F2FS_IMMUTABLE_FL		0x00000010 /* Immutable file */
#define F2FS_APPEND_FL			0x00000020 /* writes to file may only append */
#define F2FS_NODUMP_FL			0x00000040 /* do not dump file */
#define F2FS_NOATIME_FL			0x00000080 /* do not update atime */
|
2019-11-01 18:07:14 +08:00
|
|
|
#define F2FS_NOCOMP_FL 0x00000400 /* Don't compress */
|
2018-04-03 15:08:17 +08:00
|
|
|
#define F2FS_INDEX_FL			0x00001000 /* hash-indexed directory */
#define F2FS_DIRSYNC_FL			0x00010000 /* dirsync behaviour (directories only) */
#define F2FS_PROJINHERIT_FL		0x20000000 /* Create with parents projid */
|
f2fs: Support case-insensitive file name lookups
Modeled after commit b886ee3e778e ("ext4: Support case-insensitive file
name lookups")
"""
This patch implements the actual support for case-insensitive file name
lookups in f2fs, based on the feature bit and the encoding stored in the
superblock.
A filesystem that has the casefold feature set is able to configure
directories with the +F (F2FS_CASEFOLD_FL) attribute, enabling lookups
to succeed in that directory in a case-insensitive fashion, i.e: match
a directory entry even if the name used by userspace is not a byte per
byte match with the disk name, but is an equivalent case-insensitive
version of the Unicode string. This operation is called a
case-insensitive file name lookup.
The feature is configured as an inode attribute applied to directories
and inherited by its children. This attribute can only be enabled on
empty directories for filesystems that support the encoding feature,
thus preventing collision of file names that only differ by case.
* dcache handling:
For a +F directory, F2FS only stores the first equivalent name dentry
used in the dcache. This is done to prevent unintentional duplication of
dentries in the dcache, while also allowing the VFS code to quickly find
the right entry in the cache despite which equivalent string was used in
a previous lookup, without having to resort to ->lookup().
d_hash() of casefolded directories is implemented as the hash of the
casefolded string, such that we always have a well-known bucket for all
the equivalencies of the same string. d_compare() uses the
utf8_strncasecmp() infrastructure, which handles the comparison of
equivalent, same case, names as well.
For now, negative lookups are not inserted in the dcache, since they
would need to be invalidated anyway, because we can't trust missing file
dentries. This is bad for performance but requires some leveraging of
the vfs layer to fix. We can live without that for now, and so does
everyone else.
* on-disk data:
Despite using a specific version of the name as the internal
representation within the dcache, the name stored and fetched from the
disk is a byte-per-byte match with what the user requested, making this
implementation 'name-preserving'. i.e. no actual information is lost
when writing to storage.
DX is supported by modifying the hashes used in +F directories to make
them case/encoding-aware. The new disk hashes are calculated as the
hash of the full casefolded string, instead of the string directly.
This allows us to efficiently search for file names in the htree without
requiring the user to provide an exact name.
* Dealing with invalid sequences:
By default, when an invalid UTF-8 sequence is identified, ext4 will treat
it as an opaque byte sequence, ignoring the encoding and reverting to
the old behavior for that unique file. This means that case-insensitive
file name lookup will not work only for that file. An optional bit can
be set in the superblock telling the filesystem code and userspace tools
to enforce the encoding. When that optional bit is set, any attempt to
create a file name using an invalid UTF-8 sequence will fail and return
an error to userspace.
* Normalization algorithm:
The UTF-8 algorithm used to compare strings in f2fs is implemented
in fs/unicode, and is based on a previous version developed by
SGI. It implements the Canonical decomposition (NFD) algorithm
described by the Unicode specification 12.1, or higher, combined with
the elimination of ignorable code points (NFDi) and full
case-folding (CF) as documented in fs/unicode/utf8_norm.c.
NFD seems to be the best normalization method for F2FS because:
- It has a lower cost than NFC/NFKC (which requires
decomposing to NFD as an intermediary step)
- It doesn't eliminate important semantic meaning like
compatibility decompositions.
Although:
- This implementation is not completely linguistically accurate, because
different languages have conflicting rules, which would require the
specialization of the filesystem to a given locale, which brings all
sorts of problems for removable media and for users who use more than
one language.
"""
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-07-24 07:05:29 +08:00
|
|
|
#define F2FS_CASEFOLD_FL 0x40000000 /* Casefolded file */
|
2018-04-03 15:08:17 +08:00
|
|
|
|
|
|
|
/* Flags that should be inherited by new inodes from their parent. */
|
2019-06-05 13:59:04 +08:00
|
|
|
#define F2FS_FL_INHERITED (F2FS_SYNC_FL | F2FS_NODUMP_FL | F2FS_NOATIME_FL | \
|
f2fs: Support case-insensitive file name lookups
Modeled after commit b886ee3e778e ("ext4: Support case-insensitive file
name lookups")
"""
This patch implements the actual support for case-insensitive file name
lookups in f2fs, based on the feature bit and the encoding stored in the
superblock.
A filesystem that has the casefold feature set is able to configure
directories with the +F (F2FS_CASEFOLD_FL) attribute, enabling lookups
to succeed in that directory in a case-insensitive fashion, i.e: match
a directory entry even if the name used by userspace is not a byte per
byte match with the disk name, but is an equivalent case-insensitive
version of the Unicode string. This operation is called a
case-insensitive file name lookup.
The feature is configured as an inode attribute applied to directories
and inherited by its children. This attribute can only be enabled on
empty directories for filesystems that support the encoding feature,
thus preventing collision of file names that only differ by case.
* dcache handling:
For a +F directory, F2FS only stores the first equivalent name dentry
used in the dcache. This is done to prevent unintentional duplication of
dentries in the dcache, while also allowing the VFS code to quickly find
the right entry in the cache regardless of which equivalent string was used in
a previous lookup, without having to resort to ->lookup().
d_hash() of casefolded directories is implemented as the hash of the
casefolded string, such that we always have a well-known bucket for all
the equivalencies of the same string. d_compare() uses the
utf8_strncasecmp() infrastructure, which handles the comparison of
equivalent, same case, names as well.
For now, negative lookups are not inserted in the dcache, since they
would need to be invalidated anyway, because we can't trust missing file
dentries. This is bad for performance but requires some leveraging of
the vfs layer to fix. We can live without that for now, and so does
everyone else.
* on-disk data:
Despite using a specific version of the name as the internal
representation within the dcache, the name stored and fetched from the
disk is a byte-per-byte match with what the user requested, making this
implementation 'name-preserving'. i.e. no actual information is lost
when writing to storage.
DX is supported by modifying the hashes used in +F directories to make
them case/encoding-aware. The new disk hashes are calculated as the
hash of the full casefolded string, instead of the string directly.
This allows us to efficiently search for file names in the htree without
requiring the user to provide an exact name.
* Dealing with invalid sequences:
By default, when an invalid UTF-8 sequence is identified, ext4 will treat
it as an opaque byte sequence, ignoring the encoding and reverting to
the old behavior for that unique file. This means that case-insensitive
file name lookup will not work for that file only. An optional bit can
be set in the superblock telling the filesystem code and userspace tools
to enforce the encoding. When that optional bit is set, any attempt to
create a file name using an invalid UTF-8 sequence will fail and return
an error to userspace.
* Normalization algorithm:
The UTF-8 algorithms used to compare strings in f2fs are implemented
in fs/unicode, and are based on a previous version developed by
SGI. It implements the Canonical decomposition (NFD) algorithm
described by the Unicode specification 12.1, or higher, combined with
the elimination of ignorable code points (NFDi) and full
case-folding (CF) as documented in fs/unicode/utf8_norm.c.
NFD seems to be the best normalization method for F2FS because:
- It has a lower cost than NFC/NFKC (which requires
decomposing to NFD as an intermediary step)
- It doesn't eliminate important semantic meaning like
compatibility decompositions.
Although:
- This implementation is not completely linguistically accurate, because
different languages have conflicting rules, which would require the
specialization of the filesystem to a given locale, which brings all
sorts of problems for removable media and for users who use more than
one language.
"""
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
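The d_hash()/d_compare() behavior described in the message above can be sketched in userspace C. This is only an illustration under loud assumptions: fold_byte() is a hypothetical ASCII-only stand-in for the Unicode casefolding done in fs/unicode, and the FNV-1a hash is an arbitrary choice, not the hash f2fs uses. The point it shows is that hashing the folded string puts every equivalent spelling in the same bucket.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: ASCII-only stand-in for Unicode casefolding. */
static unsigned char fold_byte(unsigned char c)
{
	return (unsigned char)tolower(c);
}

/* FNV-1a over the folded bytes; the hash function is illustrative only. */
static uint32_t casefold_hash(const char *name, size_t len)
{
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= fold_byte((unsigned char)name[i]);
		h *= 16777619u;
	}
	return h;
}

/* Case-insensitive comparison in the spirit of d_compare(); 0 means match. */
static int casefold_cmp(const char *a, size_t alen,
			const char *b, size_t blen)
{
	size_t i;

	if (alen != blen)
		return 1;
	for (i = 0; i < alen; i++)
		if (fold_byte((unsigned char)a[i]) !=
		    fold_byte((unsigned char)b[i]))
			return 1;
	return 0;
}
```

With these helpers, "README" and "readme" hash to the same value and compare as equal, while the bytes passed in by the caller are left untouched, which mirrors the name-preserving property described above.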
2019-07-24 07:05:29 +08:00
|
|
|
F2FS_DIRSYNC_FL | F2FS_PROJINHERIT_FL | \
|
f2fs: support data compression
This patch tries to support compression in f2fs.
- A new term, cluster, is defined as the basic unit of compression; a file
is logically divided into multiple clusters. One cluster contains 4 << n
(n >= 0) logical pages, the compression size equals the cluster size, and
each cluster can be compressed or not.
- In the cluster metadata layout, one special flag indicates whether a
cluster is compressed or normal. For a compressed cluster, the following
metadata maps the cluster to [1, (4 << n) - 1] physical blocks, in which
f2fs stores the compress header and the compressed data.
- To eliminate write amplification during overwrite, F2FS only supports
compression on write-once files: data is compressed only when all logical
blocks in the file are valid and the cluster compress ratio is lower
than the specified threshold.
- To enable compression on regular inode, there are three ways:
* chattr +c file
* chattr +c dir; touch dir/file
* mount w/ -o compress_extension=ext; touch file.ext
Compress metadata layout:
                             [Dnode Structure]
             +-----------------------------------------------+
             | cluster 1 | cluster 2 | ......... | cluster N |
             +-----------------------------------------------+
             .           .                       .           .
       .                       .                .                      .
  .         Compressed Cluster       .        .        Normal Cluster            .
+----------+---------+---------+---------+  +---------+---------+---------+---------+
|compr flag| block 1 | block 2 | block 3 |  | block 1 | block 2 | block 3 | block 4 |
+----------+---------+---------+---------+  +---------+---------+---------+---------+
           .                             .
         .                                           .
       .                                                           .
      +-------------+-------------+----------+----------------------------+
      | data length | data chksum | reserved |      compressed data       |
      +-------------+-------------+----------+----------------------------+
Changelog:
20190326:
- fix error handling of read_end_io().
- remove unneeded comments in f2fs_encrypt_one_page().
20190327:
- fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
- don't jump into loop directly to avoid uninitialized variables.
- add TODO tag in error path of f2fs_write_cache_pages().
20190328:
- fix wrong merge condition in f2fs_read_multi_pages().
- check compressed file in f2fs_post_read_required().
20190401
- allow overwrite on non-compressed cluster.
- check cluster meta before writing compressed data.
20190402
- don't preallocate blocks for compressed file.
- add lz4 compress algorithm
- process multiple post read works in one workqueue
Now f2fs supports processing post read work in multiple workqueue,
it shows low performance due to schedule overhead of multiple
workqueue executing orderly.
20190921
- compress: support buffered overwrite
C: compress cluster flag
V: valid block address
N: NEW_ADDR
One cluster contains 4 blocks
before overwrite after overwrite
- VVVV -> CVNN
- CVNN -> VVVV
- CVNN -> CVNN
- CVNN -> CVVV
- CVVV -> CVNN
- CVVV -> CVVV
20191029
- add kconfig F2FS_FS_COMPRESSION to isolate compression related
codes, add kconfig F2FS_FS_{LZO,LZ4} to cover backend algorithm.
note that: will remove lzo backend if Jaegeuk agreed that too.
- update codes according to Eric's comments.
20191101
- apply fixes from Jaegeuk
20191113
- apply fixes from Jaegeuk
- split workqueue for fsverity
20191216
- apply fixes from Jaegeuk
20200117
- fix to avoid NULL pointer dereference
[Jaegeuk Kim]
- add tracepoint for f2fs_{,de}compress_pages()
- fix many bugs and add some compression stats
- fix overwrite/mmap bugs
- address 32bit build error, reported by Geert.
- bug fixes when handling errors and i_compressed_blocks
Reported-by: <noreply@ellerman.id.au>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
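The cluster arithmetic in the message above can be sketched as follows. This is a minimal illustration, not the f2fs implementation: the 4096-byte block size and the 24-byte header (data length, checksum, reserved words) are assumptions, and all names are hypothetical. It only applies the stated bound that a compressed cluster occupies between 1 and (4 << n) - 1 physical blocks; the separate compress-ratio threshold is not modeled.

```c
#include <stddef.h>

#define SK_BLOCK_SIZE        4096u  /* assumed block/page size */
#define SK_COMPR_HEADER_SIZE 24u    /* assumed: length + checksum + reserved */

/* Logical pages per cluster for log factor n: 4 << n. */
static unsigned int sk_cluster_pages(unsigned int n)
{
	return 4u << n;
}

/* Physical blocks needed for the header plus clen bytes of payload. */
static unsigned int sk_compressed_blocks(size_t clen)
{
	return (unsigned int)((SK_COMPR_HEADER_SIZE + clen +
			       SK_BLOCK_SIZE - 1) / SK_BLOCK_SIZE);
}

/* Nonzero if the compressed form saves at least one block in the cluster. */
static int sk_worth_compressing(unsigned int n, size_t clen)
{
	return sk_compressed_blocks(clen) <= sk_cluster_pages(n) - 1;
}
```

For n = 0 the cluster has four pages, so the sketch accepts a compressed result only if header and payload fit in three blocks or fewer.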
2019-11-01 18:07:14 +08:00
|
|
|
F2FS_CASEFOLD_FL | F2FS_COMPR_FL | F2FS_NOCOMP_FL)
|
2018-04-03 15:08:17 +08:00
|
|
|
|
|
|
|
/* Flags that are appropriate for regular files (all but dir-specific ones). */
|
2019-07-24 07:05:29 +08:00
|
|
|
#define F2FS_REG_FLMASK (~(F2FS_DIRSYNC_FL | F2FS_PROJINHERIT_FL | \
|
|
|
|
F2FS_CASEFOLD_FL))
|
2018-04-03 15:08:17 +08:00
|
|
|
|
|
|
|
/* Flags that are appropriate for non-directories/regular files. */
|
|
|
|
#define F2FS_OTHER_FLMASK (F2FS_NODUMP_FL | F2FS_NOATIME_FL)
|
2017-07-26 00:01:41 +08:00
|
|
|
|
|
|
|
static inline __u32 f2fs_mask_flags(umode_t mode, __u32 flags)
{
	if (S_ISDIR(mode))
		return flags;
	else if (S_ISREG(mode))
		return flags & F2FS_REG_FLMASK;
	else
		return flags & F2FS_OTHER_FLMASK;
}
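A userspace sketch of the masking rule in f2fs_mask_flags() follows. The SK_* names are hypothetical. Only the CASEFOLD value is quoted in this file; the other flag values are assumed to mirror the generic FS_*_FL bits, and the mode constants are assumed to match the usual Linux S_IFMT encoding. Directories keep every flag, regular files lose the directory-only flags, and everything else keeps only NODUMP and NOATIME.

```c
#include <stdint.h>

/* Assumed flag values; only SK_CASEFOLD_FL is taken from the text above. */
#define SK_NODUMP_FL      0x00000040u
#define SK_NOATIME_FL     0x00000080u
#define SK_DIRSYNC_FL     0x00010000u
#define SK_PROJINHERIT_FL 0x20000000u
#define SK_CASEFOLD_FL    0x40000000u

/* Assumed file-type bits, mirroring the usual S_IFMT layout. */
#define SK_IFMT  0170000u
#define SK_IFDIR 0040000u
#define SK_IFREG 0100000u
#define SK_IFLNK 0120000u

#define SK_REG_FLMASK \
	(~(SK_DIRSYNC_FL | SK_PROJINHERIT_FL | SK_CASEFOLD_FL))
#define SK_OTHER_FLMASK (SK_NODUMP_FL | SK_NOATIME_FL)

/* Drop flags that make no sense for the given file type. */
static uint32_t sk_mask_flags(uint32_t mode, uint32_t flags)
{
	if ((mode & SK_IFMT) == SK_IFDIR)
		return flags;
	if ((mode & SK_IFMT) == SK_IFREG)
		return flags & SK_REG_FLMASK;
	return flags & SK_OTHER_FLMASK;
}
```

For example, a regular file created in a casefolded directory would inherit NOATIME but not CASEFOLD under this rule.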
|
|
|
|
|
2016-05-21 00:52:20 +08:00
|
|
|
static inline void __mark_inode_dirty_flag(struct inode *inode,
						int flag, bool set)
{
	switch (flag) {
	case FI_INLINE_XATTR:
	case FI_INLINE_DATA:
	case FI_INLINE_DENTRY:
|
2018-01-11 10:26:19 +08:00
|
|
|
case FI_NEW_INODE:
|
2016-05-21 00:52:20 +08:00
|
|
|
if (set)
|
|
|
|
return;
|
2020-08-24 06:36:59 +08:00
|
|
|
fallthrough;
|
2016-05-21 00:52:20 +08:00
|
|
|
case FI_DATA_EXIST:
|
|
|
|
case FI_INLINE_DOTS:
|
2017-12-08 08:25:39 +08:00
|
|
|
case FI_PIN_FILE:
|
2021-05-26 02:39:35 +08:00
|
|
|
case FI_COMPRESS_RELEASED:
|
2016-10-15 02:51:23 +08:00
|
|
|
f2fs_mark_inode_dirty_sync(inode, true);
|
2016-05-21 00:52:20 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2016-05-21 01:13:22 +08:00
|
|
|
static inline void set_inode_flag(struct inode *inode, int flag)
|
2012-11-28 12:37:31 +08:00
|
|
|
{
|
2020-07-31 14:18:13 +08:00
|
|
|
set_bit(flag, F2FS_I(inode)->flags);
|
2016-05-21 00:52:20 +08:00
|
|
|
__mark_inode_dirty_flag(inode, flag, true);
|
2012-11-28 12:37:31 +08:00
|
|
|
}
|
|
|
|
|
2016-05-21 01:13:22 +08:00
|
|
|
static inline int is_inode_flag_set(struct inode *inode, int flag)
|
2012-11-28 12:37:31 +08:00
|
|
|
{
|
2020-03-23 11:18:07 +08:00
|
|
|
return test_bit(flag, F2FS_I(inode)->flags);
|
2012-11-28 12:37:31 +08:00
|
|
|
}
|
|
|
|
|
2016-05-21 01:13:22 +08:00
|
|
|
static inline void clear_inode_flag(struct inode *inode, int flag)
|
2012-11-28 12:37:31 +08:00
|
|
|
{
|
2020-07-31 14:18:13 +08:00
|
|
|
clear_bit(flag, F2FS_I(inode)->flags);
|
2016-05-21 00:52:20 +08:00
|
|
|
__mark_inode_dirty_flag(inode, flag, false);
|
2012-11-28 12:37:31 +08:00
|
|
|
}
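The flag helpers above combine a bitmap with a policy for when a flag change must dirty the inode. The following userspace sketch models that policy under stated assumptions: the sk_* names and flag numbers are hypothetical, the bit operations are plain (not atomic like the kernel's set_bit()/clear_bit()), and "marking dirty" is reduced to a counter. Setting one of the inline-style flags does not dirty the inode, clearing it does, and the persistent flags dirty it in both directions.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical flag numbers; the real enum lives elsewhere in f2fs.h. */
enum sk_flag { SK_INLINE_DATA, SK_DATA_EXIST, SK_PIN_FILE, SK_OTHER, SK_MAX };

#define SK_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SK_NLONGS ((SK_MAX + SK_BITS_PER_LONG - 1) / SK_BITS_PER_LONG)

struct sk_inode {
	unsigned long flags[SK_NLONGS];
	unsigned int dirtied;	/* counts changes that would dirty the inode */
};

/* Mirrors the switch in __mark_inode_dirty_flag(): decide if dirtying is needed. */
static bool sk_needs_dirty(int flag, bool set)
{
	switch (flag) {
	case SK_INLINE_DATA:
		if (set)
			return false;
		/* fall through */
	case SK_DATA_EXIST:
	case SK_PIN_FILE:
		return true;
	default:
		return false;
	}
}

static void sk_set_flag(struct sk_inode *in, int flag)
{
	in->flags[flag / SK_BITS_PER_LONG] |= 1UL << (flag % SK_BITS_PER_LONG);
	if (sk_needs_dirty(flag, true))
		in->dirtied++;
}

static void sk_clear_flag(struct sk_inode *in, int flag)
{
	in->flags[flag / SK_BITS_PER_LONG] &= ~(1UL << (flag % SK_BITS_PER_LONG));
	if (sk_needs_dirty(flag, false))
		in->dirtied++;
}

static bool sk_test_flag(const struct sk_inode *in, int flag)
{
	return (in->flags[flag / SK_BITS_PER_LONG] >>
		(flag % SK_BITS_PER_LONG)) & 1UL;
}
```

Keeping the policy in one function means the setter and clearer stay trivial, which is the same split the header uses between the bit helpers and __mark_inode_dirty_flag().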
|
|
|
|
|
f2fs: add fs-verity support
Add fs-verity support to f2fs. fs-verity is a filesystem feature that
enables transparent integrity protection and authentication of read-only
files. It uses a dm-verity like mechanism at the file level: a Merkle
tree is used to verify any block in the file in log(filesize) time. It
is implemented mainly by helper functions in fs/verity/. See
Documentation/filesystems/fsverity.rst for the full documentation.
The f2fs support for fs-verity consists of:
- Adding a filesystem feature flag and an inode flag for fs-verity.
- Implementing the fsverity_operations to support enabling verity on an
inode and reading/writing the verity metadata.
- Updating ->readpages() to verify data as it's read from verity files
and to support reading verity metadata pages.
- Updating ->write_begin(), ->write_end(), and ->writepages() to support
writing verity metadata pages.
- Calling the fs-verity hooks for ->open(), ->setattr(), and ->ioctl().
Like ext4, f2fs stores the verity metadata (Merkle tree and
fsverity_descriptor) past the end of the file, starting at the first 64K
boundary beyond i_size. This approach works because (a) verity files
are readonly, and (b) pages fully beyond i_size aren't visible to
userspace but can be read/written internally by f2fs with only some
relatively small changes to f2fs. Extended attributes cannot be used
because (a) f2fs limits the total size of an inode's xattr entries to
4096 bytes, which wouldn't be enough for even a single Merkle tree
block, and (b) f2fs encryption doesn't encrypt xattrs, yet the verity
metadata *must* be encrypted when the file is because it contains hashes
of the plaintext data.
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
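The placement rule in the message above (verity metadata starts at the first 64K boundary past the file data) can be sketched as a round-up. The function name is hypothetical, and the sketch assumes that a size already aligned to 64K maps to itself, which is one reading of "beyond i_size".

```c
#include <stdint.h>

#define SK_VERITY_ALIGN 65536ULL	/* 64K boundary named in the text */

/* Round i_size up to the next multiple of 64K. */
static uint64_t sk_verity_metadata_pos(uint64_t i_size)
{
	return (i_size + SK_VERITY_ALIGN - 1) / SK_VERITY_ALIGN *
	       SK_VERITY_ALIGN;
}
```

A 1-byte file would therefore have its metadata start at offset 65536, leaving the rest of the first 64K unused.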
2019-07-23 00:26:24 +08:00
|
|
|
static inline bool f2fs_verity_in_progress(struct inode *inode)
{
	return IS_ENABLED(CONFIG_FS_VERITY) &&
		is_inode_flag_set(inode, FI_VERITY_IN_PROGRESS);
}
|
|
|
|
|
2016-05-21 01:13:22 +08:00
|
|
|
static inline void set_acl_inode(struct inode *inode, umode_t mode)
|
2012-11-28 12:37:31 +08:00
|
|
|
{
|
2016-05-21 01:13:22 +08:00
|
|
|
F2FS_I(inode)->i_acl_mode = mode;
|
|
|
|
set_inode_flag(inode, FI_ACL_MODE);
|
2016-10-15 02:51:23 +08:00
|
|
|
f2fs_mark_inode_dirty_sync(inode, false);
|
2012-11-28 12:37:31 +08:00
|
|
|
}
|
|
|
|
|
2016-05-21 00:43:20 +08:00
|
|
|
static inline void f2fs_i_links_write(struct inode *inode, bool inc)
|
2012-11-28 12:37:31 +08:00
|
|
|
{
|
2016-05-21 00:43:20 +08:00
|
|
|
	if (inc)
		inc_nlink(inode);
	else
		drop_nlink(inode);
2016-10-15 02:51:23 +08:00
|
|
|
f2fs_mark_inode_dirty_sync(inode, true);
|
2016-05-21 00:43:20 +08:00
|
|
|
}
|
|
|
|
|
2016-05-21 00:26:06 +08:00
|
|
|
static inline void f2fs_i_blocks_write(struct inode *inode,
|
2017-07-09 00:13:07 +08:00
|
|
|
block_t diff, bool add, bool claim)
|
2016-05-21 00:26:06 +08:00
|
|
|
{
|
2016-05-21 11:42:37 +08:00
|
|
|
bool clean = !is_inode_flag_set(inode, FI_DIRTY_INODE);
|
|
|
|
bool recover = is_inode_flag_set(inode, FI_AUTO_RECOVER);
|
|
|
|
|
2017-07-09 00:13:07 +08:00
|
|
|
	/* add = 1, claim = 1 should be dquot_reserve_block in pair */
	if (add) {
		if (claim)
			dquot_claim_block(inode, diff);
		else
			dquot_alloc_block_nofail(inode, diff);
	} else {
		dquot_free_block(inode, diff);
	}
2016-10-15 02:51:23 +08:00
|
|
|
f2fs_mark_inode_dirty_sync(inode, true);
|
2016-05-21 11:42:37 +08:00
|
|
|
if (clean || recover)
|
|
|
|
set_inode_flag(inode, FI_AUTO_RECOVER);
|
2016-05-21 00:26:06 +08:00
|
|
|
}
|
|
|
|
|
2016-05-21 00:22:03 +08:00
|
|
|
static inline void f2fs_i_size_write(struct inode *inode, loff_t i_size)
|
|
|
|
{
|
2016-05-21 11:42:37 +08:00
|
|
|
bool clean = !is_inode_flag_set(inode, FI_DIRTY_INODE);
|
|
|
|
bool recover = is_inode_flag_set(inode, FI_AUTO_RECOVER);
|
|
|
|
|
2016-05-21 00:22:03 +08:00
|
|
|
	if (i_size_read(inode) == i_size)
		return;

	i_size_write(inode, i_size);
2016-10-15 02:51:23 +08:00
|
|
|
f2fs_mark_inode_dirty_sync(inode, true);
|
2016-05-21 11:42:37 +08:00
|
|
|
if (clean || recover)
|
|
|
|
set_inode_flag(inode, FI_AUTO_RECOVER);
|
2012-11-28 12:37:31 +08:00
|
|
|
}
|
|
|
|
|
2016-05-21 00:52:20 +08:00
|
|
|
static inline void f2fs_i_depth_write(struct inode *inode, unsigned int depth)
|
2012-11-28 12:37:31 +08:00
|
|
|
{
|
2016-05-21 00:52:20 +08:00
|
|
|
F2FS_I(inode)->i_current_depth = depth;
|
2016-10-15 02:51:23 +08:00
|
|
|
f2fs_mark_inode_dirty_sync(inode, true);
|
2012-11-28 12:37:31 +08:00
|
|
|
}
|
|
|
|
|
2017-12-08 08:25:39 +08:00
|
|
|
static inline void f2fs_i_gc_failures_write(struct inode *inode,
|
|
|
|
unsigned int count)
|
|
|
|
{
|
f2fs: avoid stucking GC due to atomic write
f2fs doesn't allow abuse of the atomic write interface, so besides
limiting the total memory used by in-memory pages, we need to limit
atomic-write usage as well when the filesystem is seriously fragmented.
Otherwise we may run into an infinite loop during foreground GC, because
the target blocks in the victim segment belong to a file that has been
atomically opened for a long time.
Now we detect failures due to atomic write in foreground GC; if the
count exceeds a threshold, we drop all atomically written data in the
cache. This is expected to keep the system running safely and to
prevent DoS attacks.
In addition, this patch shows GC skip information in debugfs; for now
it only shows the count of skips caused by atomic write.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-07 20:28:54 +08:00
|
|
|
F2FS_I(inode)->i_gc_failures[GC_FAILURE_PIN] = count;
|
2017-12-08 08:25:39 +08:00
|
|
|
f2fs_mark_inode_dirty_sync(inode, true);
|
|
|
|
}
|
|
|
|
|
2016-05-21 00:52:20 +08:00
|
|
|
static inline void f2fs_i_xnid_write(struct inode *inode, nid_t xnid)
|
2013-08-08 14:16:22 +08:00
|
|
|
{
|
2016-05-21 00:52:20 +08:00
|
|
|
F2FS_I(inode)->i_xattr_nid = xnid;
|
2016-10-15 02:51:23 +08:00
|
|
|
f2fs_mark_inode_dirty_sync(inode, true);
|
2016-05-21 00:52:20 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static inline void f2fs_i_pino_write(struct inode *inode, nid_t pino)
|
|
|
|
{
|
|
|
|
F2FS_I(inode)->i_pino = pino;
|
2016-10-15 02:51:23 +08:00
|
|
|
f2fs_mark_inode_dirty_sync(inode, true);
|
2016-05-21 00:52:20 +08:00
|
|
|
}
|
|
|
|
|
2016-05-21 01:13:22 +08:00
|
|
|
static inline void get_inline_info(struct inode *inode, struct f2fs_inode *ri)
|
2013-08-08 14:16:22 +08:00
|
|
|
{
|
2016-05-21 00:52:20 +08:00
|
|
|
struct f2fs_inode_info *fi = F2FS_I(inode);
|
|
|
|
|
2013-08-08 14:16:22 +08:00
|
|
|
if (ri->i_inline & F2FS_INLINE_XATTR)
|
2020-03-23 11:18:07 +08:00
|
|
|
set_bit(FI_INLINE_XATTR, fi->flags);
|
2013-11-10 23:13:16 +08:00
|
|
|
if (ri->i_inline & F2FS_INLINE_DATA)
|
2020-03-23 11:18:07 +08:00
|
|
|
set_bit(FI_INLINE_DATA, fi->flags);
|
2014-09-24 18:15:19 +08:00
|
|
|
if (ri->i_inline & F2FS_INLINE_DENTRY)
|
2020-03-23 11:18:07 +08:00
|
|
|
set_bit(FI_INLINE_DENTRY, fi->flags);
|
2014-10-24 10:48:09 +08:00
|
|
|
if (ri->i_inline & F2FS_DATA_EXIST)
|
2020-03-23 11:18:07 +08:00
|
|
|
set_bit(FI_DATA_EXIST, fi->flags);
|
2015-03-31 06:07:16 +08:00
|
|
|
if (ri->i_inline & F2FS_INLINE_DOTS)
|
2020-03-23 11:18:07 +08:00
|
|
|
set_bit(FI_INLINE_DOTS, fi->flags);
|
2017-07-19 00:19:06 +08:00
|
|
|
if (ri->i_inline & F2FS_EXTRA_ATTR)
|
2020-03-23 11:18:07 +08:00
|
|
|
set_bit(FI_EXTRA_ATTR, fi->flags);
|
2017-12-08 08:25:39 +08:00
|
|
|
if (ri->i_inline & F2FS_PIN_FILE)
|
2020-03-23 11:18:07 +08:00
|
|
|
set_bit(FI_PIN_FILE, fi->flags);
|
2021-05-26 02:39:35 +08:00
|
|
|
if (ri->i_inline & F2FS_COMPRESS_RELEASED)
|
|
|
|
set_bit(FI_COMPRESS_RELEASED, fi->flags);
|
2013-08-08 14:16:22 +08:00
|
|
|
}
|
|
|
|
|
2016-05-21 01:13:22 +08:00
|
|
|
static inline void set_raw_inline(struct inode *inode, struct f2fs_inode *ri)
|
2013-08-08 14:16:22 +08:00
|
|
|
{
|
|
|
|
ri->i_inline = 0;
|
|
|
|
|
2016-05-21 01:13:22 +08:00
|
|
|
if (is_inode_flag_set(inode, FI_INLINE_XATTR))
|
2013-08-08 14:16:22 +08:00
|
|
|
ri->i_inline |= F2FS_INLINE_XATTR;
|
2016-05-21 01:13:22 +08:00
|
|
|
if (is_inode_flag_set(inode, FI_INLINE_DATA))
|
2013-11-10 23:13:16 +08:00
|
|
|
ri->i_inline |= F2FS_INLINE_DATA;
|
2016-05-21 01:13:22 +08:00
|
|
|
if (is_inode_flag_set(inode, FI_INLINE_DENTRY))
|
2014-09-24 18:15:19 +08:00
|
|
|
ri->i_inline |= F2FS_INLINE_DENTRY;
|
2016-05-21 01:13:22 +08:00
|
|
|
if (is_inode_flag_set(inode, FI_DATA_EXIST))
|
2014-10-24 10:48:09 +08:00
|
|
|
ri->i_inline |= F2FS_DATA_EXIST;
|
2016-05-21 01:13:22 +08:00
|
|
|
if (is_inode_flag_set(inode, FI_INLINE_DOTS))
|
2015-03-31 06:07:16 +08:00
|
|
|
ri->i_inline |= F2FS_INLINE_DOTS;
|
2017-07-19 00:19:06 +08:00
|
|
|
if (is_inode_flag_set(inode, FI_EXTRA_ATTR))
|
|
|
|
ri->i_inline |= F2FS_EXTRA_ATTR;
|
2017-12-08 08:25:39 +08:00
|
|
|
if (is_inode_flag_set(inode, FI_PIN_FILE))
|
|
|
|
ri->i_inline |= F2FS_PIN_FILE;
|
2021-05-26 02:39:35 +08:00
|
|
|
if (is_inode_flag_set(inode, FI_COMPRESS_RELEASED))
|
|
|
|
ri->i_inline |= F2FS_COMPRESS_RELEASED;
|
2017-07-19 00:19:06 +08:00
|
|
|
}
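The two helpers above, get_inline_info() and set_raw_inline(), translate between the on-disk i_inline byte and the in-memory flag bits. A minimal userspace sketch of that round trip follows. The mask values, the bit indices, and the names load_inline_flags() and store_inline_flags() are assumptions made for illustration; they are not taken from the kernel headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk masks, standing in for F2FS_INLINE_XATTR and friends. */
enum {
	DISK_INLINE_XATTR = 0x01,
	DISK_INLINE_DATA  = 0x02,
	DISK_PIN_FILE     = 0x40,
};

/* Hypothetical in-memory bit indices, standing in for FI_INLINE_XATTR etc. */
enum {
	MEM_INLINE_XATTR = 9,
	MEM_INLINE_DATA  = 10,
	MEM_PIN_FILE     = 25,
};

struct flag_map {
	uint8_t disk_mask;	/* bit mask in the on-disk i_inline byte */
	unsigned int mem_bit;	/* bit index in the in-memory flag word */
};

static const struct flag_map flag_table[] = {
	{ DISK_INLINE_XATTR, MEM_INLINE_XATTR },
	{ DISK_INLINE_DATA,  MEM_INLINE_DATA  },
	{ DISK_PIN_FILE,     MEM_PIN_FILE     },
};

#define FLAG_TABLE_LEN (sizeof(flag_table) / sizeof(flag_table[0]))

/* Analogue of get_inline_info(): expand the on-disk byte into memory flags. */
uint64_t load_inline_flags(uint8_t i_inline)
{
	uint64_t mem = 0;
	size_t i;

	for (i = 0; i < FLAG_TABLE_LEN; i++)
		if (i_inline & flag_table[i].disk_mask)
			mem |= UINT64_C(1) << flag_table[i].mem_bit;
	return mem;
}

/* Analogue of set_raw_inline(): rebuild the on-disk byte from zero. */
uint8_t store_inline_flags(uint64_t mem)
{
	uint8_t i_inline = 0;
	size_t i;

	for (i = 0; i < FLAG_TABLE_LEN; i++)
		if (mem & (UINT64_C(1) << flag_table[i].mem_bit))
			i_inline |= flag_table[i].disk_mask;
	return i_inline;
}
```

A table keeps the two directions in sync by construction, whereas the header spells out each flag twice. Bits that the table does not know about are dropped on load, which matches the header's behaviour of starting from ri->i_inline = 0 when writing back.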
|
|
|
|
|
|
|
|
static inline int f2fs_has_extra_attr(struct inode *inode)
|
|
|
|
{
|
|
|
|
return is_inode_flag_set(inode, FI_EXTRA_ATTR);
|
2013-08-08 14:16:22 +08:00
|
|
|
}
|
|
|
|
|
2014-03-12 15:59:03 +08:00
|
|
|
static inline int f2fs_has_inline_xattr(struct inode *inode)
|
|
|
|
{
|
2016-05-21 01:13:22 +08:00
|
|
|
return is_inode_flag_set(inode, FI_INLINE_XATTR);
|
2014-03-12 15:59:03 +08:00
|
|
|
}
|
|
|
|
|
f2fs: support data compression
This patch tries to support compression in f2fs.
- A new term, cluster, is defined as the basic unit of compression; a file
  can be divided logically into multiple clusters. One cluster includes
  4 << n (n >= 0) logical pages, the compression size is also the cluster
  size, and each cluster can be compressed or not.
- In the cluster metadata layout, one special flag indicates whether a
  cluster is compressed or normal. For a compressed cluster, the following
  metadata maps the cluster to [1, (4 << n) - 1] physical blocks, in which
  f2fs stores data including the compress header and compressed data.
- In order to eliminate write amplification during overwrite, F2FS only
  supports compression on write-once files; data can be compressed only
  when all logical blocks in file are valid and the cluster compress ratio
  is lower than the specified threshold.
- To enable compression on a regular inode, there are three ways:
  * chattr +c file
  * chattr +c dir; touch dir/file
  * mount w/ -o compress_extension=ext; touch file.ext
Compress metadata layout:
                             [Dnode Structure]
             +-----------------------------------------------+
             | cluster 1 | cluster 2 | ......... | cluster N |
             +-----------------------------------------------+
             .           .                       .           .
       .                       .                .                      .
  .         Compressed Cluster       .        .        Normal Cluster            .
+----------+---------+---------+---------+  +---------+---------+---------+---------+
|compr flag| block 1 | block 2 | block 3 |  | block 1 | block 2 | block 3 | block 4 |
+----------+---------+---------+---------+  +---------+---------+---------+---------+
           .                             .
         .                                           .
       .                                                           .
      +-------------+-------------+----------+----------------------------+
      | data length | data chksum | reserved |      compressed data       |
      +-------------+-------------+----------+----------------------------+
Changelog:
20190326:
- fix error handling of read_end_io().
- remove unneeded comments in f2fs_encrypt_one_page().
20190327:
- fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
- don't jump into loop directly to avoid uninitialized variables.
- add TODO tag in error path of f2fs_write_cache_pages().
20190328:
- fix wrong merge condition in f2fs_read_multi_pages().
- check compressed file in f2fs_post_read_required().
20190401
- allow overwrite on non-compressed cluster.
- check cluster meta before writing compressed data.
20190402
- don't preallocate blocks for compressed file.
- add lz4 compress algorithm
- process multiple post read works in one workqueue
Now f2fs supports processing post read work in multiple workqueue,
it shows low performance due to schedule overhead of multiple
workqueue executing orderly.
20190921
- compress: support buffered overwrite
  C: compress cluster flag
  V: valid block address
  N: NEW_ADDR
  One cluster contains 4 blocks
    before overwrite    after overwrite
  - VVVV            ->  CVNN
  - CVNN            ->  VVVV
  - CVNN            ->  CVNN
  - CVNN            ->  CVVV
  - CVVV            ->  CVNN
  - CVVV            ->  CVVV
20191029
- add kconfig F2FS_FS_COMPRESSION to isolate compression related
codes, add kconfig F2FS_FS_{LZO,LZ4} to cover backend algorithm.
note that: will remove lzo backend if Jaegeuk agreed that too.
- update codes according to Eric's comments.
20191101
- apply fixes from Jaegeuk
20191113
- apply fixes from Jaegeuk
- split workqueue for fsverity
20191216
- apply fixes from Jaegeuk
20200117
- fix to avoid NULL pointer dereference
[Jaegeuk Kim]
- add tracepoint for f2fs_{,de}compress_pages()
- fix many bugs and add some compression stats
- fix overwrite/mmap bugs
- address 32bit build error, reported by Geert.
- bug fixes when handling errors and i_compressed_blocks
Reported-by: <noreply@ellerman.id.au>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-11-01 18:07:14 +08:00
|
|
|
static inline int f2fs_compressed_file(struct inode *inode)
|
|
|
|
{
|
|
|
|
return S_ISREG(inode->i_mode) &&
|
|
|
|
is_inode_flag_set(inode, FI_COMPRESSED_FILE);
|
|
|
|
}
|
|
|
|
|
2020-12-01 12:08:02 +08:00
|
|
|
static inline bool f2fs_need_compress_data(struct inode *inode)
|
|
|
|
{
|
|
|
|
int compress_mode = F2FS_OPTION(F2FS_I_SB(inode)).compress_mode;
|
|
|
|
|
|
|
|
if (!f2fs_compressed_file(inode))
|
|
|
|
return false;
|
|
|
|
|
|
|
|
if (compress_mode == COMPR_MODE_FS)
|
|
|
|
return true;
|
|
|
|
else if (compress_mode == COMPR_MODE_USER &&
|
|
|
|
is_inode_flag_set(inode, FI_ENABLE_COMPRESS))
|
|
|
|
return true;
|
|
|
|
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
2016-01-26 15:39:35 +08:00
|
|
|
static inline unsigned int addrs_per_inode(struct inode *inode)
|
2013-08-12 20:08:03 +08:00
|
|
|
{
|
2019-03-25 21:08:19 +08:00
|
|
|
unsigned int addrs = CUR_ADDRS_PER_INODE(inode) -
|
|
|
|
get_inline_xattr_addrs(inode);
|
2019-11-01 18:07:14 +08:00
|
|
|
|
|
|
|
if (!f2fs_compressed_file(inode))
|
|
|
|
return addrs;
|
|
|
|
return ALIGN_DOWN(addrs, F2FS_I(inode)->i_cluster_size);
|
2019-03-25 21:08:19 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static inline unsigned int addrs_per_block(struct inode *inode)
|
|
|
|
{
|
2019-11-01 18:07:14 +08:00
|
|
|
if (!f2fs_compressed_file(inode))
|
|
|
|
return DEF_ADDRS_PER_BLOCK;
|
|
|
|
return ALIGN_DOWN(DEF_ADDRS_PER_BLOCK, F2FS_I(inode)->i_cluster_size);
|
2013-08-12 20:08:03 +08:00
|
|
|
}
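For compressed files, addrs_per_inode() and addrs_per_block() above round the address count down to a multiple of the cluster size, so that a cluster never straddles two node blocks. The arithmetic can be sketched in plain C as below. The constant 1018 and all function names are assumptions chosen for illustration, and the rounding helper only handles power-of-two alignments, which fits the "4 << n" cluster sizes described in the compression commit message.

```c
/* Assumed for illustration: block addresses held by one direct node block. */
#define SKETCH_ADDRS_PER_BLOCK 1018u

/* Pages in a cluster for a given n, following the "4 << n" rule. */
unsigned int cluster_pages(unsigned int n)
{
	return 4u << n;
}

/* Round x down to a multiple of align; align must be a power of two. */
unsigned int align_down_pow2(unsigned int x, unsigned int align)
{
	return x & ~(align - 1u);
}

/* Analogue of addrs_per_block(): only compressed files lose the tail slots. */
unsigned int sketch_addrs_per_block(int compressed, unsigned int cluster_size)
{
	if (!compressed)
		return SKETCH_ADDRS_PER_BLOCK;
	return align_down_pow2(SKETCH_ADDRS_PER_BLOCK, cluster_size);
}
```

Under these assumptions a cluster of 4 pages leaves 1016 usable slots and a cluster of 16 pages leaves 1008; the few slots past the last full cluster are simply left unused.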
|
|
|
|
|
f2fs: support flexible inline xattr size
Now, in products, more and more features based on file encryption are
being introduced, and their demand for xattr space is increasing. However,
inline xattr has a fixed size of 200 bytes; once the inline xattr space is
full, newly added xattr data occupies an additional xattr block, which may
cost more space and cause a performance regression when persisting.
To resolve this issue, it is better to expand the inline xattr size
flexibly according to the user's requirement.
So this patch introduces a new filesystem feature, 'flexible inline xattr',
and a new mount option, 'inline_xattr_size=%u'. Once mkfs enables the
feature, the option can be used to make f2fs support a flexible inline
xattr size.
To support this feature, we add an extra attribute, i_inline_xattr_size, to
the inode layout, indicating how much space inline xattr borrows from the
block address mapping space in the inode layout. With this, we can easily
locate and store flexible-sized inline xattr data in the inode.
Inode disk layout:
  +----------------------+
  | .i_mode              |
  | ...                  |
  | .i_ext               |
  +----------------------+
  | .i_extra_isize       |
  | .i_inline_xattr_size |-----------+
  | ...                  |           |
  +----------------------+           |
  | .i_addr              |           |
  |  - block address or  |           |
  |  - inline data       |           |
  +----------------------+<---+      v
  |    inline xattr      |    +---inline xattr range
  +----------------------+<---+
  | .i_nid               |
  +----------------------+
  |   node_footer        |
  | (nid, ino, offset)   |
  +----------------------+
Note that we have to consider backward compatibility: as reported by Sheng
Yong, 200 bytes of inline_data space were reserved all the time.
Previously, inline data or an inline directory always reserved 200 bytes in
the inode layout, even if inline_xattr was disabled. In order to keep
inline_dentry's structure for backward compatibility, we get the space back
only from inline_data.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reported-by: Sheng Yong <shengyong1@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-06 21:59:50 +08:00
|
|
|
static inline void *inline_xattr_addr(struct inode *inode, struct page *page)
|
2013-08-14 20:57:27 +08:00
|
|
|
{
|
2014-02-27 19:52:21 +08:00
|
|
|
struct f2fs_inode *ri = F2FS_INODE(page);
|
2017-01-31 02:55:18 +08:00
|
|
|
|
2013-08-14 20:57:27 +08:00
|
|
|
return (void *)&(ri->i_addr[DEF_ADDRS_PER_INODE -
|
2018-01-17 16:31:36 +08:00
|
|
|
get_inline_xattr_addrs(inode)]);
|
2013-08-14 20:57:27 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static inline int inline_xattr_size(struct inode *inode)
|
|
|
|
{
|
2019-04-11 11:48:10 +08:00
|
|
|
if (f2fs_has_inline_xattr(inode))
|
|
|
|
return get_inline_xattr_addrs(inode) * sizeof(__le32);
|
|
|
|
return 0;
|
2013-08-14 20:57:27 +08:00
|
|
|
}
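inline_xattr_addr() and inline_xattr_size() above treat the inline xattr area as a number of 32-bit address slots borrowed from the tail of i_addr. The following sketch reproduces that arithmetic in userspace. The slot count 923 and the function names are assumptions for illustration; the 50-slot example corresponds to the 200-byte fixed size mentioned in the commit message.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed for illustration: number of i_addr slots in an inode. */
#define SKETCH_ADDRS_PER_INODE 923u

/* Bytes of inline xattr space; each borrowed slot is one 32-bit address. */
size_t sketch_inline_xattr_size(int has_inline_xattr, unsigned int xattr_addrs)
{
	if (!has_inline_xattr)
		return 0;
	return (size_t)xattr_addrs * sizeof(uint32_t);
}

/* Index of the first i_addr slot used by inline xattr, counted from slot 0. */
unsigned int sketch_inline_xattr_start(unsigned int xattr_addrs)
{
	return SKETCH_ADDRS_PER_INODE - xattr_addrs;
}
```

With 50 borrowed slots the area is 200 bytes and, under the assumed slot count, starts at index 873. Growing the xattr area moves the start index down and shrinks the space left for block addresses or inline data.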
|
|
|
|
|
2013-11-26 10:08:57 +08:00
|
|
|
static inline int f2fs_has_inline_data(struct inode *inode)
|
|
|
|
{
|
2016-05-21 01:13:22 +08:00
|
|
|
return is_inode_flag_set(inode, FI_INLINE_DATA);
|
2013-11-26 10:08:57 +08:00
|
|
|
}
|
|
|
|
|
2014-10-24 10:48:09 +08:00
|
|
|
static inline int f2fs_exist_data(struct inode *inode)
|
|
|
|
{
|
2016-05-21 01:13:22 +08:00
|
|
|
return is_inode_flag_set(inode, FI_DATA_EXIST);
|
2014-10-24 10:48:09 +08:00
|
|
|
}
|
|
|
|
|
2015-03-31 06:07:16 +08:00
|
|
|
static inline int f2fs_has_inline_dots(struct inode *inode)
|
|
|
|
{
|
2016-05-21 01:13:22 +08:00
|
|
|
return is_inode_flag_set(inode, FI_INLINE_DOTS);
|
2015-03-31 06:07:16 +08:00
|
|
|
}
|
|
|
|
|
2019-11-01 18:07:14 +08:00
|
|
|
static inline int f2fs_is_mmap_file(struct inode *inode)
|
|
|
|
{
|
|
|
|
return is_inode_flag_set(inode, FI_MMAP_FILE);
|
|
|
|
}
|
|
|
|
|
2017-12-08 08:25:39 +08:00
|
|
|
static inline bool f2fs_is_pinned_file(struct inode *inode)
|
|
|
|
{
|
|
|
|
return is_inode_flag_set(inode, FI_PIN_FILE);
|
|
|
|
}
|
|
|
|
|
2014-10-07 08:39:50 +08:00
|
|
|
static inline bool f2fs_is_atomic_file(struct inode *inode)
|
|
|
|
{
|
2016-05-21 01:13:22 +08:00
|
|
|
return is_inode_flag_set(inode, FI_ATOMIC_FILE);
|
2014-10-07 08:39:50 +08:00
|
|
|
}
|
|
|
|
|
2017-01-07 18:50:26 +08:00
|
|
|
static inline bool f2fs_is_commit_atomic_write(struct inode *inode)
|
|
|
|
{
|
|
|
|
return is_inode_flag_set(inode, FI_ATOMIC_COMMIT);
|
|
|
|
}
|
|
|
|
|
2014-10-07 07:11:16 +08:00
|
|
|
static inline bool f2fs_is_volatile_file(struct inode *inode)
|
|
|
|
{
|
2016-05-21 01:13:22 +08:00
|
|
|
return is_inode_flag_set(inode, FI_VOLATILE_FILE);
|
2014-10-07 07:11:16 +08:00
|
|
|
}
|
|
|
|
|
2015-03-18 08:16:35 +08:00
|
|
|
static inline bool f2fs_is_first_block_written(struct inode *inode)
|
|
|
|
{
|
2016-05-21 01:13:22 +08:00
|
|
|
return is_inode_flag_set(inode, FI_FIRST_BLOCK_WRITTEN);
|
2015-03-18 08:16:35 +08:00
|
|
|
}
|
|
|
|
|
2014-12-09 22:08:59 +08:00
|
|
|
static inline bool f2fs_is_drop_cache(struct inode *inode)
|
|
|
|
{
|
2016-05-21 01:13:22 +08:00
|
|
|
return is_inode_flag_set(inode, FI_DROP_CACHE);
|
2014-12-09 22:08:59 +08:00
|
|
|
}
|
|
|
|
|
2017-07-19 00:19:05 +08:00
|
|
|
static inline void *inline_data_addr(struct inode *inode, struct page *page)
|
2013-11-10 23:13:16 +08:00
|
|
|
{
|
2014-02-27 19:52:21 +08:00
|
|
|
struct f2fs_inode *ri = F2FS_INODE(page);
|
2017-07-19 00:19:06 +08:00
|
|
|
int extra_size = get_extra_isize(inode);
|
2017-01-31 02:55:18 +08:00
|
|
|
|
2017-07-19 00:19:06 +08:00
|
|
|
return (void *)&(ri->i_addr[extra_size + DEF_INLINE_RESERVED_SIZE]);
|
2013-11-10 23:13:16 +08:00
|
|
|
}
|
|
|
|
|
2014-09-24 18:15:19 +08:00
|
|
|
static inline int f2fs_has_inline_dentry(struct inode *inode)
|
|
|
|
{
|
2016-05-21 01:13:22 +08:00
|
|
|
return is_inode_flag_set(inode, FI_INLINE_DENTRY);
|
2014-09-24 18:15:19 +08:00
|
|
|
}
|
|
|
|
|
2015-04-21 04:44:41 +08:00
|
|
|
static inline int is_file(struct inode *inode, int type)
|
|
|
|
{
|
|
|
|
return F2FS_I(inode)->i_advise & type;
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline void set_file(struct inode *inode, int type)
|
|
|
|
{
|
2021-12-05 01:55:35 +08:00
|
|
|
if (is_file(inode, type))
|
|
|
|
return;
|
2015-04-21 04:44:41 +08:00
|
|
|
F2FS_I(inode)->i_advise |= type;
|
2016-10-15 02:51:23 +08:00
|
|
|
f2fs_mark_inode_dirty_sync(inode, true);
|
2015-04-21 04:44:41 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static inline void clear_file(struct inode *inode, int type)
|
|
|
|
{
|
2021-12-05 01:55:35 +08:00
|
|
|
if (!is_file(inode, type))
|
|
|
|
return;
|
2015-04-21 04:44:41 +08:00
|
|
|
F2FS_I(inode)->i_advise &= ~type;
|
2016-10-15 02:51:23 +08:00
|
|
|
f2fs_mark_inode_dirty_sync(inode, true);
|
2015-04-21 04:44:41 +08:00
|
|
|
}
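set_file() and clear_file() above return early when the advise bit already has the requested value, so the inode is only marked dirty on a real change. A small sketch of that pattern follows; struct sketch_inode and the dirty_marks counter are hypothetical stand-ins for f2fs_inode_info and f2fs_mark_inode_dirty_sync().

```c
/* Hypothetical stand-in for the fields that set_file()/clear_file() touch. */
struct sketch_inode {
	unsigned char advise;		/* like i_advise: one bit per hint */
	unsigned int dirty_marks;	/* how often the inode was marked dirty */
};

int sketch_is_file(const struct sketch_inode *ino, int type)
{
	return ino->advise & type;
}

void sketch_set_file(struct sketch_inode *ino, int type)
{
	if (sketch_is_file(ino, type))
		return;			/* already set: no write, no dirtying */
	ino->advise |= (unsigned char)type;
	ino->dirty_marks++;
}

void sketch_clear_file(struct sketch_inode *ino, int type)
{
	if (!sketch_is_file(ino, type))
		return;			/* already clear: nothing to do */
	ino->advise &= (unsigned char)~type;
	ino->dirty_marks++;
}
```

Setting the same bit twice in this sketch increments dirty_marks once, which is the point of the early return: repeated hints do not trigger redundant inode writeback.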
|
|
|
|
|
f2fs: fix to update time in lazytime mode
generic/018 reports an inconsistent atime status; the
testcase is as below:
- open file with O_SYNC
- write file to construct fragmented space
- calc md5 of file
- record {a,c,m}time
- defrag file --- do nothing
- umount & mount
- check {a,c,m}time
The root cause is that, as f2fs enables lazytime by default, an atime
update dirties the vfs inode rather than the f2fs inode (by setting
FI_DIRTY_INODE), so a later f2fs_write_inode() called from VFS
fails to update the inode page due to our skip:
f2fs_write_inode()
	if (!is_inode_flag_set(inode, FI_DIRTY_INODE))
		return 0;
So eventually, after evict(), we lose the last atime forever.
To fix this issue, we need to check whether {a,c,m,cr}time is
consistent between the inode cache and the inode page, and only skip
f2fs_update_inode() if the f2fs inode is not dirty and the time is
consistent as well.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-09-27 18:01:35 +08:00
|
|
|
static inline bool f2fs_is_time_consistent(struct inode *inode)
|
|
|
|
{
|
|
|
|
if (!timespec64_equal(F2FS_I(inode)->i_disk_time, &inode->i_atime))
|
|
|
|
return false;
|
|
|
|
if (!timespec64_equal(F2FS_I(inode)->i_disk_time + 1, &inode->i_ctime))
|
|
|
|
return false;
|
|
|
|
if (!timespec64_equal(F2FS_I(inode)->i_disk_time + 2, &inode->i_mtime))
|
|
|
|
return false;
|
|
|
|
if (!timespec64_equal(F2FS_I(inode)->i_disk_time + 3,
|
|
|
|
&F2FS_I(inode)->i_crtime))
|
|
|
|
return false;
|
|
|
|
return true;
|
|
|
|
}
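f2fs_is_time_consistent() above compares four cached timestamps against the copies last written to disk, in the order atime, ctime, mtime, crtime. A userspace sketch of the same check, using struct timespec from time.h, could look like this. The enum, the function names, and the choice to pass both sets as arrays are assumptions for illustration.

```c
#include <stdbool.h>
#include <time.h>

/* Order assumed to mirror the checks above: atime, ctime, mtime, crtime. */
enum { T_ATIME, T_CTIME, T_MTIME, T_CRTIME, T_COUNT };

/* Two timestamps match only if both seconds and nanoseconds match. */
static bool ts_equal(const struct timespec *a, const struct timespec *b)
{
	return a->tv_sec == b->tv_sec && a->tv_nsec == b->tv_nsec;
}

/* True when every cached timestamp equals its on-disk counterpart. */
bool sketch_time_consistent(const struct timespec disk[T_COUNT],
			    const struct timespec cached[T_COUNT])
{
	int i;

	for (i = 0; i < T_COUNT; i++)
		if (!ts_equal(&disk[i], &cached[i]))
			return false;
	return true;
}
```

A single differing nanosecond in any of the four slots makes the check fail, which is what lets a lazytime atime update be noticed even though the f2fs dirty flag was never set.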
|
|
|
|
|
2016-11-29 07:33:38 +08:00
|
|
|
static inline bool f2fs_skip_inode_update(struct inode *inode, int dsync)
|
|
|
|
{
|
2017-10-09 17:55:19 +08:00
|
|
|
bool ret;
|
|
|
|
|
2016-11-29 07:33:38 +08:00
|
|
|
if (dsync) {
|
|
|
|
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
|
|
|
|
|
|
|
|
spin_lock(&sbi->inode_lock[DIRTY_META]);
|
|
|
|
ret = list_empty(&F2FS_I(inode)->gdirty_list);
|
|
|
|
spin_unlock(&sbi->inode_lock[DIRTY_META]);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
if (!is_inode_flag_set(inode, FI_AUTO_RECOVER) ||
|
|
|
|
file_keep_isize(inode) ||
|
2018-03-29 19:27:12 +08:00
|
|
|
i_size_read(inode) & ~PAGE_MASK)
|
2016-11-29 07:33:38 +08:00
|
|
|
return false;
|
2017-10-09 17:55:19 +08:00
|
|
|
|
2019-09-27 18:01:35 +08:00
|
|
|
if (!f2fs_is_time_consistent(inode))
|
2018-03-30 13:50:41 +08:00
|
|
|
return false;
|
|
|
|
|
2020-02-27 19:30:03 +08:00
|
|
|
spin_lock(&F2FS_I(inode)->i_size_lock);
|
2017-10-09 17:55:19 +08:00
|
|
|
ret = F2FS_I(inode)->last_disk_size == i_size_read(inode);
|
2020-02-27 19:30:03 +08:00
|
|
|
spin_unlock(&F2FS_I(inode)->i_size_lock);
|
2017-10-09 17:55:19 +08:00
|
|
|
|
|
|
|
return ret;
|
2015-04-21 04:44:41 +08:00
|
|
|
}
|
|
|
|
|
2018-03-01 23:40:31 +08:00
|
|
|
static inline bool f2fs_readonly(struct super_block *sb)
|
2013-05-20 19:28:47 +08:00
|
|
|
{
|
2018-03-01 23:40:31 +08:00
|
|
|
return sb_rdonly(sb);
|
2013-05-20 19:28:47 +08:00
|
|
|
}
|
|
|
|
|
2014-08-12 07:49:25 +08:00
|
|
|
static inline bool f2fs_cp_error(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
2016-09-20 11:04:18 +08:00
|
|
|
return is_set_ckpt_flags(sbi, CP_ERROR_FLAG);
|
2014-08-12 07:49:25 +08:00
|
|
|
}
|
|
|
|
|
f2fs: rework filename handling
Rework f2fs's handling of filenames to use a new 'struct f2fs_filename'.
Similar to 'struct ext4_filename', this stores the usr_fname, disk_name,
dirhash, crypto_buf, and casefolded name. Some of these names can be
NULL in some cases. 'struct f2fs_filename' differs from
'struct fscrypt_name' mainly in that the casefolded name is included.
For user-initiated directory operations like lookup() and create(),
initialize the f2fs_filename by translating the corresponding
fscrypt_name, then computing the dirhash and casefolded name if needed.
This makes the dirhash and casefolded name be cached for each syscall,
so we don't have to recompute them repeatedly. (Previously, f2fs
computed the dirhash once per directory level, and the casefolded name
once per directory block.) This improves performance.
This rework also makes it much easier to correctly handle all
combinations of normal, encrypted, casefolded, and encrypted+casefolded
directories. (The fourth isn't supported yet but is being worked on.)
The only other cases where an f2fs_filename gets initialized are for two
filesystem-internal operations: (1) when converting an inline directory
to a regular one, we grab the needed disk_name and hash from an existing
f2fs_dir_entry; and (2) when roll-forward recovering a new dentry, we
grab the needed disk_name from f2fs_inode::i_name and compute the hash.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-05-07 15:59:04 +08:00
|
|
|
static inline bool is_dot_dotdot(const u8 *name, size_t len)
|
2015-04-26 15:15:29 +08:00
|
|
|
{
|
2020-05-07 15:59:04 +08:00
|
|
|
if (len == 1 && name[0] == '.')
|
2015-04-26 15:15:29 +08:00
|
|
|
return true;
|
|
|
|
|
2020-05-07 15:59:04 +08:00
|
|
|
if (len == 2 && name[0] == '.' && name[1] == '.')
|
2015-04-26 15:15:29 +08:00
|
|
|
return true;
|
|
|
|
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
2016-09-23 21:30:09 +08:00
|
|
|
static inline void *f2fs_kmalloc(struct f2fs_sb_info *sbi,
|
|
|
|
size_t size, gfp_t flags)
|
2016-04-30 06:16:42 +08:00
|
|
|
{
|
2017-02-25 11:08:28 +08:00
|
|
|
if (time_to_inject(sbi, FAULT_KMALLOC)) {
|
2019-11-01 17:53:23 +08:00
|
|
|
f2fs_show_injection_info(sbi, FAULT_KMALLOC);
|
2016-04-30 06:49:56 +08:00
|
|
|
return NULL;
|
2017-02-25 11:08:28 +08:00
|
|
|
}
|
2018-08-14 05:38:06 +08:00
|
|
|
|
2020-06-05 12:57:48 +08:00
|
|
|
return kmalloc(size, flags);
|
2016-04-30 06:16:42 +08:00
|
|
|
}
|
|
|
|
|
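The allocation helpers above share one pattern: ask the fault injector first, and only allocate when no failure is due. A minimal userspace sketch of that pattern follows. The `fault_state` struct, the "fail every N-th call" policy, and all function names are assumptions made for illustration; they are not the kernel's `time_to_inject()` implementation, and `memset()` merely stands in for passing `__GFP_ZERO`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the per-filesystem fault-injection state. */
struct fault_state {
	unsigned int inject_rate; /* fail every inject_rate-th call; 0 disables */
	unsigned int inject_ops;  /* calls counted since the last injected failure */
};

/* Return true when the current call should fail artificially. */
static bool should_inject(struct fault_state *fs)
{
	assert(fs != NULL);
	if (fs->inject_rate == 0)
		return false;
	if (++fs->inject_ops >= fs->inject_rate) {
		fs->inject_ops = 0;
		return true;
	}
	return false;
}

/* Allocation wrapper: consult the injector first, then really allocate. */
static void *checked_malloc(struct fault_state *fs, size_t size)
{
	if (should_inject(fs))
		return NULL; /* simulated allocation failure */
	return malloc(size);
}

/* Zeroing variant built on the wrapper, so it inherits the injection check. */
static void *checked_zalloc(struct fault_state *fs, size_t size)
{
	void *p = checked_malloc(fs, size);

	if (p != NULL)
		memset(p, 0, size);
	return p;
}
```

Layering the zeroing variant on the basic wrapper, as the header does with `f2fs_kzalloc()` and `f2fs_kvzalloc()`, keeps the injection point in a single place.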
2017-11-30 19:28:17 +08:00
|
|
|
static inline void *f2fs_kzalloc(struct f2fs_sb_info *sbi,
|
|
|
|
size_t size, gfp_t flags)
|
|
|
|
{
|
|
|
|
return f2fs_kmalloc(sbi, size, flags | __GFP_ZERO);
|
|
|
|
}
|
|
|
|
|
2017-11-30 19:28:18 +08:00
|
|
|
static inline void *f2fs_kvmalloc(struct f2fs_sb_info *sbi,
|
|
|
|
size_t size, gfp_t flags)
|
|
|
|
{
|
|
|
|
if (time_to_inject(sbi, FAULT_KVMALLOC)) {
|
2019-11-01 17:53:23 +08:00
|
|
|
f2fs_show_injection_info(sbi, FAULT_KVMALLOC);
|
2017-11-30 19:28:18 +08:00
|
|
|
return NULL;
|
|
|
|
}
|
2018-08-14 05:38:06 +08:00
|
|
|
|
2017-11-30 19:28:18 +08:00
|
|
|
return kvmalloc(size, flags);
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline void *f2fs_kvzalloc(struct f2fs_sb_info *sbi,
|
|
|
|
size_t size, gfp_t flags)
|
|
|
|
{
|
|
|
|
return f2fs_kvmalloc(sbi, size, flags | __GFP_ZERO);
|
|
|
|
}
|
|
|
|
|
2017-07-19 00:19:06 +08:00
|
|
|
static inline int get_extra_isize(struct inode *inode)
|
2017-07-19 00:19:05 +08:00
|
|
|
{
|
2017-07-19 00:19:06 +08:00
|
|
|
return F2FS_I(inode)->i_extra_isize / sizeof(__le32);
|
2017-07-19 00:19:05 +08:00
|
|
|
}
|
|
|
|
|
f2fs: support flexible inline xattr size
In products, more and more features based on file encryption are being
introduced, and their demand for xattr space is increasing. However, the
inline xattr area has a fixed size of 200 bytes; once it is full, any
further xattr data occupies an additional xattr block, which costs more
space and may regress performance when the data is persisted.
To resolve this, it is better to let the inline xattr size expand
flexibly according to the user's requirements.
So this patch introduces a new filesystem feature, 'flexible inline xattr',
and a new mount option, 'inline_xattr_size=%u'. Once mkfs enables the
feature, the option can be used to make f2fs support a flexible inline
xattr size.
To support this feature, we add an extra attribute, i_inline_xattr_size, to
the inode layout. It indicates how much space the inline xattr borrows from
the block address mapping space in the inode layout, so that we can easily
locate and store flexible-sized inline xattr data in the inode.
Inode disk layout:
+----------------------+
| .i_mode |
| ... |
| .i_ext |
+----------------------+
| .i_extra_isize |
| .i_inline_xattr_size |-----------+
| ... | |
+----------------------+ |
| .i_addr | |
| - block address or | |
| - inline data | |
+----------------------+<---+ v
| inline xattr | +---inline xattr range
+----------------------+<---+
| .i_nid |
+----------------------+
| node_footer |
| (nid, ino, offset) |
+----------------------+
Note that we have to consider backward compatibility, as reported by
Sheng Yong: the old layout reserved the 200 bytes all the time.
Previously, inline data and inline directories always reserved 200 bytes in
the inode layout, even if inline_xattr was disabled. In order to keep
inline_dentry's structure for backward compatibility, we get the space back
only from inline_data.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reported-by: Sheng Yong <shengyong1@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-06 21:59:50 +08:00
|
|
|
static inline int get_inline_xattr_addrs(struct inode *inode)
|
|
|
|
{
|
|
|
|
return F2FS_I(inode)->i_inline_xattr_size;
|
|
|
|
}
|
|
|
|
|
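The two accessors above return sizes counted in 4-byte slots: `get_extra_isize()` divides a byte count by `sizeof(__le32)`, and the name `get_inline_xattr_addrs()` suggests that `i_inline_xattr_size` is already kept in slots. The sketch below shows the resulting slot arithmetic. The total of 923 slots, the struct, and the helper names are assumptions made for illustration, not values taken from this header.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_BYTES ((unsigned int)sizeof(uint32_t)) /* one __le32 slot */
#define TOTAL_ADDR_SLOTS 923u /* assumed slot count of the i_addr area */

/* Hypothetical summary of the two per-inode sizes discussed above. */
struct inode_layout_sketch {
	unsigned int extra_isize_bytes;  /* extra attribute area, in bytes */
	unsigned int inline_xattr_slots; /* inline xattr area, in 4-byte slots */
};

/* Extra attribute area expressed in slots (bytes / 4). */
static unsigned int extra_isize_slots(const struct inode_layout_sketch *l)
{
	return l->extra_isize_bytes / SLOT_BYTES;
}

/* Inline xattr area expressed in bytes (slots * 4). */
static unsigned int inline_xattr_bytes(const struct inode_layout_sketch *l)
{
	return l->inline_xattr_slots * SLOT_BYTES;
}

/* Slots left for block addresses once both areas are carved out. */
static unsigned int block_addr_slots(const struct inode_layout_sketch *l)
{
	unsigned int used = extra_isize_slots(l) + l->inline_xattr_slots;

	assert(used <= TOTAL_ADDR_SLOTS);
	return TOTAL_ADDR_SLOTS - used;
}
```

With 24 bytes of extra attributes and 50 inline xattr slots (the 200-byte default mentioned in the commit message), this leaves 923 - 6 - 50 = 867 slots for block addresses under the assumed total.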
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds an "f2fs_" prefix to all non-static symbols in order to:
a) avoid conflicts with other generic kernel symbols;
b) indicate that a function is f2fs-specific rather than generic.
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
#define f2fs_get_inode_mode(i) \
|
2016-05-21 01:13:22 +08:00
|
|
|
((is_inode_flag_set(i, FI_ACL_MODE)) ? \
|
2013-12-20 21:16:45 +08:00
|
|
|
(F2FS_I(i)->i_acl_mode) : ((i)->i_mode))
|
|
|
|
|
2017-07-19 00:19:06 +08:00
|
|
|
#define F2FS_TOTAL_EXTRA_ATTR_SIZE \
|
|
|
|
(offsetof(struct f2fs_inode, i_extra_end) - \
|
|
|
|
offsetof(struct f2fs_inode, i_extra_isize)) \
|
|
|
|
|
2017-07-26 00:01:41 +08:00
|
|
|
#define F2FS_OLD_ATTRIBUTE_SIZE (offsetof(struct f2fs_inode, i_addr))
|
|
|
|
#define F2FS_FITS_IN_INODE(f2fs_inode, extra_isize, field) \
|
2019-01-14 22:05:14 +08:00
|
|
|
((offsetof(typeof(*(f2fs_inode)), field) + \
|
2017-07-26 00:01:41 +08:00
|
|
|
sizeof((f2fs_inode)->field)) \
|
2019-01-14 22:05:14 +08:00
|
|
|
<= (F2FS_OLD_ATTRIBUTE_SIZE + (extra_isize))) \
|
2017-07-26 00:01:41 +08:00
|
|
|
|
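`F2FS_FITS_IN_INODE` checks whether an optional field lies entirely inside the part of the extra attribute area that a given inode actually carries on disk. The toy example below mirrors the macro's arithmetic on a made-up structure. `struct toy_inode`, its fields, and the `TOY_` names are hypothetical; `__typeof__` is the GCC/Clang spelling of the `typeof` used in the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy on-disk inode: a fixed legacy part followed by optional extra fields. */
struct toy_inode {
	uint32_t mode;        /* legacy field, offset 0 */
	uint32_t size;        /* legacy field, offset 4 */
	/* the extra area begins where the legacy layout ended */
	uint16_t extra_isize; /* bytes of extra area present, offset 8 */
	uint16_t xattr_size;  /* offset 10 */
	uint32_t projid;      /* offset 12 */
	uint32_t crtime;      /* offset 16 */
};

/* Size of the legacy part, i.e. the offset where the extra area starts. */
#define TOY_OLD_SIZE (offsetof(struct toy_inode, extra_isize))

/* True if `field` ends within the first `extra_isize` bytes of the extra area. */
#define TOY_FITS_IN_INODE(inode, extra_isize, field)		\
	((offsetof(__typeof__(*(inode)), field) +		\
	  sizeof((inode)->field))				\
	 <= (TOY_OLD_SIZE + (extra_isize)))
```

With an extra area of 8 bytes, `projid` (ending at byte 16) fits and `crtime` (ending at byte 20) does not, so code reading an older inode can skip fields that were never written.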
2018-10-24 18:37:26 +08:00
|
|
|
#define __is_large_section(sbi) ((sbi)->segs_per_sec > 1)
|
|
|
|
|
2019-04-15 15:26:31 +08:00
|
|
|
#define __is_meta_io(fio) (PAGE_TYPE_OF_BIO((fio)->type) == META)
|
f2fs: fix to do sanity check with block address in main area
This patch adds sanity checks for the following fields:
- cp_pack_total_block_count
- blkaddr of data/node
- extent info
- Overview
BUG() in verify_block_addr() when writing to a corrupted f2fs image
- Reproduce (4.18 upstream kernel)
- POC (poc.c)
static void activity(char *mpoint) {
char *foo_bar_baz;
int err;
static int buf[8192];
memset(buf, 0, sizeof(buf));
err = asprintf(&foo_bar_baz, "%s/foo/bar/baz", mpoint);
int fd = open(foo_bar_baz, O_RDWR | O_TRUNC, 0777);
if (fd >= 0) {
write(fd, (char *)buf, sizeof(buf));
fdatasync(fd);
close(fd);
}
}
int main(int argc, char *argv[]) {
activity(argv[1]);
return 0;
}
- Kernel message
[ 689.349473] F2FS-fs (loop0): Mounted with checkpoint version = 3
[ 699.728662] WARNING: CPU: 0 PID: 1309 at fs/f2fs/segment.c:2860 f2fs_inplace_write_data+0x232/0x240
[ 699.728670] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mac_hid i2c_piix4 soundcore ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear 8139too crct10dif_pclmul crc32_pclmul qxl drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops ttm drm aes_x86_64 crypto_simd cryptd 8139cp glue_helper mii pata_acpi floppy
[ 699.729056] CPU: 0 PID: 1309 Comm: a.out Not tainted 4.18.0-rc1+ #4
[ 699.729064] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 699.729074] RIP: 0010:f2fs_inplace_write_data+0x232/0x240
[ 699.729076] Code: ff e9 cf fe ff ff 49 8d 7d 10 e8 39 45 ad ff 4d 8b 7d 10 be 04 00 00 00 49 8d 7f 48 e8 07 49 ad ff 45 8b 7f 48 e9 fb fe ff ff <0f> 0b f0 41 80 4d 48 04 e9 65 fe ff ff 90 66 66 66 66 90 55 48 8d
[ 699.729130] RSP: 0018:ffff8801f43af568 EFLAGS: 00010202
[ 699.729139] RAX: 000000000000003f RBX: ffff8801f43af7b8 RCX: ffffffffb88c9113
[ 699.729142] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffff8802024e5540
[ 699.729144] RBP: ffff8801f43af590 R08: 0000000000000009 R09: ffffffffffffffe8
[ 699.729147] R10: 0000000000000001 R11: ffffed0039b0596a R12: ffff8802024e5540
[ 699.729149] R13: ffff8801f0335500 R14: ffff8801e3e7a700 R15: ffff8801e1ee4450
[ 699.729154] FS: 00007f9bf97f5700(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000
[ 699.729156] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 699.729159] CR2: 00007f9bf925d170 CR3: 00000001f0c34000 CR4: 00000000000006f0
[ 699.729171] Call Trace:
[ 699.729192] f2fs_do_write_data_page+0x2e2/0xe00
[ 699.729203] ? f2fs_should_update_outplace+0xd0/0xd0
[ 699.729238] ? memcg_drain_all_list_lrus+0x280/0x280
[ 699.729269] ? __radix_tree_replace+0xa3/0x120
[ 699.729276] __write_data_page+0x5c7/0xe30
[ 699.729291] ? kasan_check_read+0x11/0x20
[ 699.729310] ? page_mapped+0x8a/0x110
[ 699.729321] ? page_mkclean+0xe9/0x160
[ 699.729327] ? f2fs_do_write_data_page+0xe00/0xe00
[ 699.729331] ? invalid_page_referenced_vma+0x130/0x130
[ 699.729345] ? clear_page_dirty_for_io+0x332/0x450
[ 699.729351] f2fs_write_cache_pages+0x4ca/0x860
[ 699.729358] ? __write_data_page+0xe30/0xe30
[ 699.729374] ? percpu_counter_add_batch+0x22/0xa0
[ 699.729380] ? kasan_check_write+0x14/0x20
[ 699.729391] ? _raw_spin_lock+0x17/0x40
[ 699.729403] ? f2fs_mark_inode_dirty_sync.part.18+0x16/0x30
[ 699.729413] ? iov_iter_advance+0x113/0x640
[ 699.729418] ? f2fs_write_end+0x133/0x2e0
[ 699.729423] ? balance_dirty_pages_ratelimited+0x239/0x640
[ 699.729428] f2fs_write_data_pages+0x329/0x520
[ 699.729433] ? generic_perform_write+0x250/0x320
[ 699.729438] ? f2fs_write_cache_pages+0x860/0x860
[ 699.729454] ? current_time+0x110/0x110
[ 699.729459] ? f2fs_preallocate_blocks+0x1ef/0x370
[ 699.729464] do_writepages+0x37/0xb0
[ 699.729468] ? f2fs_write_cache_pages+0x860/0x860
[ 699.729472] ? do_writepages+0x37/0xb0
[ 699.729478] __filemap_fdatawrite_range+0x19a/0x1f0
[ 699.729483] ? delete_from_page_cache_batch+0x4e0/0x4e0
[ 699.729496] ? __vfs_write+0x2b2/0x410
[ 699.729501] file_write_and_wait_range+0x66/0xb0
[ 699.729506] f2fs_do_sync_file+0x1f9/0xd90
[ 699.729511] ? truncate_partial_data_page+0x290/0x290
[ 699.729521] ? __sb_end_write+0x30/0x50
[ 699.729526] ? vfs_write+0x20f/0x260
[ 699.729530] f2fs_sync_file+0x9a/0xb0
[ 699.729534] ? f2fs_do_sync_file+0xd90/0xd90
[ 699.729548] vfs_fsync_range+0x68/0x100
[ 699.729554] ? __fget_light+0xc9/0xe0
[ 699.729558] do_fsync+0x3d/0x70
[ 699.729562] __x64_sys_fdatasync+0x24/0x30
[ 699.729585] do_syscall_64+0x78/0x170
[ 699.729595] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 699.729613] RIP: 0033:0x7f9bf930d800
[ 699.729615] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 83 3d 49 bf 2c 00 00 75 10 b8 4b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 be 78 01 00 48 89 04 24
[ 699.729668] RSP: 002b:00007ffee3606c68 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
[ 699.729673] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9bf930d800
[ 699.729675] RDX: 0000000000008000 RSI: 00000000006010a0 RDI: 0000000000000003
[ 699.729678] RBP: 00007ffee3606ca0 R08: 0000000001503010 R09: 0000000000000000
[ 699.729680] R10: 00000000000002e8 R11: 0000000000000246 R12: 0000000000400610
[ 699.729683] R13: 00007ffee3606da0 R14: 0000000000000000 R15: 0000000000000000
[ 699.729687] ---[ end trace 4ce02f25ff7d3df5 ]---
[ 699.729782] ------------[ cut here ]------------
[ 699.729785] kernel BUG at fs/f2fs/segment.h:654!
[ 699.731055] invalid opcode: 0000 [#1] SMP KASAN PTI
[ 699.732104] CPU: 0 PID: 1309 Comm: a.out Tainted: G W 4.18.0-rc1+ #4
[ 699.733684] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 699.735611] RIP: 0010:f2fs_submit_page_bio+0x29b/0x730
[ 699.736649] Code: 54 49 8d bd 18 04 00 00 e8 b2 59 af ff 41 8b 8d 18 04 00 00 8b 45 b8 41 d3 e6 44 01 f0 4c 8d 73 14 41 39 c7 0f 82 37 fe ff ff <0f> 0b 65 8b 05 2c 04 77 47 89 c0 48 0f a3 05 52 c1 d5 01 0f 92 c0
[ 699.740524] RSP: 0018:ffff8801f43af508 EFLAGS: 00010283
[ 699.741573] RAX: 0000000000000000 RBX: ffff8801f43af7b8 RCX: ffffffffb88a7cef
[ 699.743006] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8801e3e7a64c
[ 699.744426] RBP: ffff8801f43af558 R08: ffffed003e066b55 R09: ffffed003e066b55
[ 699.745833] R10: 0000000000000001 R11: ffffed003e066b54 R12: ffffea0007876940
[ 699.747256] R13: ffff8801f0335500 R14: ffff8801e3e7a600 R15: 0000000000000001
[ 699.748683] FS: 00007f9bf97f5700(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000
[ 699.750293] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 699.751462] CR2: 00007f9bf925d170 CR3: 00000001f0c34000 CR4: 00000000000006f0
[ 699.752874] Call Trace:
[ 699.753386] ? f2fs_inplace_write_data+0x93/0x240
[ 699.754341] f2fs_inplace_write_data+0xd2/0x240
[ 699.755271] f2fs_do_write_data_page+0x2e2/0xe00
[ 699.756214] ? f2fs_should_update_outplace+0xd0/0xd0
[ 699.757215] ? memcg_drain_all_list_lrus+0x280/0x280
[ 699.758209] ? __radix_tree_replace+0xa3/0x120
[ 699.759164] __write_data_page+0x5c7/0xe30
[ 699.760002] ? kasan_check_read+0x11/0x20
[ 699.760823] ? page_mapped+0x8a/0x110
[ 699.761573] ? page_mkclean+0xe9/0x160
[ 699.762345] ? f2fs_do_write_data_page+0xe00/0xe00
[ 699.763332] ? invalid_page_referenced_vma+0x130/0x130
[ 699.764374] ? clear_page_dirty_for_io+0x332/0x450
[ 699.765347] f2fs_write_cache_pages+0x4ca/0x860
[ 699.766276] ? __write_data_page+0xe30/0xe30
[ 699.767161] ? percpu_counter_add_batch+0x22/0xa0
[ 699.768112] ? kasan_check_write+0x14/0x20
[ 699.768951] ? _raw_spin_lock+0x17/0x40
[ 699.769739] ? f2fs_mark_inode_dirty_sync.part.18+0x16/0x30
[ 699.770885] ? iov_iter_advance+0x113/0x640
[ 699.771743] ? f2fs_write_end+0x133/0x2e0
[ 699.772569] ? balance_dirty_pages_ratelimited+0x239/0x640
[ 699.773680] f2fs_write_data_pages+0x329/0x520
[ 699.774603] ? generic_perform_write+0x250/0x320
[ 699.775544] ? f2fs_write_cache_pages+0x860/0x860
[ 699.776510] ? current_time+0x110/0x110
[ 699.777299] ? f2fs_preallocate_blocks+0x1ef/0x370
[ 699.778279] do_writepages+0x37/0xb0
[ 699.779026] ? f2fs_write_cache_pages+0x860/0x860
[ 699.779978] ? do_writepages+0x37/0xb0
[ 699.780755] __filemap_fdatawrite_range+0x19a/0x1f0
[ 699.781746] ? delete_from_page_cache_batch+0x4e0/0x4e0
[ 699.782820] ? __vfs_write+0x2b2/0x410
[ 699.783597] file_write_and_wait_range+0x66/0xb0
[ 699.784540] f2fs_do_sync_file+0x1f9/0xd90
[ 699.785381] ? truncate_partial_data_page+0x290/0x290
[ 699.786415] ? __sb_end_write+0x30/0x50
[ 699.787204] ? vfs_write+0x20f/0x260
[ 699.787941] f2fs_sync_file+0x9a/0xb0
[ 699.788694] ? f2fs_do_sync_file+0xd90/0xd90
[ 699.789572] vfs_fsync_range+0x68/0x100
[ 699.790360] ? __fget_light+0xc9/0xe0
[ 699.791128] do_fsync+0x3d/0x70
[ 699.791779] __x64_sys_fdatasync+0x24/0x30
[ 699.792614] do_syscall_64+0x78/0x170
[ 699.793371] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 699.794406] RIP: 0033:0x7f9bf930d800
[ 699.795134] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 83 3d 49 bf 2c 00 00 75 10 b8 4b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 be 78 01 00 48 89 04 24
[ 699.798960] RSP: 002b:00007ffee3606c68 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
[ 699.800483] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9bf930d800
[ 699.801923] RDX: 0000000000008000 RSI: 00000000006010a0 RDI: 0000000000000003
[ 699.803373] RBP: 00007ffee3606ca0 R08: 0000000001503010 R09: 0000000000000000
[ 699.804798] R10: 00000000000002e8 R11: 0000000000000246 R12: 0000000000400610
[ 699.806233] R13: 00007ffee3606da0 R14: 0000000000000000 R15: 0000000000000000
[ 699.807667] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mac_hid i2c_piix4 soundcore ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear 8139too crct10dif_pclmul crc32_pclmul qxl drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops ttm drm aes_x86_64 crypto_simd cryptd 8139cp glue_helper mii pata_acpi floppy
[ 699.817079] ---[ end trace 4ce02f25ff7d3df6 ]---
[ 699.818068] RIP: 0010:f2fs_submit_page_bio+0x29b/0x730
[ 699.819114] Code: 54 49 8d bd 18 04 00 00 e8 b2 59 af ff 41 8b 8d 18 04 00 00 8b 45 b8 41 d3 e6 44 01 f0 4c 8d 73 14 41 39 c7 0f 82 37 fe ff ff <0f> 0b 65 8b 05 2c 04 77 47 89 c0 48 0f a3 05 52 c1 d5 01 0f 92 c0
[ 699.822919] RSP: 0018:ffff8801f43af508 EFLAGS: 00010283
[ 699.823977] RAX: 0000000000000000 RBX: ffff8801f43af7b8 RCX: ffffffffb88a7cef
[ 699.825436] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8801e3e7a64c
[ 699.826881] RBP: ffff8801f43af558 R08: ffffed003e066b55 R09: ffffed003e066b55
[ 699.828292] R10: 0000000000000001 R11: ffffed003e066b54 R12: ffffea0007876940
[ 699.829750] R13: ffff8801f0335500 R14: ffff8801e3e7a600 R15: 0000000000000001
[ 699.831192] FS: 00007f9bf97f5700(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000
[ 699.832793] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 699.833981] CR2: 00007f9bf925d170 CR3: 00000001f0c34000 CR4: 00000000000006f0
[ 699.835556] ==================================================================
[ 699.837029] BUG: KASAN: stack-out-of-bounds in update_stack_state+0x38c/0x3e0
[ 699.838462] Read of size 8 at addr ffff8801f43af970 by task a.out/1309
[ 699.840086] CPU: 0 PID: 1309 Comm: a.out Tainted: G D W 4.18.0-rc1+ #4
[ 699.841603] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 699.843475] Call Trace:
[ 699.843982] dump_stack+0x7b/0xb5
[ 699.844661] print_address_description+0x70/0x290
[ 699.845607] kasan_report+0x291/0x390
[ 699.846351] ? update_stack_state+0x38c/0x3e0
[ 699.853831] __asan_load8+0x54/0x90
[ 699.854569] update_stack_state+0x38c/0x3e0
[ 699.855428] ? __read_once_size_nocheck.constprop.7+0x20/0x20
[ 699.856601] ? __save_stack_trace+0x5e/0x100
[ 699.857476] unwind_next_frame.part.5+0x18e/0x490
[ 699.858448] ? unwind_dump+0x290/0x290
[ 699.859217] ? clear_page_dirty_for_io+0x332/0x450
[ 699.860185] __unwind_start+0x106/0x190
[ 699.860974] __save_stack_trace+0x5e/0x100
[ 699.861808] ? __save_stack_trace+0x5e/0x100
[ 699.862691] ? unlink_anon_vmas+0xba/0x2c0
[ 699.863525] save_stack_trace+0x1f/0x30
[ 699.864312] save_stack+0x46/0xd0
[ 699.864993] ? __alloc_pages_slowpath+0x1420/0x1420
[ 699.865990] ? flush_tlb_mm_range+0x15e/0x220
[ 699.866889] ? kasan_check_write+0x14/0x20
[ 699.867724] ? __dec_node_state+0x92/0xb0
[ 699.868543] ? lock_page_memcg+0x85/0xf0
[ 699.869350] ? unlock_page_memcg+0x16/0x80
[ 699.870185] ? page_remove_rmap+0x198/0x520
[ 699.871048] ? mark_page_accessed+0x133/0x200
[ 699.871930] ? _cond_resched+0x1a/0x50
[ 699.872700] ? unmap_page_range+0xcd4/0xe50
[ 699.873551] ? rb_next+0x58/0x80
[ 699.874217] ? rb_next+0x58/0x80
[ 699.874895] __kasan_slab_free+0x13c/0x1a0
[ 699.875734] ? unlink_anon_vmas+0xba/0x2c0
[ 699.876563] kasan_slab_free+0xe/0x10
[ 699.877315] kmem_cache_free+0x89/0x1e0
[ 699.878095] unlink_anon_vmas+0xba/0x2c0
[ 699.878913] free_pgtables+0x101/0x1b0
[ 699.879677] exit_mmap+0x146/0x2a0
[ 699.880378] ? __ia32_sys_munmap+0x50/0x50
[ 699.881214] ? kasan_check_read+0x11/0x20
[ 699.882052] ? mm_update_next_owner+0x322/0x380
[ 699.882985] mmput+0x8b/0x1d0
[ 699.883602] do_exit+0x43a/0x1390
[ 699.884288] ? mm_update_next_owner+0x380/0x380
[ 699.885212] ? f2fs_sync_file+0x9a/0xb0
[ 699.885995] ? f2fs_do_sync_file+0xd90/0xd90
[ 699.886877] ? vfs_fsync_range+0x68/0x100
[ 699.887694] ? __fget_light+0xc9/0xe0
[ 699.888442] ? do_fsync+0x3d/0x70
[ 699.889118] ? __x64_sys_fdatasync+0x24/0x30
[ 699.889996] rewind_stack_do_exit+0x17/0x20
[ 699.890860] RIP: 0033:0x7f9bf930d800
[ 699.891585] Code: Bad RIP value.
[ 699.892268] RSP: 002b:00007ffee3606c68 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
[ 699.893781] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9bf930d800
[ 699.895220] RDX: 0000000000008000 RSI: 00000000006010a0 RDI: 0000000000000003
[ 699.896643] RBP: 00007ffee3606ca0 R08: 0000000001503010 R09: 0000000000000000
[ 699.898069] R10: 00000000000002e8 R11: 0000000000000246 R12: 0000000000400610
[ 699.899505] R13: 00007ffee3606da0 R14: 0000000000000000 R15: 0000000000000000
[ 699.901241] The buggy address belongs to the page:
[ 699.902215] page:ffffea0007d0ebc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
[ 699.903811] flags: 0x2ffff0000000000()
[ 699.904585] raw: 02ffff0000000000 0000000000000000 ffffffff07d00101 0000000000000000
[ 699.906125] raw: 0000000000000000 0000000000240000 00000000ffffffff 0000000000000000
[ 699.907673] page dumped because: kasan: bad access detected
[ 699.909108] Memory state around the buggy address:
[ 699.910077] ffff8801f43af800: 00 f1 f1 f1 f1 00 f4 f4 f4 f3 f3 f3 f3 00 00 00
[ 699.911528] ffff8801f43af880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 699.912953] >ffff8801f43af900: 00 00 00 00 00 00 00 00 f1 01 f4 f4 f4 f2 f2 f2
[ 699.914392] ^
[ 699.915758] ffff8801f43af980: f2 00 f4 f4 00 00 00 00 f2 00 00 00 00 00 00 00
[ 699.917193] ffff8801f43afa00: 00 00 00 00 00 00 00 00 00 f3 f3 f3 00 00 00 00
[ 699.918634] ==================================================================
- Location
https://elixir.bootlin.com/linux/v4.18-rc1/source/fs/f2fs/segment.h#L644
Reported-by: Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01 19:13:44 +08:00
|
|
|
|
2018-06-05 17:44:11 +08:00
|
|
|
bool f2fs_is_valid_blkaddr(struct f2fs_sb_info *sbi,
|
|
|
|
block_t blkaddr, int type);
|
|
|
|
static inline void verify_blkaddr(struct f2fs_sb_info *sbi,
|
|
|
|
block_t blkaddr, int type)
|
|
|
|
{
|
|
|
|
if (!f2fs_is_valid_blkaddr(sbi, blkaddr, type)) {
|
2019-06-18 17:48:42 +08:00
|
|
|
f2fs_err(sbi, "invalid blkaddr: %u, type: %d, run fsck to fix.",
|
|
|
|
blkaddr, type);
|
2018-06-05 17:44:11 +08:00
|
|
|
f2fs_bug_on(sbi, 1);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline bool __is_valid_data_blkaddr(block_t blkaddr)
|
2018-05-23 22:25:08 +08:00
|
|
|
{
|
f2fs: support data compression
This patch adds support for compression in f2fs.
- A new term, cluster, is defined as the basic unit of compression. A file can
be divided into multiple clusters logically. One cluster includes 4 << n
(n >= 0) logical pages; the compression size is also the cluster size, and each
cluster can be compressed or not.
- In the cluster metadata layout, one special flag indicates whether a cluster
is compressed or normal. For a compressed cluster, the following metadata
maps the cluster to [1, 4 << n - 1] physical blocks, in which f2fs stores
data including the compress header and the compressed data.
- In order to eliminate write amplification during overwrite, F2FS only
supports compression on write-once files: data can be compressed only when
all logical blocks in the file are valid and the cluster compress ratio is lower
than the specified threshold.
- To enable compression on regular inode, there are three ways:
* chattr +c file
* chattr +c dir; touch dir/file
* mount w/ -o compress_extension=ext; touch file.ext
Compress metadata layout:
[Dnode Structure]
+-----------------------------------------------+
| cluster 1 | cluster 2 | ......... | cluster N |
+-----------------------------------------------+
. . . .
. . . .
. Compressed Cluster . . Normal Cluster .
+----------+---------+---------+---------+ +---------+---------+---------+---------+
|compr flag| block 1 | block 2 | block 3 | | block 1 | block 2 | block 3 | block 4 |
+----------+---------+---------+---------+ +---------+---------+---------+---------+
. .
. .
. .
+-------------+-------------+----------+----------------------------+
| data length | data chksum | reserved | compressed data |
+-------------+-------------+----------+----------------------------+
Changelog:
20190326:
- fix error handling of read_end_io().
- remove unneeded comments in f2fs_encrypt_one_page().
20190327:
- fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
- don't jump into loop directly to avoid uninitialized variables.
- add TODO tag in error path of f2fs_write_cache_pages().
20190328:
- fix wrong merge condition in f2fs_read_multi_pages().
- check compressed file in f2fs_post_read_required().
20190401
- allow overwrite on non-compressed cluster.
- check cluster meta before writing compressed data.
20190402
- don't preallocate blocks for compressed file.
- add lz4 compress algorithm
- process multiple post read works in one workqueue
Currently f2fs processes post-read work in multiple workqueues, which
shows low performance due to the scheduling overhead of multiple
workqueues executing in order.
20190921
- compress: support buffered overwrite
C: compress cluster flag
V: valid block address
N: NEW_ADDR
One cluster contains 4 blocks
before overwrite after overwrite
- VVVV -> CVNN
- CVNN -> VVVV
- CVNN -> CVNN
- CVNN -> CVVV
- CVVV -> CVNN
- CVVV -> CVVV
20191029
- add kconfig F2FS_FS_COMPRESSION to isolate compression related
codes, add kconfig F2FS_FS_{LZO,LZ4} to cover backend algorithm.
Note: the lzo backend will be removed if Jaegeuk agrees.
- update codes according to Eric's comments.
20191101
- apply fixes from Jaegeuk
20191113
- apply fixes from Jaegeuk
- split workqueue for fsverity
20191216
- apply fixes from Jaegeuk
20200117
- fix to avoid NULL pointer dereference
[Jaegeuk Kim]
- add tracepoint for f2fs_{,de}compress_pages()
- fix many bugs and add some compression stats
- fix overwrite/mmap bugs
- address 32bit build error, reported by Geert.
- bug fixes when handling errors and i_compressed_blocks
Reported-by: <noreply@ellerman.id.au>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-11-01 18:07:14 +08:00
|
|
|
if (blkaddr == NEW_ADDR || blkaddr == NULL_ADDR ||
|
|
|
|
blkaddr == COMPRESS_ADDR)
|
2018-05-23 22:25:08 +08:00
|
|
|
return false;
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
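`__is_valid_data_blkaddr()` treats three sentinel values as "not a real on-disk block". The sketch below applies the same test to the cluster notation used in the compression commit message (C = compress flag, V = valid address, N = NEW_ADDR). The numeric sentinel values, the `blk_t` typedef, and the helper names are assumptions made for illustration and are not taken from this excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t blk_t; /* stand-in for the kernel's block_t */

/* Assumed sentinel encodings; the real definitions live elsewhere. */
#define SK_NULL_ADDR     ((blk_t)0)  /* slot never allocated */
#define SK_NEW_ADDR      ((blk_t)-1) /* reserved, not yet written */
#define SK_COMPRESS_ADDR ((blk_t)-2) /* marks a compressed cluster */

/* Number of logical pages in a cluster: 4 << n, as in the commit message. */
static unsigned int cluster_pages(unsigned int n)
{
	return 4u << n;
}

/* Same shape as the header's check: sentinels are not real block addresses. */
static bool is_real_data_blkaddr(blk_t addr)
{
	if (addr == SK_NEW_ADDR || addr == SK_NULL_ADDR ||
	    addr == SK_COMPRESS_ADDR)
		return false;
	return true;
}

/* Count the slots of one cluster that point at real on-disk blocks. */
static unsigned int count_real_blocks(const blk_t *slots, unsigned int nslots)
{
	unsigned int i, count = 0;

	for (i = 0; i < nslots; i++)
		if (is_real_data_blkaddr(slots[i]))
			count++;
	return count;
}
```

Under these assumptions a "CVNN" cluster holds one real block and a "VVVV" cluster holds four, which is the difference the overwrite table in the commit message tracks.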
f2fs: add superblock and major in-memory structure
This adds the following major in-memory structures in f2fs.
- f2fs_sb_info:
contains f2fs-specific information, two special inode pointers for node and
meta address spaces, and orphan inode management.
- f2fs_inode_info:
contains vfs_inode and other fs-specific information.
- f2fs_nm_info:
contains node manager information such as NAT entry cache, free nid list,
and NAT page management.
- f2fs_node_info:
represents a node as node id, inode number, block address, and its version.
- f2fs_sm_info:
contains segment manager information such as SIT entry cache, free segment
map, current active logs, dirty segment management, and segment utilization.
The specific structures are sit_info, free_segmap_info, dirty_seglist_info,
curseg_info.
In addition, add F2FS_SUPER_MAGIC in magic.h.
Signed-off-by: Chul Lee <chur.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-28 12:37:31 +08:00
|
|
|
/*
|
|
|
|
* file.c
|
|
|
|
*/
|
2017-01-31 02:55:18 +08:00
|
|
|
int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync);
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking at f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using generically named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds the "f2fs_" prefix to all non-static symbols in order to:
a) avoid conflicts with other generic kernel symbols;
b) indicate that a function is f2fs-specific rather than generic.
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
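The naming rule described in the commit message above can be sketched in plain C. This is an illustration only: the `demofs_` prefix, the struct, and both functions are hypothetical and are not f2fs code. The point is that a file-local helper can keep a generic name if it is `static`, while anything with external linkage carries the subsystem prefix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a directory entry; not the real f2fs layout. */
struct demo_dir_entry {
	unsigned char file_type;
};

#define DEMO_FT_MAX 8

/* Internal linkage: `static` keeps this generic name private to the
 * translation unit, so it cannot clash with another object file. */
static unsigned char clamp_type(unsigned char type)
{
	return type < DEMO_FT_MAX ? type : 0;
}

/* External linkage: the subsystem prefix avoids link-time conflicts and
 * tells the reader which subsystem owns the function. */
unsigned char demofs_get_de_type(const struct demo_dir_entry *de)
{
	assert(de != NULL);
	return clamp_type(de->file_type);
}
```

Two filesystems could each define their own `static` `clamp_type()` without conflict, but two non-static functions with the same unprefixed name would fail to link into one kernel image.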
|
|
|
void f2fs_truncate_data_blocks(struct dnode_of_data *dn);
|
2020-03-18 16:22:59 +08:00
|
|
|
int f2fs_do_truncate_blocks(struct inode *inode, u64 from, bool lock);
|
2019-02-02 17:33:01 +08:00
|
|
|
int f2fs_truncate_blocks(struct inode *inode, u64 from, bool lock);
|
2017-01-31 02:55:18 +08:00
|
|
|
int f2fs_truncate(struct inode *inode);
|
2021-01-21 21:19:43 +08:00
|
|
|
int f2fs_getattr(struct user_namespace *mnt_userns, const struct path *path,
		 struct kstat *stat, u32 request_mask, unsigned int flags);
int f2fs_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
		 struct iattr *attr);
|
2018-05-30 00:20:41 +08:00
|
|
|
int f2fs_truncate_hole(struct inode *inode, pgoff_t pg_start, pgoff_t pg_end);
void f2fs_truncate_data_blocks_range(struct dnode_of_data *dn, int count);
|
2018-01-11 14:42:30 +08:00
|
|
|
int f2fs_precache_extents(struct inode *inode);
|
2021-04-07 20:36:43 +08:00
|
|
|
int f2fs_fileattr_get(struct dentry *dentry, struct fileattr *fa);
int f2fs_fileattr_set(struct user_namespace *mnt_userns,
		      struct dentry *dentry, struct fileattr *fa);
|
2017-01-31 02:55:18 +08:00
|
|
|
long f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg);
long f2fs_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
|
2018-09-25 15:36:02 +08:00
|
|
|
int f2fs_transfer_project_quota(struct inode *inode, kprojid_t kprojid);
|
2017-12-08 08:25:39 +08:00
|
|
|
int f2fs_pin_file_control(struct inode *inode, bool inc);
|
2012-11-28 12:37:31 +08:00
|
|
|
|
|
|
|
/*
 * inode.c
 */
|
2017-01-31 02:55:18 +08:00
|
|
|
void f2fs_set_inode_flags(struct inode *inode);
|
2017-07-31 20:19:09 +08:00
|
|
|
bool f2fs_inode_chksum_verify(struct f2fs_sb_info *sbi, struct page *page);
void f2fs_inode_chksum_set(struct f2fs_sb_info *sbi, struct page *page);
|
2017-01-31 02:55:18 +08:00
|
|
|
struct inode *f2fs_iget(struct super_block *sb, unsigned long ino);
struct inode *f2fs_iget_retry(struct super_block *sb, unsigned long ino);
|
2018-05-30 00:20:41 +08:00
|
|
|
int f2fs_try_to_free_nats(struct f2fs_sb_info *sbi, int nr_shrink);
void f2fs_update_inode(struct inode *inode, struct page *node_page);
void f2fs_update_inode_page(struct inode *inode);
|
2017-01-31 02:55:18 +08:00
|
|
|
int f2fs_write_inode(struct inode *inode, struct writeback_control *wbc);
void f2fs_evict_inode(struct inode *inode);
|
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_handle_failed_inode(struct inode *inode);
|
2012-11-28 12:37:31 +08:00
|
|
|
|
|
|
|
/*
 * namei.c
 */
|
2018-05-30 00:20:41 +08:00
|
|
|
int f2fs_update_extension_list(struct f2fs_sb_info *sbi, const char *name,
|
2018-02-28 17:07:27 +08:00
|
|
|
bool hot, bool set);
|
2012-11-28 12:37:31 +08:00
|
|
|
struct dentry *f2fs_get_parent(struct dentry *child);
|
|
|
|
|
|
|
|
/*
 * dir.c
 */
|
2018-05-30 00:20:41 +08:00
|
|
|
unsigned char f2fs_get_de_type(struct f2fs_dir_entry *de);
|
f2fs: rework filename handling
Rework f2fs's handling of filenames to use a new 'struct f2fs_filename'.
Similar to 'struct ext4_filename', this stores the usr_fname, disk_name,
dirhash, crypto_buf, and casefolded name. Some of these names can be
NULL in some cases. 'struct f2fs_filename' differs from
'struct fscrypt_name' mainly in that the casefolded name is included.
For user-initiated directory operations like lookup() and create(),
initialize the f2fs_filename by translating the corresponding
fscrypt_name, then computing the dirhash and casefolded name if needed.
This makes the dirhash and casefolded name be cached for each syscall,
so we don't have to recompute them repeatedly. (Previously, f2fs
computed the dirhash once per directory level, and the casefolded name
once per directory block.) This improves performance.
This rework also makes it much easier to correctly handle all
combinations of normal, encrypted, casefolded, and encrypted+casefolded
directories. (The fourth isn't supported yet but is being worked on.)
The only other cases where an f2fs_filename gets initialized are for two
filesystem-internal operations: (1) when converting an inline directory
to a regular one, we grab the needed disk_name and hash from an existing
f2fs_dir_entry; and (2) when roll-forward recovering a new dentry, we
grab the needed disk_name from f2fs_inode::i_name and compute the hash.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-05-07 15:59:04 +08:00
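The caching idea in the commit message above (compute the hash and the casefolded name once per operation, then reuse them) can be sketched in userspace C. Everything here is a simplified assumption: `struct demo_filename`, the `demo_*` functions, ASCII `tolower()` casefolding, and the FNV-1a hash are placeholders and do not reflect the real `struct f2fs_filename`, f2fs's dirhash, or its Unicode casefolding.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified analogue of a per-operation filename record. */
struct demo_filename {
	const char *usr_fname;	/* name as passed by the caller (borrowed) */
	char *cf_name;		/* casefolded copy, or NULL when not needed */
	uint32_t hash;		/* computed once, reused by later lookups */
};

/* Placeholder hash (32-bit FNV-1a); not the f2fs directory hash. */
static uint32_t demo_hash(const char *s)
{
	uint32_t h = 2166136261u;

	while (*s) {
		h ^= (unsigned char)*s++;
		h *= 16777619u;
	}
	return h;
}

/* Fill in the record once; returns 0 on success, -1 on allocation failure. */
int demo_setup_filename(struct demo_filename *fn, const char *name,
			int casefold)
{
	assert(fn != NULL && name != NULL);
	fn->usr_fname = name;
	fn->cf_name = NULL;
	if (casefold) {
		size_t len = strlen(name);

		fn->cf_name = malloc(len + 1);
		if (!fn->cf_name)
			return -1;
		/* Copy including the terminating NUL; ASCII-only folding. */
		for (size_t i = 0; i <= len; i++)
			fn->cf_name[i] = (char)tolower((unsigned char)name[i]);
	}
	/* Hash the casefolded form when present so names differing only in
	 * case map to the same hash. */
	fn->hash = demo_hash(fn->cf_name ? fn->cf_name : name);
	return 0;
}

/* Release what demo_setup_filename() allocated. */
void demo_free_filename(struct demo_filename *fn)
{
	free(fn->cf_name);
	fn->cf_name = NULL;
}
```

A caller would set the record up once at the start of a lookup or create, pass a pointer to it down through each directory level, and free it at the end, which is the setup/free pairing suggested by the declarations that follow.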
|
|
|
int f2fs_init_casefolded_name(const struct inode *dir,
			      struct f2fs_filename *fname);
int f2fs_setup_filename(struct inode *dir, const struct qstr *iname,
			int lookup, struct f2fs_filename *fname);
int f2fs_prepare_lookup(struct inode *dir, struct dentry *dentry,
			struct f2fs_filename *fname);
void f2fs_free_filename(struct f2fs_filename *fname);
struct f2fs_dir_entry *f2fs_find_target_dentry(const struct f2fs_dentry_ptr *d,
			const struct f2fs_filename *fname, int *max_slots);
|
2017-01-31 02:55:18 +08:00
|
|
|
int f2fs_fill_dentries(struct dir_context *ctx, struct f2fs_dentry_ptr *d,
			unsigned int start_pos, struct fscrypt_str *fstr);
|
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_do_make_empty_dir(struct inode *inode, struct inode *parent,
|
2017-01-31 02:55:18 +08:00
|
|
|
struct f2fs_dentry_ptr *d);
|
2018-05-30 00:20:41 +08:00
|
|
|
struct page *f2fs_init_inode_metadata(struct inode *inode, struct inode *dir,
|
2020-05-07 15:59:04 +08:00
|
|
|
const struct f2fs_filename *fname, struct page *dpage);
|
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_update_parent_metadata(struct inode *dir, struct inode *inode,
|
2017-01-31 02:55:18 +08:00
|
|
|
unsigned int current_depth);
|
2018-05-30 00:20:41 +08:00
|
|
|
int f2fs_room_for_filename(const void *bitmap, int slots, int max_slots);
|
2017-01-31 02:55:18 +08:00
|
|
|
void f2fs_drop_nlink(struct inode *dir, struct inode *inode);
|
|
|
|
struct f2fs_dir_entry *__f2fs_find_entry(struct inode *dir,
|
2020-05-07 15:59:04 +08:00
|
|
|
			const struct f2fs_filename *fname,
			struct page **res_page);
|
2017-01-31 02:55:18 +08:00
|
|
|
struct f2fs_dir_entry *f2fs_find_entry(struct inode *dir,
			const struct qstr *child, struct page **res_page);
struct f2fs_dir_entry *f2fs_parent_dir(struct inode *dir, struct page **p);
ino_t f2fs_inode_by_name(struct inode *dir, const struct qstr *qstr,
			struct page **page);
void f2fs_set_link(struct inode *dir, struct f2fs_dir_entry *de,
			struct page *page, struct inode *inode);
|
2019-12-10 11:03:05 +08:00
|
|
|
bool f2fs_has_enough_room(struct inode *dir, struct page *ipage,
|
2020-05-07 15:59:04 +08:00
|
|
|
const struct f2fs_filename *fname);
|
2017-01-31 02:55:18 +08:00
|
|
|
void f2fs_update_dentry(nid_t ino, umode_t mode, struct f2fs_dentry_ptr *d,
|
f2fs: rework filename handling
Rework f2fs's handling of filenames to use a new 'struct f2fs_filename'.
Similar to 'struct ext4_filename', this stores the usr_fname, disk_name,
dirhash, crypto_buf, and casefolded name. Some of these names can be
NULL in some cases. 'struct f2fs_filename' differs from
'struct fscrypt_name' mainly in that the casefolded name is included.
For user-initiated directory operations like lookup() and create(),
initialize the f2fs_filename by translating the corresponding
fscrypt_name, then computing the dirhash and casefolded name if needed.
This makes the dirhash and casefolded name be cached for each syscall,
so we don't have to recompute them repeatedly. (Previously, f2fs
computed the dirhash once per directory level, and the casefolded name
once per directory block.) This improves performance.
This rework also makes it much easier to correctly handle all
combinations of normal, encrypted, casefolded, and encrypted+casefolded
directories. (The fourth isn't supported yet but is being worked on.)
The only other cases where an f2fs_filename gets initialized are for two
filesystem-internal operations: (1) when converting an inline directory
to a regular one, we grab the needed disk_name and hash from an existing
f2fs_dir_entry; and (2) when roll-forward recovering a new dentry, we
grab the needed disk_name from f2fs_inode::i_name and compute the hash.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
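The caching idea described above can be sketched in plain C. This is only an illustration, not the kernel implementation: struct demo_filename, demo_hash(), demo_setup_filename() and demo_match() are hypothetical names, the hash is a toy FNV-1a rather than the real dirhash, and casefolding is reduced to ASCII tolower(). The point is that the hash and the casefolded name are computed once per operation and then reused for every directory entry that is compared.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_NAME_MAX 255

/* Hypothetical stand-in for struct f2fs_filename; not the kernel layout. */
struct demo_filename {
	const char *usr_fname;            /* name as passed by the caller */
	char cf_name[DEMO_NAME_MAX + 1];  /* casefolded copy, computed once */
	uint32_t hash;                    /* directory hash, computed once */
};

/* Toy FNV-1a hash; the real dirhash is a different function. */
static uint32_t demo_hash(const char *s)
{
	uint32_t h = 2166136261u;

	for (; *s; s++) {
		h ^= (unsigned char)*s;
		h *= 16777619u;
	}
	return h;
}

/* Set up the name once per operation; returns 0, or -1 if it is too long. */
static int demo_setup_filename(struct demo_filename *fname, const char *name,
			       int casefold)
{
	size_t len = strlen(name);
	size_t i;

	if (len > DEMO_NAME_MAX)
		return -1;
	fname->usr_fname = name;
	for (i = 0; i < len; i++)
		fname->cf_name[i] = casefold ?
			(char)tolower((unsigned char)name[i]) : name[i];
	fname->cf_name[len] = '\0';
	fname->hash = demo_hash(fname->cf_name);
	return 0;
}

/* Per-entry comparison reuses the cached hash and casefolded name. */
static int demo_match(const struct demo_filename *fname,
		      const char *entry_cf_name, uint32_t entry_hash)
{
	return fname->hash == entry_hash &&
	       strcmp(fname->cf_name, entry_cf_name) == 0;
}
```

A caller would run demo_setup_filename() once and then call demo_match() for each candidate entry, so no hashing or casefolding happens inside the scan loop.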
2020-05-07 15:59:04 +08:00
|
|
|
const struct fscrypt_str *name, f2fs_hash_t name_hash,
|
2017-01-31 02:55:18 +08:00
|
|
|
unsigned int bit_pos);
|
2020-05-07 15:59:04 +08:00
|
|
|
int f2fs_add_regular_entry(struct inode *dir, const struct f2fs_filename *fname,
|
2017-01-31 02:55:18 +08:00
|
|
|
struct inode *inode, nid_t ino, umode_t mode);
|
2020-05-07 15:59:04 +08:00
|
|
|
int f2fs_add_dentry(struct inode *dir, const struct f2fs_filename *fname,
|
2017-01-31 02:55:18 +08:00
|
|
|
struct inode *inode, nid_t ino, umode_t mode);
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking at f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using generically named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds an "f2fs_" prefix to all non-static symbols in order to:
a) avoid conflicts with other generic kernel symbols;
b) indicate that a function is f2fs-specific rather than generic.
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
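The naming rule above can be illustrated with a small sketch. The "demofs_" prefix, the enum and both functions are hypothetical and exist only for this example; the type codes are not the f2fs on-disk values. A file-local helper is declared static and may keep a generic name, while anything with external linkage carries the filesystem prefix.

```c
#include <assert.h>

/* Hypothetical inode kinds and type codes, for illustration only. */
enum demofs_kind { DEMOFS_KIND_REG, DEMOFS_KIND_DIR, DEMOFS_KIND_OTHER };
enum { DEMOFS_FT_UNKNOWN = 0, DEMOFS_FT_REG = 1, DEMOFS_FT_DIR = 2 };

/*
 * File-local helper: 'static' gives it internal linkage, so its generic
 * name cannot collide with a symbol defined elsewhere in the kernel.
 */
static unsigned char kind_to_type(enum demofs_kind kind)
{
	switch (kind) {
	case DEMOFS_KIND_REG:
		return DEMOFS_FT_REG;
	case DEMOFS_KIND_DIR:
		return DEMOFS_FT_DIR;
	default:
		return DEMOFS_FT_UNKNOWN;
	}
}

/* Non-static symbol: visible to the whole link, so it carries the prefix. */
unsigned char demofs_get_de_type(enum demofs_kind kind)
{
	return kind_to_type(kind);
}
```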
2018-05-30 00:20:41 +08:00
|
|
|
int f2fs_do_add_link(struct inode *dir, const struct qstr *name,
|
2017-01-31 02:55:18 +08:00
|
|
|
struct inode *inode, nid_t ino, umode_t mode);
|
|
|
|
void f2fs_delete_entry(struct f2fs_dir_entry *dentry, struct page *page,
|
|
|
|
struct inode *dir, struct inode *inode);
|
|
|
|
int f2fs_do_tmpfile(struct inode *inode, struct inode *dir);
|
|
|
|
bool f2fs_empty_dir(struct inode *dir);
|
2012-11-28 12:37:31 +08:00
|
|
|
|
2013-01-26 05:15:43 +08:00
|
|
|
static inline int f2fs_add_link(struct dentry *dentry, struct inode *inode)
|
|
|
|
{
|
2020-11-18 15:56:07 +08:00
|
|
|
if (fscrypt_is_nokey_name(dentry))
|
|
|
|
return -ENOKEY;
|
2018-05-30 00:20:41 +08:00
|
|
|
return f2fs_do_add_link(d_inode(dentry->d_parent), &dentry->d_name,
|
2015-03-31 06:07:16 +08:00
|
|
|
inode, inode->i_ino, inode->i_mode);
|
2013-01-26 05:15:43 +08:00
|
|
|
}
|
|
|
|
|
2012-11-28 12:37:31 +08:00
|
|
|
/*
|
|
|
|
* super.c
|
|
|
|
*/
|
2017-01-31 02:55:18 +08:00
|
|
|
int f2fs_inode_dirtied(struct inode *inode, bool sync);
|
|
|
|
void f2fs_inode_synced(struct inode *inode);
|
2021-10-28 21:03:05 +08:00
|
|
|
int f2fs_dquot_initialize(struct inode *inode);
|
2017-10-07 00:14:28 +08:00
|
|
|
int f2fs_enable_quota_files(struct f2fs_sb_info *sbi, bool rdonly);
|
f2fs: guarantee journalled quota data by checkpoint
For journalled quota mode, let checkpoint flush dirty dquot data and
quota file data to guarantee that all quota sysfiles are persisted in
the last checkpoint. This way, we can avoid corrupting the quota
sysfiles when encountering SPO (sudden power-off).
The implementation is as follows:
1. Add a global state SBI_QUOTA_NEED_FLUSH to indicate that there are
cached dquot metadata changes in the quota subsystem, so a later
checkpoint should:
a) flush dquot metadata into the quota file.
b) flush the quota file to storage to keep file usage consistent.
2. Add a global state SBI_QUOTA_NEED_REPAIR to indicate that a quota
operation failed due to -EIO or -ENOSPC, so later:
a) checkpoint will skip syncing dquot metadata.
b) CP_QUOTA_NEED_FSCK_FLAG will be set in the last cp pack as a
hint for fsck to repair.
3. Add a global state SBI_QUOTA_SKIP_FLUSH. During checkpoint, if quota
data is updated very heavily, flushing may cause a hung task in
block_operation(). To avoid this, if the retry count exceeds the
threshold, just skip flushing and retry in the next checkpoint().
Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid warnings and set fsck flag]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
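The three states above can be sketched as a small decision function. This is a simplified model under stated assumptions, not the f2fs checkpoint code: the DEMO_* flags, DEMO_RETRY_LIMIT, struct demo_cp_result and demo_checkpoint() are hypothetical, and the retry threshold value is chosen arbitrarily.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit flags named after the states in the commit message. */
enum {
	DEMO_QUOTA_NEED_FLUSH  = 1 << 0,
	DEMO_QUOTA_NEED_REPAIR = 1 << 1,
	DEMO_QUOTA_SKIP_FLUSH  = 1 << 2,
};

#define DEMO_RETRY_LIMIT 8 /* assumed threshold, for illustration only */

struct demo_cp_result {
	bool flushed_quota;  /* dquot metadata and quota files were flushed */
	bool set_fsck_flag;  /* checkpoint pack carries the "needs fsck" hint */
};

static struct demo_cp_result demo_checkpoint(unsigned int *state, int retries)
{
	struct demo_cp_result r = { false, false };

	if (*state & DEMO_QUOTA_NEED_REPAIR) {
		/* A quota operation failed earlier: skip the sync, hint fsck. */
		r.set_fsck_flag = true;
		return r;
	}
	if (retries > DEMO_RETRY_LIMIT) {
		/* Too many retries: give up until the next checkpoint. */
		*state |= DEMO_QUOTA_SKIP_FLUSH;
		return r;
	}
	if (*state & DEMO_QUOTA_NEED_FLUSH) {
		/* Normal case: flush cached quota changes and clear the flag. */
		*state &= ~(unsigned int)DEMO_QUOTA_NEED_FLUSH;
		r.flushed_quota = true;
	}
	return r;
}
```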
2018-09-20 20:05:00 +08:00
|
|
|
int f2fs_quota_sync(struct super_block *sb, int type);
|
2021-01-13 13:21:54 +08:00
|
|
|
loff_t max_file_blocks(struct inode *inode);
|
2017-08-08 10:54:31 +08:00
|
|
|
void f2fs_quota_off_umount(struct super_block *sb);
|
2017-01-31 02:55:18 +08:00
|
|
|
int f2fs_commit_super(struct f2fs_sb_info *sbi, bool recover);
|
|
|
|
int f2fs_sync_fs(struct super_block *sb, int sync);
|
2018-05-30 00:20:41 +08:00
|
|
|
int f2fs_sanity_check_ckpt(struct f2fs_sb_info *sbi);
|
2012-11-28 12:37:31 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* hash.c
|
|
|
|
*/
|
2020-05-07 15:59:04 +08:00
|
|
|
void f2fs_hash_filename(const struct inode *dir, struct f2fs_filename *fname);
|
2012-11-28 12:37:31 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* node.c
|
|
|
|
*/
|
|
|
|
struct node_info;
|
|
|
|
|
2018-05-30 00:20:41 +08:00
|
|
|
int f2fs_check_nid_range(struct f2fs_sb_info *sbi, nid_t nid);
|
|
|
|
bool f2fs_available_free_memory(struct f2fs_sb_info *sbi, int type);
|
f2fs: fix to avoid broken of dnode block list
The f2fs recovery flow relies on the dnode block linked list, which means
that recovery of an fsynced file depends on the previous dnodes in the
list being persistent. So during fsync() we should wait for writeback of
all regular inodes' dnodes before issuing the flush.
This way, we avoid the dnode block list being broken by out-of-order
IO submission caused by the IO scheduler or driver.
Sheng Yong helped test this patch:
Target:/data (f2fs, -)
64MB / 32768KB / 4KB / 8
1 / PERSIST / Index
Base:
     SEQ-RD(MB/s)  SEQ-WR(MB/s)  RND-RD(IOPS)  RND-WR(IOPS)  Insert(TPS)   Update(TPS)   Delete(TPS)
1    867.82        204.15        41440.03      41370.54      680.8         1025.94       1031.08
2    871.87        205.87        41370.3       40275.2       791.14        1065.84       1101.7
3    866.52        205.69        41795.67      40596.16      694.69        1037.16       1031.48
Avg  868.7366667   205.2366667   41535.33333   40747.3       722.21        1042.98       1054.753333
After:
     SEQ-RD(MB/s)  SEQ-WR(MB/s)  RND-RD(IOPS)  RND-WR(IOPS)  Insert(TPS)   Update(TPS)   Delete(TPS)
1    798.81        202.5         41143         40613.87      602.71        838.08        913.83
2    805.79        206.47        40297.2       41291.46      604.44        840.75        924.27
3    814.83        206.17        41209.57      40453.62      602.85        834.66        927.91
Avg  806.4766667   205.0466667   40883.25667   40786.31667   603.3333333   837.83        922.0033333
Patched/Original:
     0.928332713   0.999074239   0.984300676   1.000957528   0.835398753   0.803303994   0.874141189
It looks like atomic write suffers a performance regression.
I suspect the culprit is that we force a wait for all dnodes to reach the
storage cache before we issue PREFLUSH+FUA.
BTW, will commit ("f2fs: don't need to wait for node writes for atomic write")
cause this problem: we lose the data of the last transaction after SPO, even
if the atomic write returns no error?
- atomic_open();
- write() P1, P2, P3;
- atomic_commit();
- writeback data: P1, P2, P3;
- writeback node: N1, N2, N3; <--- If N1 and N2 are not written back but N3,
  with fsync_mark, is written back, then in SPOR we won't find N3 since the
  node chain is broken, so the last transaction is lost.
- preflush + fua;
- power-cut
If we don't wait for dnode writeback for atomic_write:
     SEQ-RD(MB/s)  SEQ-WR(MB/s)  RND-RD(IOPS)  RND-WR(IOPS)  Insert(TPS)   Update(TPS)   Delete(TPS)
1    779.91        206.03        41621.5       40333.16      716.9         1038.21       1034.85
2    848.51        204.35        40082.44      39486.17      791.83        1119.96       1083.77
3    772.12        206.27        41335.25      41599.65      723.29        1055.07       971.92
Avg  800.18        205.55        41013.06333   40472.99333   744.0066667   1071.08       1030.18
Patched/Original:
     0.92108464    1.001526693   0.987425886   0.993268102   1.030180511   1.026942031   0.976702294
SQLite's performance recovers.
Jaegeuk:
"Practically, I don't see db corruption because of this. We can excuse to lose
the last transaction."
Finally, we decided to keep the original semantics of the atomic write
interface: we don't wait for all dnode writeback before preflush+fua submission.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
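The broken-chain scenario above (N3 written, N1 and N2 not) can be modelled with a short sketch. It is a toy model rather than the f2fs roll-forward recovery code: struct demo_dnode and demo_recoverable_fsyncs() are hypothetical, and the chain is represented as a simple array in write order. Recovery walks the chain from the start and stops at the first block that never reached storage, so a later block with fsync_mark becomes unreachable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one dnode block in the roll-forward chain. */
struct demo_dnode {
	bool on_disk;     /* did the block reach storage before power loss? */
	bool fsync_mark;  /* marks the end of an fsynced update */
};

/*
 * Walk the chain in write order and stop at the first missing block.
 * Returns the number of fsync marks that recovery can still reach.
 */
static size_t demo_recoverable_fsyncs(const struct demo_dnode *chain, size_t n)
{
	size_t i, found = 0;

	for (i = 0; i < n; i++) {
		if (!chain[i].on_disk)
			break; /* chain is broken; later blocks are unreachable */
		if (chain[i].fsync_mark)
			found++;
	}
	return found;
}
```

With all three blocks on disk the walk reaches the marked block; if only the last one was written, the walk stops immediately and the transaction is treated as lost.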
2018-08-02 23:03:19 +08:00
|
|
|
bool f2fs_in_warm_node_list(struct f2fs_sb_info *sbi, struct page *page);
|
|
|
|
void f2fs_init_fsync_node_info(struct f2fs_sb_info *sbi);
|
|
|
|
void f2fs_del_fsync_node_entry(struct f2fs_sb_info *sbi, struct page *page);
|
|
|
|
void f2fs_reset_fsync_node_info(struct f2fs_sb_info *sbi);
|
2018-05-30 00:20:41 +08:00
|
|
|
int f2fs_need_dentry_mark(struct f2fs_sb_info *sbi, nid_t nid);
|
|
|
|
bool f2fs_is_checkpointed_node(struct f2fs_sb_info *sbi, nid_t nid);
|
|
|
|
bool f2fs_need_inode_block_update(struct f2fs_sb_info *sbi, nid_t ino);
|
2018-07-17 00:02:17 +08:00
|
|
|
int f2fs_get_node_info(struct f2fs_sb_info *sbi, nid_t nid,
|
2021-12-14 06:16:32 +08:00
|
|
|
struct node_info *ni, bool checkpoint_context);
|
2018-05-30 00:20:41 +08:00
|
|
|
pgoff_t f2fs_get_next_page_offset(struct dnode_of_data *dn, pgoff_t pgofs);
|
|
|
|
int f2fs_get_dnode_of_data(struct dnode_of_data *dn, pgoff_t index, int mode);
|
|
|
|
int f2fs_truncate_inode_blocks(struct inode *inode, pgoff_t from);
|
|
|
|
int f2fs_truncate_xattr_node(struct inode *inode);
|
2018-08-02 23:03:19 +08:00
|
|
|
int f2fs_wait_on_node_pages_writeback(struct f2fs_sb_info *sbi,
|
|
|
|
unsigned int seq_id);
|
2021-08-20 18:54:59 +08:00
|
|
|
bool f2fs_nat_bitmap_enabled(struct f2fs_sb_info *sbi);
|
2018-05-30 00:20:41 +08:00
|
|
|
int f2fs_remove_inode_page(struct inode *inode);
|
|
|
|
struct page *f2fs_new_inode_page(struct inode *inode);
|
|
|
|
struct page *f2fs_new_node_page(struct dnode_of_data *dn, unsigned int ofs);
|
|
|
|
void f2fs_ra_node_page(struct f2fs_sb_info *sbi, nid_t nid);
|
|
|
|
struct page *f2fs_get_node_page(struct f2fs_sb_info *sbi, pgoff_t nid);
|
|
|
|
struct page *f2fs_get_node_page_ra(struct page *parent, int start);
|
2018-09-13 07:40:53 +08:00
|
|
|
int f2fs_move_node_page(struct page *node_page, int gc_type);
|
2020-07-21 11:49:14 +08:00
|
|
|
void f2fs_flush_inline_data(struct f2fs_sb_info *sbi);
|
2018-05-30 00:20:41 +08:00
|
|
|
int f2fs_fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode,
|
f2fs: fix to avoid broken of dnode block list
The f2fs recovery flow relies on the dnode block linked list: recovering an
fsynced file depends on the previous dnodes in the list having been
persisted. So during fsync() we should wait until all of the regular inode's
dnodes have been written back before issuing the flush.
This way, the dnode block list cannot be broken by out-of-order IO
submission in the IO scheduler or driver.
Sheng Yong helps to do the test with this patch:
Target:/data (f2fs, -)
64MB / 32768KB / 4KB / 8
1 / PERSIST / Index
Base:
      SEQ-RD(MB/s)  SEQ-WR(MB/s)  RND-RD(IOPS)  RND-WR(IOPS)  Insert(TPS)  Update(TPS)  Delete(TPS)
1     867.82        204.15        41440.03      41370.54      680.8        1025.94      1031.08
2     871.87        205.87        41370.3       40275.2       791.14       1065.84      1101.7
3     866.52        205.69        41795.67      40596.16      694.69       1037.16      1031.48
Avg   868.7366667   205.2366667   41535.33333   40747.3       722.21       1042.98      1054.753333
After:
      SEQ-RD(MB/s)  SEQ-WR(MB/s)  RND-RD(IOPS)  RND-WR(IOPS)  Insert(TPS)  Update(TPS)  Delete(TPS)
1     798.81        202.5         41143         40613.87      602.71       838.08       913.83
2     805.79        206.47        40297.2       41291.46      604.44       840.75       924.27
3     814.83        206.17        41209.57      40453.62      602.85       834.66       927.91
Avg   806.4766667   205.0466667   40883.25667   40786.31667   603.3333333  837.83       922.0033333
Patched/Original:
      0.928332713   0.999074239   0.984300676   1.000957528   0.835398753  0.803303994  0.874141189
It looks like atomic write suffers a performance regression.
I suspect the culprit is that we force a wait for all dnodes to reach the
storage cache before we issue PREFLUSH+FUA.
BTW, will commit ("f2fs: don't need to wait for node writes for atomic write")
cause this problem? We would lose the data of the last transaction after SPO
(sudden power-off), even if the atomic write returns no error:
- atomic_open();
- write() P1, P2, P3;
- atomic_commit();
 - writeback data: P1, P2, P3;
 - writeback node: N1, N2, N3; <--- If N1 and N2 are not written back but N3
   with fsync_mark is, then in SPOR we won't find N3 because the node chain
   is broken, and we end up losing the last transaction.
 - preflush + fua;
- power-cut
If we don't wait dnode writeback for atomic_write:
      SEQ-RD(MB/s)  SEQ-WR(MB/s)  RND-RD(IOPS)  RND-WR(IOPS)  Insert(TPS)  Update(TPS)  Delete(TPS)
1     779.91        206.03        41621.5       40333.16      716.9        1038.21      1034.85
2     848.51        204.35        40082.44      39486.17      791.83       1119.96      1083.77
3     772.12        206.27        41335.25      41599.65      723.29       1055.07      971.92
Avg   800.18        205.55        41013.06333   40472.99333   744.0066667  1071.08      1030.18
Patched/Original:
      0.92108464    1.001526693   0.987425886   0.993268102   1.030180511  1.026942031  0.976702294
SQLite's performance recovers.
Jaegeuk:
"Practically, I don't see db corruption because of this. We can excuse to lose
the last transaction."
Finally, we decided to keep the original semantics of the atomic write
interface: we don't wait for all dnode writeback before submitting preflush+fua.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-02 23:03:19 +08:00
|
|
|
struct writeback_control *wbc, bool atomic,
|
|
|
|
unsigned int *seq_id);
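The node-chain dependency described in the "fix to avoid broken of dnode block list" message can be sketched as follows. This is a simplified model, not the f2fs recovery code: struct demo_dnode and demo_count_recoverable() are hypothetical, and the chain is modelled as an array in write order rather than as on-disk block links.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one dnode block in the recovery chain. */
struct demo_dnode {
	bool persisted;   /* reached stable storage before the power cut */
	bool fsync_mark;  /* closes an fsynced (or committed) transaction */
};

/*
 * Walk the chain in write order, the way a roll-forward recovery would.
 * The walk stops at the first node that never reached storage, so any
 * fsync-marked node behind that gap is unreachable and its data is lost.
 * Returns the number of fsync-marked nodes that can still be found.
 */
size_t demo_count_recoverable(const struct demo_dnode *chain, size_t n)
{
	size_t found = 0;

	for (size_t i = 0; i < n; i++) {
		if (!chain[i].persisted)
			break;  /* chain is broken here */
		if (chain[i].fsync_mark)
			found++;
	}
	return found;
}
```

With N1 and N2 not persisted and N3 persisted with its fsync mark, the function returns 0, which matches the lost-transaction case in the message; waiting for N1 and N2 before the flush makes it return 1.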
|
2018-05-30 00:20:41 +08:00
|
|
|
int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
|
|
|
|
struct writeback_control *wbc,
|
2017-08-02 23:21:48 +08:00
|
|
|
bool do_balance, enum iostat_type io_type);
|
2018-06-15 14:45:57 +08:00
|
|
|
int f2fs_build_free_nids(struct f2fs_sb_info *sbi, bool sync, bool mount);
|
2018-05-30 00:20:41 +08:00
|
|
|
bool f2fs_alloc_nid(struct f2fs_sb_info *sbi, nid_t *nid);
|
|
|
|
void f2fs_alloc_nid_done(struct f2fs_sb_info *sbi, nid_t nid);
|
|
|
|
void f2fs_alloc_nid_failed(struct f2fs_sb_info *sbi, nid_t nid);
|
|
|
|
int f2fs_try_to_free_nids(struct f2fs_sb_info *sbi, int nr_shrink);
|
2020-07-06 18:23:36 +08:00
|
|
|
int f2fs_recover_inline_xattr(struct inode *inode, struct page *page);
|
2018-05-30 00:20:41 +08:00
|
|
|
int f2fs_recover_xattr_data(struct inode *inode, struct page *page);
|
|
|
|
int f2fs_recover_inode_page(struct f2fs_sb_info *sbi, struct page *page);
|
2018-07-17 00:02:17 +08:00
|
|
|
int f2fs_restore_node_summary(struct f2fs_sb_info *sbi,
|
2017-01-31 02:55:18 +08:00
|
|
|
unsigned int segno, struct f2fs_summary_block *sum);
|
2021-08-20 18:54:59 +08:00
|
|
|
void f2fs_enable_nat_bits(struct f2fs_sb_info *sbi);
|
2018-09-18 08:36:06 +08:00
|
|
|
int f2fs_flush_nat_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc);
|
2018-05-30 00:20:41 +08:00
|
|
|
int f2fs_build_node_manager(struct f2fs_sb_info *sbi);
|
|
|
|
void f2fs_destroy_node_manager(struct f2fs_sb_info *sbi);
|
|
|
|
int __init f2fs_create_node_manager_caches(void);
|
|
|
|
void f2fs_destroy_node_manager_caches(void);
|
2012-11-28 12:37:31 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* segment.c
|
|
|
|
*/
|
2018-05-30 00:20:41 +08:00
|
|
|
bool f2fs_need_SSR(struct f2fs_sb_info *sbi);
|
|
|
|
void f2fs_register_inmem_page(struct inode *inode, struct page *page);
|
|
|
|
void f2fs_drop_inmem_pages_all(struct f2fs_sb_info *sbi, bool gc_failure);
|
|
|
|
void f2fs_drop_inmem_pages(struct inode *inode);
|
|
|
|
void f2fs_drop_inmem_page(struct inode *inode, struct page *page);
|
|
|
|
int f2fs_commit_inmem_pages(struct inode *inode);
|
2017-01-31 02:55:18 +08:00
|
|
|
void f2fs_balance_fs(struct f2fs_sb_info *sbi, bool need);
|
2020-03-19 19:57:58 +08:00
|
|
|
void f2fs_balance_fs_bg(struct f2fs_sb_info *sbi, bool from_bg);
|
2017-09-29 13:59:38 +08:00
|
|
|
int f2fs_issue_flush(struct f2fs_sb_info *sbi, nid_t ino);
|
2018-05-30 00:20:41 +08:00
|
|
|
int f2fs_create_flush_cmd_control(struct f2fs_sb_info *sbi);
|
2017-09-29 13:59:39 +08:00
|
|
|
int f2fs_flush_device_cache(struct f2fs_sb_info *sbi);
|
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_destroy_flush_cmd_control(struct f2fs_sb_info *sbi, bool free);
|
|
|
|
void f2fs_invalidate_blocks(struct f2fs_sb_info *sbi, block_t addr);
|
|
|
|
bool f2fs_is_checkpointed_data(struct f2fs_sb_info *sbi, block_t blkaddr);
|
2021-08-19 16:02:37 +08:00
|
|
|
int f2fs_start_discard_thread(struct f2fs_sb_info *sbi);
|
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_drop_discard_cmd(struct f2fs_sb_info *sbi);
|
|
|
|
void f2fs_stop_discard_thread(struct f2fs_sb_info *sbi);
|
2019-01-15 02:42:11 +08:00
|
|
|
bool f2fs_issue_discard_timeout(struct f2fs_sb_info *sbi);
|
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
|
|
|
|
struct cp_control *cpc);
|
2018-08-21 10:21:43 +08:00
|
|
|
void f2fs_dirty_to_prefree(struct f2fs_sb_info *sbi);
|
2019-05-30 08:49:06 +08:00
|
|
|
block_t f2fs_get_unusable_blocks(struct f2fs_sb_info *sbi);
|
|
|
|
int f2fs_disable_cp_again(struct f2fs_sb_info *sbi, block_t unusable);
|
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_release_discard_addrs(struct f2fs_sb_info *sbi);
|
|
|
|
int f2fs_npages_for_summary_flush(struct f2fs_sb_info *sbi, bool for_ra);
|
f2fs: fix to avoid touching checkpointed data in get_victim()
In CP disabling mode, there are two issues when using LFS or SSR | AT_SSR
mode to select a victim:
1. LFS is used to find a source section during GC. The victim should have
no checkpointed data, because otherwise the section cannot be freed for
reuse after GC.
Previously, we only checked valid ckpt blocks in the current segment rather
than in the whole section; fix it.
2. SSR | AT_SSR are used to find a target segment for writes. A segment can
be fully filled by checkpointed and newly written blocks, and we should
never select such a segment, otherwise it can cause a panic or data
corruption during allocation. A potential case is described below:
a) the target segment has 'n' (n < 512) ckpt valid blocks
b) GC migrates 'n' valid blocks to another segment (the segment is still
in the dirty list)
c) GC migrates '512 - n' blocks to the target segment (the segment has 'n'
cp_vblocks and '512 - n' vblocks)
d) GC selects the target segment via the {AT,}SSR allocator, but there
is no free space in the target segment.
Fixes: 4354994f097d ("f2fs: checkpoint disabling")
Fixes: 093749e296e2 ("f2fs: support age threshold based garbage collection")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-03-24 11:18:28 +08:00
|
|
|
bool f2fs_segment_has_free_slot(struct f2fs_sb_info *sbi, int segno);
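The check motivated by case a) to d) above can be sketched as follows. This is a minimal model under stated assumptions, not the kernel implementation: struct demo_seg_entry, its two bitmaps, and demo_segment_has_free_slot() are hypothetical, with 512 blocks per segment as in the example. A block slot counts as free only if it is valid in neither the current bitmap nor the checkpointed bitmap, so the two are OR-ed before looking for a zero bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_BLKS_PER_SEG 512

/* Hypothetical per-segment state: one bit per block in each bitmap. */
struct demo_seg_entry {
	unsigned char cur_valid[DEMO_BLKS_PER_SEG / 8];   /* valid right now */
	unsigned char ckpt_valid[DEMO_BLKS_PER_SEG / 8];  /* valid at last checkpoint */
};

/*
 * A slot is usable only if it is free in BOTH bitmaps: blocks that were
 * valid at the last checkpoint must not be overwritten while checkpointing
 * is disabled, even if GC has already migrated them away.
 */
bool demo_segment_has_free_slot(const struct demo_seg_entry *se)
{
	for (size_t i = 0; i < sizeof(se->cur_valid); i++) {
		unsigned char used = se->cur_valid[i] | se->ckpt_valid[i];

		if (used != 0xff)
			return true;  /* at least one zero bit in this byte */
	}
	return false;
}
```

Counting valid blocks alone would not catch the case above, because the 'n' checkpointed blocks and the '512 - n' newly written blocks occupy different slots; only the combined bitmap shows that the segment is full.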
|
f2fs: support age threshold based garbage collection
There are several issues in the current background GC algorithm:
- The valid block count is one of the key factors in the cost calculation,
so if a segment has few valid blocks, the CB algorithm will still choose
it as a victim even if it is young or located in a hot segment, which is
not appropriate.
- GCed data/node blocks go to the existing logs regardless of whether the
data already there has the same update frequency, so hot and cold data
may be mixed again.
- The GC allocator mainly uses LFS type segments, which consumes free
segments more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve the above issues. There are three main
steps:
1. select a source victim:
 - set an age threshold, and select candidates based on the threshold:
 e.g.
  0 means youngest, 100 means oldest; if we set the age threshold to 80,
  then select dirty segments whose age is in the range [80, 100] as
  candidates;
 - set a candidate_ratio threshold, and select candidates based on the
 ratio, so that we can shrink the candidates to the oldest segments;
 - select the target segment with the fewest valid blocks in order to
 migrate blocks with minimum cost;
2. select a target victim:
 - select candidates based on the age threshold;
 - set a candidate_radius threshold, and search for candidates whose age
 is around the source victim's; the search radius should be less than
 the radius threshold.
 - select the target segment with the most valid blocks in order to avoid
 migrating the current target segment.
3. merge valid blocks from the source victim into the target victim with
the SSR allocator.
Test steps:
- create 160 dirty segments:
 * half of them have 128 valid blocks per segment
 * the rest have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease noticeably:
- Before:
  - Valid: 86
  - Dirty: 1
  - Prefree: 11
  - Free: 6001 (6001)
GC calls: 162 (BG: 220)
  - data segments : 160 (160)
  - node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
  - data blocks : 40960 (40960)
  - node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
  - Valid: 87
  - Dirty: 0
  - Prefree: 4
  - Free: 6008 (6008)
GC calls: 75 (BG: 76)
  - data segments : 74 (74)
  - node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
  - data blocks : 12544 (12544)
  - node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
void f2fs_init_inmem_curseg(struct f2fs_sb_info *sbi);
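The source and target victim selection from the "age threshold based garbage collection" message can be sketched as follows. This is a simplified model, not the f2fs implementation: struct demo_segment and both functions are hypothetical, ages are on the 0 to 100 scale used in the message, the candidate_ratio shrinking step is left out, and treating the radius as a strict bound is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dirty segment: age on a 0 (youngest) to 100 (oldest) scale. */
struct demo_segment {
	unsigned int age;
	unsigned int valid_blocks;
};

/*
 * Step 1: among segments at or above the age threshold, pick the one with
 * the fewest valid blocks, so migration moves as little as possible.
 * Returns the index of the victim, or -1 if there is no candidate.
 */
int demo_pick_source_victim(const struct demo_segment *segs, size_t n,
			    unsigned int age_threshold)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (segs[i].age < age_threshold)
			continue;
		if (best < 0 || segs[i].valid_blocks < segs[best].valid_blocks)
			best = (int)i;
	}
	return best;
}

/*
 * Step 2: among old-enough segments whose age lies within 'radius' of the
 * source victim's age, pick the one with the most valid blocks.
 * Returns the index of the target, or -1 if there is no candidate.
 */
int demo_pick_target_victim(const struct demo_segment *segs, size_t n,
			    unsigned int age_threshold, size_t source,
			    unsigned int radius)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		unsigned int a = segs[i].age, b = segs[source].age;
		unsigned int diff = a > b ? a - b : b - a;

		if (i == source || a < age_threshold || diff >= radius)
			continue;
		if (best < 0 || segs[i].valid_blocks > segs[best].valid_blocks)
			best = (int)i;
	}
	return best;
}
```

Step 3, merging the source victim's valid blocks into the target through the SSR allocator, is outside the scope of this sketch.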
|
|
|
|
void f2fs_save_inmem_curseg(struct f2fs_sb_info *sbi);
|
|
|
|
void f2fs_restore_inmem_curseg(struct f2fs_sb_info *sbi);
|
|
|
|
void f2fs_get_new_segment(struct f2fs_sb_info *sbi,
|
|
|
|
unsigned int *newseg, bool new_sec, int dir);
|
2020-06-18 14:36:22 +08:00
|
|
|
void f2fs_allocate_segment_for_resize(struct f2fs_sb_info *sbi, int type,
|
2019-06-05 11:33:25 +08:00
|
|
|
unsigned int start, unsigned int end);
|
2021-04-21 09:54:55 +08:00
|
|
|
void f2fs_allocate_new_section(struct f2fs_sb_info *sbi, int type, bool force);
|
2020-06-22 17:38:48 +08:00
|
|
|
void f2fs_allocate_new_segments(struct f2fs_sb_info *sbi);
|
2017-01-31 02:55:18 +08:00
|
|
|
int f2fs_trim_fs(struct f2fs_sb_info *sbi, struct fstrim_range *range);
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds an "f2fs_" prefix to all non-static symbols in order to:
a) avoid conflicts with other generic kernel symbols;
b) indicate that a function is f2fs-specific rather than generic.
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
bool f2fs_exist_trim_candidates(struct f2fs_sb_info *sbi,
|
|
|
|
struct cp_control *cpc);
|
|
|
|
struct page *f2fs_get_sum_page(struct f2fs_sb_info *sbi, unsigned int segno);
|
|
|
|
void f2fs_update_meta_page(struct f2fs_sb_info *sbi, void *src,
|
|
|
|
block_t blk_addr);
|
|
|
|
void f2fs_do_write_meta_page(struct f2fs_sb_info *sbi, struct page *page,
|
2017-08-02 23:21:48 +08:00
|
|
|
enum iostat_type io_type);
|
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_do_write_node_page(unsigned int nid, struct f2fs_io_info *fio);
|
|
|
|
void f2fs_outplace_write_data(struct dnode_of_data *dn,
|
|
|
|
struct f2fs_io_info *fio);
|
|
|
|
int f2fs_inplace_write_data(struct f2fs_io_info *fio);
|
|
|
|
void f2fs_do_replace_block(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
|
2017-01-31 02:55:18 +08:00
|
|
|
block_t old_blkaddr, block_t new_blkaddr,
|
2020-08-04 21:14:47 +08:00
|
|
|
bool recover_curseg, bool recover_newaddr,
|
|
|
|
bool from_gc);
|
2017-01-31 02:55:18 +08:00
|
|
|
void f2fs_replace_block(struct f2fs_sb_info *sbi, struct dnode_of_data *dn,
|
|
|
|
block_t old_addr, block_t new_addr,
|
|
|
|
unsigned char version, bool recover_curseg,
|
|
|
|
bool recover_newaddr);
|
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_allocate_data_block(struct f2fs_sb_info *sbi, struct page *page,
|
2017-01-31 02:55:18 +08:00
|
|
|
block_t old_blkaddr, block_t *new_blkaddr,
|
2017-05-19 23:37:01 +08:00
|
|
|
struct f2fs_summary *sum, int type,
|
2020-06-18 14:36:24 +08:00
|
|
|
struct f2fs_io_info *fio);
|
2021-09-01 14:39:20 +08:00
|
|
|
void f2fs_update_device_state(struct f2fs_sb_info *sbi, nid_t ino,
|
|
|
|
block_t blkaddr, unsigned int blkcnt);
|
2017-01-31 02:55:18 +08:00
|
|
|
void f2fs_wait_on_page_writeback(struct page *page,
|
2018-12-25 17:43:42 +08:00
|
|
|
enum page_type type, bool ordered, bool locked);
|
2018-08-23 12:18:00 +08:00
|
|
|
void f2fs_wait_on_block_writeback(struct inode *inode, block_t blkaddr);
|
2018-10-10 13:26:22 +08:00
|
|
|
void f2fs_wait_on_block_writeback_range(struct inode *inode, block_t blkaddr,
|
|
|
|
block_t len);
|
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_write_data_summaries(struct f2fs_sb_info *sbi, block_t start_blk);
|
|
|
|
void f2fs_write_node_summaries(struct f2fs_sb_info *sbi, block_t start_blk);
|
|
|
|
int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type,
|
2017-01-31 02:55:18 +08:00
|
|
|
unsigned int val, int alloc);
|
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc);
|
2019-12-09 18:44:44 +08:00
|
|
|
int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi);
|
2019-12-09 18:44:45 +08:00
|
|
|
int f2fs_check_write_pointer(struct f2fs_sb_info *sbi);
|
2018-05-30 00:20:41 +08:00
|
|
|
int f2fs_build_segment_manager(struct f2fs_sb_info *sbi);
|
|
|
|
void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi);
|
|
|
|
int __init f2fs_create_segment_manager_caches(void);
|
|
|
|
void f2fs_destroy_segment_manager_caches(void);
|
|
|
|
int f2fs_rw_hint_to_seg_type(enum rw_hint hint);
|
|
|
|
enum rw_hint f2fs_io_type_to_rw_hint(struct f2fs_sb_info *sbi,
|
|
|
|
enum page_type type, enum temp_type temp);
|
f2fs: support zone capacity less than zone size
NVMe Zoned Namespace devices can have a zone-capacity smaller than the zone-size.
Zone-capacity indicates the maximum number of sectors that are usable in
a zone, beginning from the first sector of the zone. This makes the
sectors after the zone-capacity, up to the zone-size, unusable.
This patch set tracks zone-size and zone-capacity in zoned devices and
calculates the usable blocks per segment and usable segments per section.
If zone-capacity is less than zone-size, mark only those segments which
start before zone-capacity as free segments. All segments at and beyond
zone-capacity are treated as permanently used segments. In cases where
zone-capacity does not align with segment size, the last segment will start
before zone-capacity and end beyond the zone-capacity of the zone. For
such spanning segments, only sectors within the zone-capacity are used.
During writes and GC, manage the usable segments in a section and usable
blocks per segment. Segments which are beyond zone-capacity are never
allocated and do not need to be garbage collected; only the segments
which are before zone-capacity need to be garbage collected.
For spanning segments, based on the number of usable blocks in that
segment, write to blocks only up to zone-capacity.
Zone-capacity is device specific and cannot be configured by the user.
Since NVMe ZNS device zones are sequential-write only, a block device
with conventional zones or any normal block device is needed along with
the ZNS device for the metadata operations of F2FS.
A typical nvme-cli output of a zoned device shows zone start and capacity
and write pointer as below:
SLBA: 0x0 WP: 0x0 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
Here zone size is 64MB, capacity is 49MB, WP is at zone start as the zones
are in EMPTY state. For each zone, only zone start + 49MB is usable area,
any lba/sector after 49MB cannot be read or written to, the drive will fail
any attempts to read/write. So, the second zone starts at 64MB and is
usable till 113MB (64 + 49) and the range between 113 and 128MB is
again unusable. The next zone starts at 128MB, and so on.
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-16 20:56:56 +08:00
|
|
|
unsigned int f2fs_usable_segs_in_sec(struct f2fs_sb_info *sbi,
|
|
|
|
unsigned int segno);
|
|
|
|
unsigned int f2fs_usable_blks_in_seg(struct f2fs_sb_info *sbi,
|
|
|
|
unsigned int segno);
|
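The zone-capacity description above (64MB zones with 49MB capacity) can be worked through as a small standalone sketch. It is not the f2fs implementation: the zns_* names and the zone_geom struct are hypothetical, and the 512-byte sector, 4KB block, and 512-blocks-per-segment (2MB segment) sizes are assumptions chosen to match the nvme-cli values quoted in the commit message.

```c
#include <stdio.h>

/* Assumed geometry for illustration only; not taken from f2fs.h. */
#define ZNS_SECTOR_SIZE   512ULL   /* bytes per LBA (assumption) */
#define ZNS_BLOCK_SIZE    4096ULL  /* bytes per filesystem block (assumption) */
#define ZNS_BLKS_PER_SEG  512ULL   /* 2MB segments (assumption) */

/* Hypothetical description of one zone, in sectors. */
struct zone_geom {
	unsigned long long zone_sectors; /* zone size, e.g. 0x20000 */
	unsigned long long cap_sectors;  /* zone capacity, e.g. 0x18800 */
};

/* Number of filesystem blocks covered by the zone capacity. */
static unsigned long long zns_cap_blocks(const struct zone_geom *z)
{
	return z->cap_sectors * ZNS_SECTOR_SIZE / ZNS_BLOCK_SIZE;
}

/* Total segments laid out over the whole zone (usable or not). */
static unsigned int zns_segs_per_zone(const struct zone_geom *z)
{
	unsigned long long blks = z->zone_sectors * ZNS_SECTOR_SIZE / ZNS_BLOCK_SIZE;

	return (unsigned int)(blks / ZNS_BLKS_PER_SEG);
}

/* Segments that start before zone-capacity, including a spanning one. */
static unsigned int zns_usable_segs(const struct zone_geom *z)
{
	unsigned long long cap = zns_cap_blocks(z);

	return (unsigned int)((cap + ZNS_BLKS_PER_SEG - 1) / ZNS_BLKS_PER_SEG);
}

/* Usable blocks in the segment at index seg_in_zone within the zone. */
static unsigned int zns_usable_blks(const struct zone_geom *z,
				    unsigned int seg_in_zone)
{
	unsigned long long cap = zns_cap_blocks(z);
	unsigned long long start = (unsigned long long)seg_in_zone * ZNS_BLKS_PER_SEG;

	if (start >= cap)
		return 0;                         /* entirely beyond capacity */
	if (cap - start >= ZNS_BLKS_PER_SEG)
		return (unsigned int)ZNS_BLKS_PER_SEG; /* fully usable */
	return (unsigned int)(cap - start);       /* spanning segment */
}
```

With zone_sectors = 0x20000 and cap_sectors = 0x18800, the zone holds 32 segments, of which 25 start before the capacity limit; the segment at index 24 spans the limit and has 256 usable blocks, and segments 25 through 31 have none.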
2012-11-28 12:37:31 +08:00
|
|
|
|
2021-09-30 02:12:03 +08:00
|
|
|
#define DEF_FRAGMENT_SIZE 4
|
|
|
|
#define MIN_FRAGMENT_SIZE 1
|
|
|
|
#define MAX_FRAGMENT_SIZE 512
|
|
|
|
|
|
|
|
static inline bool f2fs_need_rand_seg(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
return F2FS_OPTION(sbi).fs_mode == FS_MODE_FRAGMENT_SEG ||
|
|
|
|
F2FS_OPTION(sbi).fs_mode == FS_MODE_FRAGMENT_BLK;
|
|
|
|
}
|
|
|
|
|
2012-11-28 12:37:31 +08:00
|
|
|
/*
|
|
|
|
* checkpoint.c
|
|
|
|
*/
|
2017-01-31 02:55:18 +08:00
|
|
|
void f2fs_stop_checkpoint(struct f2fs_sb_info *sbi, bool end_io);
|
2018-05-30 00:20:41 +08:00
|
|
|
struct page *f2fs_grab_meta_page(struct f2fs_sb_info *sbi, pgoff_t index);
|
|
|
|
struct page *f2fs_get_meta_page(struct f2fs_sb_info *sbi, pgoff_t index);
|
2020-10-03 05:17:35 +08:00
|
|
|
struct page *f2fs_get_meta_page_retry(struct f2fs_sb_info *sbi, pgoff_t index);
|
2018-05-30 00:20:41 +08:00
|
|
|
struct page *f2fs_get_tmp_page(struct f2fs_sb_info *sbi, pgoff_t index);
|
2018-06-05 17:44:11 +08:00
|
|
|
bool f2fs_is_valid_blkaddr(struct f2fs_sb_info *sbi,
|
|
|
|
block_t blkaddr, int type);
|
2018-05-30 00:20:41 +08:00
|
|
|
int f2fs_ra_meta_pages(struct f2fs_sb_info *sbi, block_t start, int nrpages,
|
2017-01-31 02:55:18 +08:00
|
|
|
int type, bool sync);
|
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_ra_meta_pages_cond(struct f2fs_sb_info *sbi, pgoff_t index);
|
|
|
|
long f2fs_sync_meta_pages(struct f2fs_sb_info *sbi, enum page_type type,
|
2017-08-02 23:21:48 +08:00
|
|
|
long nr_to_write, enum iostat_type io_type);
|
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_add_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type);
|
|
|
|
void f2fs_remove_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type);
|
|
|
|
void f2fs_release_ino_entry(struct f2fs_sb_info *sbi, bool all);
|
|
|
|
bool f2fs_exist_written_data(struct f2fs_sb_info *sbi, nid_t ino, int mode);
|
|
|
|
void f2fs_set_dirty_device(struct f2fs_sb_info *sbi, nid_t ino,
|
2017-09-29 13:59:38 +08:00
|
|
|
unsigned int devidx, int type);
|
2018-05-30 00:20:41 +08:00
|
|
|
bool f2fs_is_dirty_device(struct f2fs_sb_info *sbi, nid_t ino,
|
2017-09-29 13:59:38 +08:00
|
|
|
unsigned int devidx, int type);
|
2017-01-31 02:55:18 +08:00
|
|
|
int f2fs_sync_inode_meta(struct f2fs_sb_info *sbi);
|
2018-05-30 00:20:41 +08:00
|
|
|
int f2fs_acquire_orphan_inode(struct f2fs_sb_info *sbi);
|
|
|
|
void f2fs_release_orphan_inode(struct f2fs_sb_info *sbi);
|
|
|
|
void f2fs_add_orphan_inode(struct inode *inode);
|
|
|
|
void f2fs_remove_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino);
|
|
|
|
int f2fs_recover_orphan_inodes(struct f2fs_sb_info *sbi);
|
|
|
|
int f2fs_get_valid_checkpoint(struct f2fs_sb_info *sbi);
|
|
|
|
void f2fs_update_dirty_page(struct inode *inode, struct page *page);
|
|
|
|
void f2fs_remove_dirty_inode(struct inode *inode);
|
|
|
|
int f2fs_sync_dirty_inodes(struct f2fs_sb_info *sbi, enum inode_type type);
|
2020-02-18 11:49:07 +08:00
|
|
|
void f2fs_wait_on_all_pages(struct f2fs_sb_info *sbi, int type);
|
2020-11-27 21:20:06 +08:00
|
|
|
u64 f2fs_get_sectors_written(struct f2fs_sb_info *sbi);
|
2018-05-30 00:20:41 +08:00
|
|
|
int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc);
|
|
|
|
void f2fs_init_ino_entry_info(struct f2fs_sb_info *sbi);
|
|
|
|
int __init f2fs_create_checkpoint_caches(void);
|
|
|
|
void f2fs_destroy_checkpoint_caches(void);
|
f2fs: introduce checkpoint_merge mount option
We've added a new mount options, "checkpoint_merge" and "nocheckpoint_merge",
which creates a kernel daemon and makes it to merge concurrent checkpoint
requests as much as possible to eliminate redundant checkpoint issues. Plus,
we can eliminate the sluggish issue caused by slow checkpoint operation
when the checkpoint is done in a process context in a cgroup having
low i/o budget and cpu shares. To make this do better, we set the
default i/o priority of the kernel daemon to "3", to give one higher
priority than other kernel threads. The below verification result
explains this.
The basic idea has come from https://opensource.samsung.com.
[Verification]
Android Pixel Device(ARM64, 7GB RAM, 256GB UFS)
Create two I/O cgroups (fg w/ weight 100, bg w/ weight 20)
Set "strict_guarantees" to "1" in BFQ tunables
In "fg" cgroup,
- thread A => trigger 1000 checkpoint operations
"for i in `seq 1 1000`; do touch test_dir1/file; fsync test_dir1;
done"
- thread B => generating async. I/O
"fio --rw=write --numjobs=1 --bs=128k --runtime=3600 --time_based=1
--filename=test_img --name=test"
In "bg" cgroup,
- thread C => trigger repeated checkpoint operations
"echo $$ > /dev/blkio/bg/tasks; while true; do touch test_dir2/file;
fsync test_dir2; done"
We've measured thread A's execution time.
[ w/o patch ]
Elapsed Time: Avg. 68 seconds
[ w/ patch ]
Elapsed Time: Avg. 48 seconds
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
[Jaegeuk Kim: fix the return value in f2fs_start_ckpt_thread, reported by Dan]
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-01-19 08:00:42 +08:00
|
|
|
int f2fs_issue_checkpoint(struct f2fs_sb_info *sbi);
|
|
|
|
int f2fs_start_ckpt_thread(struct f2fs_sb_info *sbi);
|
|
|
|
void f2fs_stop_ckpt_thread(struct f2fs_sb_info *sbi);
|
|
|
|
void f2fs_init_ckpt_req_control(struct f2fs_sb_info *sbi);
|
2012-11-28 12:37:31 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* data.c
|
|
|
|
*/
|
2019-12-04 09:52:58 +08:00
|
|
|
int __init f2fs_init_bioset(void);
|
|
|
|
void f2fs_destroy_bioset(void);
|
2019-09-30 18:53:25 +08:00
|
|
|
int f2fs_init_bio_entry_cache(void);
|
|
|
|
void f2fs_destroy_bio_entry_cache(void);
|
f2fs: support data compression
This patch tries to support compression in f2fs.
- New term named cluster is defined as basic unit of compression, file can
be divided into multiple clusters logically. One cluster includes 4 << n
(n >= 0) logical pages, compression size is also cluster size, each of
cluster can be compressed or not.
- In cluster metadata layout, one special flag is used to indicate cluster
is compressed one or normal one, for compressed cluster, following metadata
maps cluster to [1, 4 << n - 1] physical blocks, in where f2fs stores
data including compress header and compressed data.
- In order to eliminate write amplification during overwrite, F2FS only
support compression on write-once file, data can be compressed only when
all logical blocks in file are valid and cluster compress ratio is lower
than specified threshold.
- To enable compression on regular inode, there are three ways:
* chattr +c file
* chattr +c dir; touch dir/file
* mount w/ -o compress_extension=ext; touch file.ext
Compress metadata layout:
[Dnode Structure]
+-----------------------------------------------+
| cluster 1 | cluster 2 | ......... | cluster N |
+-----------------------------------------------+
. . . .
. . . .
. Compressed Cluster . . Normal Cluster .
+----------+---------+---------+---------+ +---------+---------+---------+---------+
|compr flag| block 1 | block 2 | block 3 | | block 1 | block 2 | block 3 | block 4 |
+----------+---------+---------+---------+ +---------+---------+---------+---------+
. .
. .
. .
+-------------+-------------+----------+----------------------------+
| data length | data chksum | reserved | compressed data |
+-------------+-------------+----------+----------------------------+
Changelog:
20190326:
- fix error handling of read_end_io().
- remove unneeded comments in f2fs_encrypt_one_page().
20190327:
- fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
- don't jump into loop directly to avoid uninitialized variables.
- add TODO tag in error path of f2fs_write_cache_pages().
20190328:
- fix wrong merge condition in f2fs_read_multi_pages().
- check compressed file in f2fs_post_read_required().
20190401
- allow overwrite on non-compressed cluster.
- check cluster meta before writing compressed data.
20190402
- don't preallocate blocks for compressed file.
- add lz4 compress algorithm
- process multiple post read works in one workqueue
Now f2fs supports processing post read work in multiple workqueue,
it shows low performance due to schedule overhead of multiple
workqueue executing orderly.
20190921
- compress: support buffered overwrite
C: compress cluster flag
V: valid block address
N: NEW_ADDR
One cluster contain 4 blocks
before overwrite after overwrite
- VVVV -> CVNN
- CVNN -> VVVV
- CVNN -> CVNN
- CVNN -> CVVV
- CVVV -> CVNN
- CVVV -> CVVV
20191029
- add kconfig F2FS_FS_COMPRESSION to isolate compression related
codes, add kconfig F2FS_FS_{LZO,LZ4} to cover backend algorithm.
note that: will remove lzo backend if Jaegeuk agreed that too.
- update codes according to Eric's comments.
20191101
- apply fixes from Jaegeuk
20191113
- apply fixes from Jaegeuk
- split workqueue for fsverity
20191216
- apply fixes from Jaegeuk
20200117
- fix to avoid NULL pointer dereference
[Jaegeuk Kim]
- add tracepoint for f2fs_{,de}compress_pages()
- fix many bugs and add some compression stats
- fix overwrite/mmap bugs
- address 32bit build error, reported by Geert.
- bug fixes when handling errors and i_compressed_blocks
Reported-by: <noreply@ellerman.id.au>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-11-01 18:07:14 +08:00
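The cluster arithmetic in the commit message above can be sketched in C. This is a minimal illustration, not code from f2fs: the helper names (`cluster_pages`, `cluster_of_page`, `max_compressed_blocks`) are hypothetical, and the upper bound is read as (4 << n) - 1, which matches the diagram where the compress flag takes one of the cluster's slots.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; these names do not come from f2fs. */

/* Logical pages in one cluster: 4 << n with n >= 0, as the commit message states. */
static inline uint32_t cluster_pages(unsigned int n)
{
	assert(n < 30);	/* keep 4 << n representable in 32 bits */
	return UINT32_C(4) << n;
}

/* Index of the cluster that contains logical page `index`. */
static inline uint64_t cluster_of_page(uint64_t index, unsigned int n)
{
	return index / cluster_pages(n);
}

/*
 * Upper bound on physical blocks backing a compressed cluster,
 * read as (4 << n) - 1: one slot holds the compress flag.
 */
static inline uint32_t max_compressed_blocks(unsigned int n)
{
	return cluster_pages(n) - 1;
}
```

With n = 0 this gives a 4-page cluster whose compressed form occupies at most 3 physical blocks, the case drawn in the layout diagram.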
|
|
|
void f2fs_submit_bio(struct f2fs_sb_info *sbi,
|
|
|
|
struct bio *bio, enum page_type type);
|
2017-05-11 02:28:38 +08:00
|
|
|
void f2fs_submit_merged_write(struct f2fs_sb_info *sbi, enum page_type type);
|
|
|
|
void f2fs_submit_merged_write_cond(struct f2fs_sb_info *sbi,
|
2018-09-27 23:41:16 +08:00
|
|
|
struct inode *inode, struct page *page,
|
|
|
|
nid_t ino, enum page_type type);
|
2019-09-30 18:53:25 +08:00
|
|
|
void f2fs_submit_merged_ipu_write(struct f2fs_sb_info *sbi,
|
|
|
|
struct bio **bio, struct page *page);
|
2017-05-11 02:28:38 +08:00
|
|
|
void f2fs_flush_merged_writes(struct f2fs_sb_info *sbi);
|
2017-01-31 02:55:18 +08:00
|
|
|
int f2fs_submit_page_bio(struct f2fs_io_info *fio);
|
f2fs: add bio cache for IPU
SQLite in Wal mode may trigger sequential IPU write in db-wal file, after
commit d1b3e72d5490 ("f2fs: submit bio of in-place-update pages"), we
lost the chance of merging page in inner managed bio cache, result in
submitting more small-sized IO.
So let's add temporary bio in writepages() to cache mergeable write IO as
much as possible.
Test case:
1. xfs_io -f /mnt/f2fs/file -c "pwrite 0 65536" -c "fsync"
2. xfs_io -f /mnt/f2fs/file -c "pwrite 0 65536" -c "fsync"
Before:
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65544, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65552, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65560, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65568, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65576, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65584, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65592, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65600, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65608, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65616, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65624, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65632, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65640, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65648, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65656, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65664, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), NODE, sector = 57352, size = 4096
After:
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65544, size = 65536
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), NODE, sector = 57368, size = 4096
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-19 16:15:29 +08:00
|
|
|
int f2fs_merge_page_bio(struct f2fs_io_info *fio);
|
2018-05-28 23:47:18 +08:00
|
|
|
void f2fs_submit_page_write(struct f2fs_io_info *fio);
|
2017-01-31 02:55:18 +08:00
|
|
|
struct block_device *f2fs_target_device(struct f2fs_sb_info *sbi,
|
|
|
|
block_t blk_addr, struct bio *bio);
|
|
|
|
int f2fs_target_device_index(struct f2fs_sb_info *sbi, block_t blkaddr);
|
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_set_data_blkaddr(struct dnode_of_data *dn);
|
2017-01-31 02:55:18 +08:00
|
|
|
void f2fs_update_data_blkaddr(struct dnode_of_data *dn, block_t blkaddr);
|
2018-05-30 00:20:41 +08:00
|
|
|
int f2fs_reserve_new_blocks(struct dnode_of_data *dn, blkcnt_t count);
|
|
|
|
int f2fs_reserve_new_block(struct dnode_of_data *dn);
|
2017-01-31 02:55:18 +08:00
|
|
|
int f2fs_get_block(struct dnode_of_data *dn, pgoff_t index);
|
|
|
|
int f2fs_reserve_block(struct dnode_of_data *dn, pgoff_t index);
|
2018-05-30 00:20:41 +08:00
|
|
|
struct page *f2fs_get_read_data_page(struct inode *inode, pgoff_t index,
|
2017-01-31 02:55:18 +08:00
|
|
|
int op_flags, bool for_write);
|
2018-05-30 00:20:41 +08:00
|
|
|
struct page *f2fs_find_data_page(struct inode *inode, pgoff_t index);
|
|
|
|
struct page *f2fs_get_lock_data_page(struct inode *inode, pgoff_t index,
|
2017-01-31 02:55:18 +08:00
|
|
|
bool for_write);
|
2018-05-30 00:20:41 +08:00
|
|
|
struct page *f2fs_get_new_data_page(struct inode *inode,
|
2017-01-31 02:55:18 +08:00
|
|
|
struct page *ipage, pgoff_t index, bool new_i_size);
|
2018-05-30 00:20:41 +08:00
|
|
|
int f2fs_do_write_data_page(struct f2fs_io_info *fio);
|
2020-06-18 14:36:22 +08:00
|
|
|
void f2fs_do_map_lock(struct f2fs_sb_info *sbi, int flag, bool lock);
|
2017-01-31 02:55:18 +08:00
|
|
|
int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map,
|
|
|
|
int create, int flag);
|
|
|
|
int f2fs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
|
|
|
|
u64 start, u64 len);
|
2019-11-01 18:07:14 +08:00
|
|
|
int f2fs_encrypt_one_page(struct f2fs_io_info *fio);
|
2018-05-30 00:20:41 +08:00
|
|
|
bool f2fs_should_update_inplace(struct inode *inode, struct f2fs_io_info *fio);
|
|
|
|
bool f2fs_should_update_outplace(struct inode *inode, struct f2fs_io_info *fio);
|
f2fs: support data compression
This patch tries to support compression in f2fs.
- New term named cluster is defined as basic unit of compression, file can
be divided into multiple clusters logically. One cluster includes 4 << n
(n >= 0) logical pages, compression size is also cluster size, each of
cluster can be compressed or not.
- In cluster metadata layout, one special flag is used to indicate cluster
is compressed one or normal one, for compressed cluster, following metadata
maps cluster to [1, 4 << n - 1] physical blocks, in where f2fs stores
data including compress header and compressed data.
- In order to eliminate write amplification during overwrite, F2FS only
support compression on write-once file, data can be compressed only when
all logical blocks in file are valid and cluster compress ratio is lower
than specified threshold.
- To enable compression on regular inode, there are three ways:
* chattr +c file
* chattr +c dir; touch dir/file
* mount w/ -o compress_extension=ext; touch file.ext
Compress metadata layout:
[Dnode Structure]
             +-----------------------------------------------+
             | cluster 1 | cluster 2 | ......... | cluster N |
             +-----------------------------------------------+
             .           .                       .           .
       .                       .                .                      .
  .         Compressed Cluster       .        .        Normal Cluster            .
+----------+---------+---------+---------+  +---------+---------+---------+---------+
|compr flag| block 1 | block 2 | block 3 |  | block 1 | block 2 | block 3 | block 4 |
+----------+---------+---------+---------+  +---------+---------+---------+---------+
           .                             .
         .                                           .
       .                                                           .
      +-------------+-------------+----------+----------------------------+
      | data length | data chksum | reserved |      compressed data       |
      +-------------+-------------+----------+----------------------------+
Changelog:
20190326:
- fix error handling of read_end_io().
- remove unneeded comments in f2fs_encrypt_one_page().
20190327:
- fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
- don't jump into loop directly to avoid uninitialized variables.
- add TODO tag in error path of f2fs_write_cache_pages().
20190328:
- fix wrong merge condition in f2fs_read_multi_pages().
- check compressed file in f2fs_post_read_required().
20190401
- allow overwrite on non-compressed cluster.
- check cluster meta before writing compressed data.
20190402
- don't preallocate blocks for compressed file.
- add lz4 compress algorithm
- process multiple post read works in one workqueue
Currently f2fs processes post-read work in multiple workqueues, which
performs poorly because of the scheduling overhead of the workqueues
executing in order.
20190921
- compress: support buffered overwrite
C: compress cluster flag
V: valid block address
N: NEW_ADDR
One cluster contains 4 blocks
before overwrite -> after overwrite
- VVVV -> CVNN
- CVNN -> VVVV
- CVNN -> CVNN
- CVNN -> CVVV
- CVVV -> CVNN
- CVVV -> CVVV
20191029
- add kconfig F2FS_FS_COMPRESSION to isolate compression related
codes, add kconfig F2FS_FS_{LZO,LZ4} to cover backend algorithm.
note that: will remove lzo backend if Jaegeuk agreed that too.
- update codes according to Eric's comments.
20191101
- apply fixes from Jaegeuk
20191113
- apply fixes from Jaegeuk
- split workqueue for fsverity
20191216
- apply fixes from Jaegeuk
20200117
- fix to avoid NULL pointer dereference
[Jaegeuk Kim]
- add tracepoint for f2fs_{,de}compress_pages()
- fix many bugs and add some compression stats
- fix overwrite/mmap bugs
- address 32bit build error, reported by Geert.
- bug fixes when handling errors and i_compressed_blocks
Reported-by: <noreply@ellerman.id.au>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-11-01 18:07:14 +08:00
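The cluster arithmetic described in the commit message above can be sketched in a few lines of C. This is only an illustration of the stated sizes (4 << n logical pages per cluster, and 1 to (4 << n) - 1 physical blocks for a compressed cluster); the helper names below are hypothetical and do not appear in f2fs.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not part of f2fs.h. */

/* Number of logical pages in one cluster: 4 << n, with n >= 0. */
static inline uint32_t cluster_pages(unsigned int n)
{
	assert(n < 30);		/* keep 4u << n within 32 bits */
	return 4u << n;
}

/* Index of the cluster that contains logical page page_idx. */
static inline uint64_t cluster_index(uint64_t page_idx, unsigned int n)
{
	return page_idx / cluster_pages(n);
}

/* Largest physical block count of a compressed cluster: (4 << n) - 1. */
static inline uint32_t max_compressed_blocks(unsigned int n)
{
	return cluster_pages(n) - 1;
}

/* Nonzero if nr_blocks lies in the range [1, (4 << n) - 1]. */
static inline int compressed_blocks_valid(uint32_t nr_blocks, unsigned int n)
{
	return nr_blocks >= 1 && nr_blocks <= max_compressed_blocks(n);
}
```

With n = 0 a cluster spans 4 pages, so a compressed cluster may occupy at most 3 physical blocks; a cluster that would still need all 4 blocks saves nothing and would be stored as a normal cluster.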
|
|
|
int f2fs_write_single_data_page(struct page *page, int *submitted,
|
|
|
|
struct bio **bio, sector_t *last_block,
|
|
|
|
struct writeback_control *wbc,
|
|
|
|
enum iostat_type io_type,
|
2021-01-11 17:42:53 +08:00
|
|
|
int compr_blocks, bool allow_balance);
|
2021-07-23 15:59:21 +08:00
|
|
|
void f2fs_write_failed(struct inode *inode, loff_t to);
|
2017-01-31 02:55:18 +08:00
|
|
|
void f2fs_invalidate_page(struct page *page, unsigned int offset,
|
|
|
|
unsigned int length);
|
|
|
|
int f2fs_release_page(struct page *page, gfp_t wait);
|
2016-09-20 05:03:27 +08:00
|
|
|
#ifdef CONFIG_MIGRATION
|
2017-01-31 02:55:18 +08:00
|
|
|
int f2fs_migrate_page(struct address_space *mapping, struct page *newpage,
|
|
|
|
struct page *page, enum migrate_mode mode);
|
2016-09-20 05:03:27 +08:00
|
|
|
#endif
|
2018-03-08 18:34:38 +08:00
|
|
|
bool f2fs_overwrite_io(struct inode *inode, loff_t pos, size_t len);
|
2017-12-05 09:25:25 +08:00
|
|
|
void f2fs_clear_page_cache_dirty_tag(struct page *page);
|
2019-11-01 18:07:14 +08:00
|
|
|
int f2fs_init_post_read_processing(void);
|
|
|
|
void f2fs_destroy_post_read_processing(void);
|
|
|
|
int f2fs_init_post_read_wq(struct f2fs_sb_info *sbi);
|
|
|
|
void f2fs_destroy_post_read_wq(struct f2fs_sb_info *sbi);
|
2021-07-23 15:59:20 +08:00
|
|
|
extern const struct iomap_ops f2fs_iomap_ops;
|
2012-11-28 12:37:31 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* gc.c
|
|
|
|
*/
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds an "f2fs_" prefix to all non-static symbols in order to:
a) avoid conflicts with other generic kernel symbols;
b) indicate that a function is f2fs-specific rather than generic.
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
int f2fs_start_gc_thread(struct f2fs_sb_info *sbi);
|
|
|
|
void f2fs_stop_gc_thread(struct f2fs_sb_info *sbi);
|
|
|
|
block_t f2fs_start_bidx_of_node(unsigned int node_ofs, struct inode *inode);
|
2021-02-20 17:35:40 +08:00
|
|
|
int f2fs_gc(struct f2fs_sb_info *sbi, bool sync, bool background, bool force,
|
2017-04-14 06:17:00 +08:00
|
|
|
unsigned int segno);
|
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_build_gc_manager(struct f2fs_sb_info *sbi);
|
2019-06-05 11:33:25 +08:00
|
|
|
int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count);
|
f2fs: support age threshold based garbage collection
There are several issues in the current background GC algorithm:
- The number of valid blocks is one of the key factors in the cost
calculation, so if a segment has few valid blocks, the CB algorithm will
still choose it as a victim even if it is young or is a hot segment, which
is not appropriate.
- GCed data/node blocks go to existing logs regardless of whether the data
already there has the same update frequency, so hot and cold data may be
mixed again.
- The GC allocator mainly uses LFS type segments, so it consumes free
segments more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve the above issues. There are three main
steps:
1. select a source victim:
- set an age threshold, and select candidates based on the threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which have age in range of [80, 100] as
candidates;
- set candidate_ratio threshold, and select candidates based on the
ratio, so that we can shrink candidates to the oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates based on the age threshold;
- set candidate_radius threshold, search candidates whose age is
around the source victim's; the searching radius should be less than
the radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
the SSR allocator.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* the rest have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease noticeably:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
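The victim selection steps in the commit message above can be sketched as follows. This is a simplified illustration, not the kernel implementation: the struct and function names are hypothetical, the candidate_ratio shrinking step is omitted, and ages are assumed to be pre-computed on the 0 to 100 scale the message describes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of a dirty segment; not an f2fs structure. */
struct seg_sample {
	unsigned int age;		/* 0 = youngest, 100 = oldest */
	unsigned int valid_blocks;	/* valid blocks left in the segment */
};

static unsigned int age_distance(unsigned int a, unsigned int b)
{
	return a > b ? a - b : b - a;
}

/*
 * Step 1: among segments whose age is at least age_threshold, pick the
 * one with the fewest valid blocks. Returns its index, or -1 if none.
 */
static int pick_source_victim(const struct seg_sample *segs, size_t n,
			      unsigned int age_threshold)
{
	int best = -1;

	assert(segs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (segs[i].age < age_threshold)
			continue;
		if (best < 0 || segs[i].valid_blocks < segs[best].valid_blocks)
			best = (int)i;
	}
	return best;
}

/*
 * Step 2: among the other old-enough segments whose age lies within
 * radius of the source victim's age, pick the one with the most valid
 * blocks. Returns its index, or -1 if none.
 */
static int pick_target_victim(const struct seg_sample *segs, size_t n,
			      unsigned int age_threshold, int src,
			      unsigned int radius)
{
	int best = -1;

	assert(src >= 0 && (size_t)src < n);
	for (size_t i = 0; i < n; i++) {
		if ((int)i == src || segs[i].age < age_threshold)
			continue;
		if (age_distance(segs[i].age, segs[src].age) >= radius)
			continue;
		if (best < 0 || segs[i].valid_blocks > segs[best].valid_blocks)
			best = (int)i;
	}
	return best;
}
```

Step 3 would then move the valid blocks of the source victim into the free slots of the target victim, which is what SSR-style allocation provides; that part is not modeled here.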
|
|
|
int __init f2fs_create_garbage_collection_cache(void);
|
|
|
|
void f2fs_destroy_garbage_collection_cache(void);
|
2012-11-28 12:37:31 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* recovery.c
|
|
|
|
*/
|
2018-05-30 00:20:41 +08:00
|
|
|
int f2fs_recover_fsync_data(struct f2fs_sb_info *sbi, bool check_only);
|
|
|
|
bool f2fs_space_for_roll_forward(struct f2fs_sb_info *sbi);
|
2021-05-07 18:10:38 +08:00
|
|
|
int __init f2fs_create_recovery_cache(void);
|
|
|
|
void f2fs_destroy_recovery_cache(void);
|
2012-11-28 12:37:31 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* debug.c
|
|
|
|
*/
|
|
|
|
#ifdef CONFIG_F2FS_STAT_FS
|
|
|
|
struct f2fs_stat_info {
|
|
|
|
struct list_head stat_list;
|
|
|
|
struct f2fs_sb_info *sbi;
|
|
|
|
int all_area_segs, sit_area_segs, nat_area_segs, ssa_area_segs;
|
|
|
|
int main_area_segs, main_area_sections, main_area_zones;
|
2015-09-30 17:38:48 +08:00
|
|
|
unsigned long long hit_largest, hit_cached, hit_rbtree;
|
|
|
|
unsigned long long hit_total, total_ext;
|
2016-01-01 07:24:14 +08:00
|
|
|
int ext_tree, zombie_tree, ext_node;
|
2017-11-14 09:46:38 +08:00
|
|
|
int ndirty_node, ndirty_dent, ndirty_meta, ndirty_imeta;
|
|
|
|
int ndirty_data, ndirty_qdata;
|
2016-10-21 10:09:57 +08:00
|
|
|
int inmem_pages;
|
2017-11-14 09:46:38 +08:00
|
|
|
unsigned int ndirty_dirs, ndirty_files, nquota_files, ndirty_all;
|
2017-05-02 09:13:03 +08:00
|
|
|
int nats, dirty_nats, sits, dirty_sits;
|
|
|
|
int free_nids, avail_nids, alloc_nids;
|
2012-11-28 12:37:31 +08:00
|
|
|
int total_count, utilization;
|
2017-03-25 17:19:58 +08:00
|
|
|
int bg_gc, nr_wb_cp_data, nr_wb_data;
|
2018-10-17 01:20:53 +08:00
|
|
|
int nr_rd_data, nr_rd_node, nr_rd_meta;
|
2018-11-12 00:46:46 +08:00
|
|
|
int nr_dio_read, nr_dio_write;
|
2018-09-29 18:31:28 +08:00
|
|
|
unsigned int io_skip_bggc, other_skip_bggc;
|
2017-09-14 10:18:01 +08:00
|
|
|
int nr_flushing, nr_flushed, flush_list_empty;
|
|
|
|
int nr_discarding, nr_discarded;
|
2017-03-25 17:19:59 +08:00
|
|
|
int nr_discard_cmd;
|
2017-04-18 19:27:39 +08:00
|
|
|
unsigned int undiscard_blks;
|
f2fs: introduce checkpoint_merge mount option
We've added new mount options, "checkpoint_merge" and "nocheckpoint_merge";
the former creates a kernel daemon and makes it merge concurrent checkpoint
requests as much as possible to eliminate redundant checkpoint issues. Plus,
we can eliminate the sluggishness caused by slow checkpoint operation
when the checkpoint is done in a process context in a cgroup having
low i/o budget and cpu shares. To make this work better, we set the
default i/o priority of the kernel daemon to "3", to give it one higher
priority than other kernel threads. The verification result below
explains this.
The basic idea has come from https://opensource.samsung.com.
[Verification]
Android Pixel Device(ARM64, 7GB RAM, 256GB UFS)
Create two I/O cgroups (fg w/ weight 100, bg w/ weight 20)
Set "strict_guarantees" to "1" in BFQ tunables
In "fg" cgroup,
- thread A => trigger 1000 checkpoint operations
"for i in `seq 1 1000`; do touch test_dir1/file; fsync test_dir1;
done"
- thread B => generating async. I/O
"fio --rw=write --numjobs=1 --bs=128k --runtime=3600 --time_based=1
--filename=test_img --name=test"
In "bg" cgroup,
- thread C => trigger repeated checkpoint operations
"echo $$ > /dev/blkio/bg/tasks; while true; do touch test_dir2/file;
fsync test_dir2; done"
We've measured thread A's execution time.
[ w/o patch ]
Elapsed Time: Avg. 68 seconds
[ w/ patch ]
Elapsed Time: Avg. 48 seconds
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
[Jaegeuk Kim: fix the return value in f2fs_start_ckpt_thread, reported by Dan]
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-01-19 08:00:42 +08:00
|
|
|
int nr_issued_ckpt, nr_total_ckpt, nr_queued_ckpt;
|
|
|
|
unsigned int cur_ckpt_time, peak_ckpt_time;
|
2017-02-02 07:40:11 +08:00
|
|
|
int inline_xattr, inline_inode, inline_dir, append, update, orphans;
|
2020-08-31 10:09:49 +08:00
|
|
|
int compr_inode;
|
|
|
|
unsigned long long compr_blocks;
|
2017-03-22 17:23:45 +08:00
|
|
|
int aw_cnt, max_aw_cnt, vw_cnt, max_vw_cnt;
|
2016-08-18 21:01:18 +08:00
|
|
|
unsigned int valid_count, valid_node_count, valid_inode_count, discard_blks;
|
2012-11-28 12:37:31 +08:00
|
|
|
unsigned int bimodal, avg_vblocks;
|
|
|
|
int util_free, util_valid, util_invalid;
|
|
|
|
int rsvd_segs, overp_segs;
|
2021-05-20 19:51:50 +08:00
|
|
|
int dirty_count, node_pages, meta_pages, compress_pages;
|
|
|
|
int compress_page_hit;
|
2016-01-10 05:45:17 +08:00
|
|
|
int prefree_count, call_count, cp_count, bg_cp_count;
|
2012-11-28 12:37:31 +08:00
|
|
|
int tot_segs, node_segs, data_segs, free_segs, free_secs;
|
2014-12-23 07:37:39 +08:00
|
|
|
int bg_node_segs, bg_data_segs;
|
2012-11-28 12:37:31 +08:00
|
|
|
int tot_blks, data_blks, node_blks;
|
2014-12-23 07:37:39 +08:00
|
|
|
int bg_data_blks, bg_node_blks;
|
f2fs: avoid stucking GC due to atomic write
f2fs doesn't allow abuse of the atomic write class interface, so besides
limiting the total memory usage of in-mem pages, we need to limit
atomic-write usage as well when the filesystem is seriously fragmented,
otherwise we may run into an infinite loop during foreground GC because
target blocks in the victim segment belong to a file that has been
atomic-opened for a long time.
Now, we will detect failure due to atomic write in foreground GC, if
the count exceeds threshold, we will drop all atomic written data in
cache, by this, I expect it can keep our system running safely to
prevent DoS attack.
In addition, this patch adds GC skip information to debugfs;
for now it just shows the count of skips caused by atomic write.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-07 20:28:54 +08:00
|
|
|
unsigned long long skipped_atomic_files[2];
|
2012-11-28 12:37:31 +08:00
|
|
|
int curseg[NR_CURSEG_TYPE];
|
|
|
|
int cursec[NR_CURSEG_TYPE];
|
|
|
|
int curzone[NR_CURSEG_TYPE];
|
2020-06-28 10:58:44 +08:00
|
|
|
unsigned int dirty_seg[NR_CURSEG_TYPE];
|
|
|
|
unsigned int full_seg[NR_CURSEG_TYPE];
|
|
|
|
unsigned int valid_blks[NR_CURSEG_TYPE];
|
2012-11-28 12:37:31 +08:00
|
|
|
|
2018-09-29 18:31:27 +08:00
|
|
|
unsigned int meta_count[META_MAX];
|
2012-11-28 12:37:31 +08:00
|
|
|
unsigned int segment_count[2];
|
|
|
|
unsigned int block_count[2];
|
2014-12-24 01:16:54 +08:00
|
|
|
unsigned int inplace_count;
|
2015-09-11 14:43:52 +08:00
|
|
|
unsigned long long base_mem, cache_mem, page_mem;
|
2012-11-28 12:37:31 +08:00
|
|
|
};
|
|
|
|
|
2013-07-12 14:47:11 +08:00
|
|
|
static inline struct f2fs_stat_info *F2FS_STAT(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
2014-01-18 04:44:39 +08:00
|
|
|
return (struct f2fs_stat_info *)sbi->stat_info;
|
2013-07-12 14:47:11 +08:00
|
|
|
}
|
|
|
|
|
2014-02-13 14:12:29 +08:00
|
|
|
#define stat_inc_cp_count(si) ((si)->cp_count++)
|
2016-01-10 05:45:17 +08:00
|
|
|
#define stat_inc_bg_cp_count(si) ((si)->bg_cp_count++)
|
2013-10-22 19:56:10 +08:00
|
|
|
#define stat_inc_call_count(si) ((si)->call_count++)
|
2020-01-23 02:51:16 +08:00
|
|
|
#define stat_inc_bggc_count(si) ((si)->bg_gc++)
|
2018-09-29 18:31:28 +08:00
|
|
|
#define stat_io_skip_bggc_count(sbi) ((sbi)->io_skip_bggc++)
|
|
|
|
#define stat_other_skip_bggc_count(sbi) ((sbi)->other_skip_bggc++)
|
2015-12-17 17:14:44 +08:00
|
|
|
#define stat_inc_dirty_inode(sbi, type) ((sbi)->ndirty_inode[type]++)
|
|
|
|
#define stat_dec_dirty_inode(sbi, type) ((sbi)->ndirty_inode[type]--)
|
2015-09-30 17:38:48 +08:00
|
|
|
#define stat_inc_total_hit(sbi) (atomic64_inc(&(sbi)->total_hit_ext))
|
|
|
|
#define stat_inc_rbtree_node_hit(sbi) (atomic64_inc(&(sbi)->read_hit_rbtree))
|
|
|
|
#define stat_inc_largest_node_hit(sbi) (atomic64_inc(&(sbi)->read_hit_largest))
|
|
|
|
#define stat_inc_cached_node_hit(sbi) (atomic64_inc(&(sbi)->read_hit_cached))
|
2015-07-15 17:28:53 +08:00
|
|
|
#define stat_inc_inline_xattr(inode) \
|
|
|
|
do { \
|
|
|
|
if (f2fs_has_inline_xattr(inode)) \
|
|
|
|
(atomic_inc(&F2FS_I_SB(inode)->inline_xattr)); \
|
|
|
|
} while (0)
|
|
|
|
#define stat_dec_inline_xattr(inode) \
|
|
|
|
do { \
|
|
|
|
if (f2fs_has_inline_xattr(inode)) \
|
|
|
|
(atomic_dec(&F2FS_I_SB(inode)->inline_xattr)); \
|
|
|
|
} while (0)
|
2013-11-26 10:08:57 +08:00
|
|
|
#define stat_inc_inline_inode(inode) \
|
|
|
|
do { \
|
|
|
|
if (f2fs_has_inline_data(inode)) \
|
2014-12-08 19:08:20 +08:00
|
|
|
(atomic_inc(&F2FS_I_SB(inode)->inline_inode)); \
|
2013-11-26 10:08:57 +08:00
|
|
|
} while (0)
|
|
|
|
#define stat_dec_inline_inode(inode) \
|
|
|
|
do { \
|
|
|
|
if (f2fs_has_inline_data(inode)) \
|
2014-12-08 19:08:20 +08:00
|
|
|
(atomic_dec(&F2FS_I_SB(inode)->inline_inode)); \
|
2013-11-26 10:08:57 +08:00
|
|
|
} while (0)
|
2014-10-14 11:00:16 +08:00
|
|
|
#define stat_inc_inline_dir(inode) \
|
|
|
|
do { \
|
|
|
|
if (f2fs_has_inline_dentry(inode)) \
|
2014-12-08 19:08:20 +08:00
|
|
|
(atomic_inc(&F2FS_I_SB(inode)->inline_dir)); \
|
2014-10-14 11:00:16 +08:00
|
|
|
} while (0)
|
|
|
|
#define stat_dec_inline_dir(inode) \
|
|
|
|
do { \
|
|
|
|
if (f2fs_has_inline_dentry(inode)) \
|
2014-12-08 19:08:20 +08:00
|
|
|
(atomic_dec(&F2FS_I_SB(inode)->inline_dir)); \
|
2014-10-14 11:00:16 +08:00
|
|
|
} while (0)
|
f2fs: support data compression
This patch tries to support compression in f2fs.
- New term named cluster is defined as basic unit of compression, file can
be divided into multiple clusters logically. One cluster includes 4 << n
(n >= 0) logical pages, compression size is also cluster size, each of
cluster can be compressed or not.
- In cluster metadata layout, one special flag is used to indicate cluster
is compressed one or normal one, for compressed cluster, following metadata
maps cluster to [1, 4 << n - 1] physical blocks, in where f2fs stores
data including compress header and compressed data.
- In order to eliminate write amplification during overwrite, F2FS only
support compression on write-once file, data can be compressed only when
all logical blocks in file are valid and cluster compress ratio is lower
than specified threshold.
- To enable compression on regular inode, there are three ways:
* chattr +c file
* chattr +c dir; touch dir/file
* mount w/ -o compress_extension=ext; touch file.ext
Compress metadata layout:
                             [Dnode Structure]
             +-----------------------------------------------+
             | cluster 1 | cluster 2 | ......... | cluster N |
             +-----------------------------------------------+
             .           .                       .           .
       .                       .                .                      .
  .         Compressed Cluster       .        .        Normal Cluster            .
+----------+---------+---------+---------+  +---------+---------+---------+---------+
|compr flag| block 1 | block 2 | block 3 |  | block 1 | block 2 | block 3 | block 4 |
+----------+---------+---------+---------+  +---------+---------+---------+---------+
           .                             .
         .                                           .
       .                                                           .
      +-------------+-------------+----------+----------------------------+
      | data length | data chksum | reserved |      compressed data       |
      +-------------+-------------+----------+----------------------------+
Changelog:
20190326:
- fix error handling of read_end_io().
- remove unneeded comments in f2fs_encrypt_one_page().
20190327:
- fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
- don't jump into loop directly to avoid uninitialized variables.
- add TODO tag in error path of f2fs_write_cache_pages().
20190328:
- fix wrong merge condition in f2fs_read_multi_pages().
- check compressed file in f2fs_post_read_required().
20190401
- allow overwrite on non-compressed cluster.
- check cluster meta before writing compressed data.
20190402
- don't preallocate blocks for compressed file.
- add lz4 compress algorithm
- process multiple post read works in one workqueue
Now f2fs supports processing post read work in multiple workqueue,
it shows low performance due to schedule overhead of multiple
workqueue executing orderly.
20190921
- compress: support buffered overwrite
C: compress cluster flag
V: valid block address
N: NEW_ADDR
One cluster contain 4 blocks
before overwrite after overwrite
- VVVV -> CVNN
- CVNN -> VVVV
- CVNN -> CVNN
- CVNN -> CVVV
- CVVV -> CVNN
- CVVV -> CVVV
20191029
- add kconfig F2FS_FS_COMPRESSION to isolate compression related
codes, add kconfig F2FS_FS_{LZO,LZ4} to cover backend algorithm.
note that: will remove lzo backend if Jaegeuk agreed that too.
- update codes according to Eric's comments.
20191101
- apply fixes from Jaegeuk
20191113
- apply fixes from Jaegeuk
- split workqueue for fsverity
20191216
- apply fixes from Jaegeuk
20200117
- fix to avoid NULL pointer dereference
[Jaegeuk Kim]
- add tracepoint for f2fs_{,de}compress_pages()
- fix many bugs and add some compression stats
- fix overwrite/mmap bugs
- address 32bit build error, reported by Geert.
- bug fixes when handling errors and i_compressed_blocks
Reported-by: <noreply@ellerman.id.au>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
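The cluster arithmetic described in the commit message above can be sketched in a few lines of C. This is only an illustration, not code from f2fs: the `demo_*` helpers are hypothetical names, and the upper bound is read as (4 << n) - 1, which matches the diagram (one flag slot plus three blocks for a four-block cluster).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: logical pages per cluster for exponent n, i.e. 4 << n. */
static inline uint32_t demo_cluster_pages(unsigned int n)
{
	assert(n < 16);		/* keep the shift well inside 32 bits */
	return (uint32_t)4 << n;
}

/* Hypothetical helper: index of the cluster that a logical page falls into. */
static inline uint64_t demo_cluster_index(uint64_t page_index, unsigned int n)
{
	return page_index / demo_cluster_pages(n);
}

/*
 * Assumed reading of "[1, 4 << n - 1]": a compressed cluster is backed by
 * at most (4 << n) - 1 physical blocks, because one slot holds the flag.
 */
static inline uint32_t demo_max_compressed_blocks(unsigned int n)
{
	return demo_cluster_pages(n) - 1;
}
```

With n = 0 a cluster covers 4 pages, so logical page 17 falls into cluster 4 and a compressed cluster uses at most 3 physical blocks.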
2019-11-01 18:07:14 +08:00
|
|
|
#define stat_inc_compr_inode(inode) \
|
|
|
|
do { \
|
|
|
|
if (f2fs_compressed_file(inode)) \
|
|
|
|
(atomic_inc(&F2FS_I_SB(inode)->compr_inode)); \
|
|
|
|
} while (0)
|
|
|
|
#define stat_dec_compr_inode(inode) \
|
|
|
|
do { \
|
|
|
|
if (f2fs_compressed_file(inode)) \
|
|
|
|
(atomic_dec(&F2FS_I_SB(inode)->compr_inode)); \
|
|
|
|
} while (0)
|
|
|
|
#define stat_add_compr_blocks(inode, blocks) \
|
2020-08-31 10:09:49 +08:00
|
|
|
(atomic64_add(blocks, &F2FS_I_SB(inode)->compr_blocks))
|
2019-11-01 18:07:14 +08:00
|
|
|
#define stat_sub_compr_blocks(inode, blocks) \
|
2020-08-31 10:09:49 +08:00
|
|
|
(atomic64_sub(blocks, &F2FS_I_SB(inode)->compr_blocks))
|
2018-09-29 18:31:27 +08:00
|
|
|
#define stat_inc_meta_count(sbi, blkaddr) \
|
|
|
|
do { \
|
|
|
|
if (blkaddr < SIT_I(sbi)->sit_base_addr) \
|
|
|
|
atomic_inc(&(sbi)->meta_count[META_CP]); \
|
|
|
|
else if (blkaddr < NM_I(sbi)->nat_blkaddr) \
|
|
|
|
atomic_inc(&(sbi)->meta_count[META_SIT]); \
|
|
|
|
else if (blkaddr < SM_I(sbi)->ssa_blkaddr) \
|
|
|
|
atomic_inc(&(sbi)->meta_count[META_NAT]); \
|
|
|
|
else if (blkaddr < SM_I(sbi)->main_blkaddr) \
|
|
|
|
atomic_inc(&(sbi)->meta_count[META_SSA]); \
|
|
|
|
} while (0)
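The `stat_inc_meta_count` macro above picks a counter by comparing a block address against a chain of ascending region boundaries. A standalone sketch of that classification follows. The `demo_layout` struct, the enum and the boundary values used in the usage note are assumptions made for illustration and do not come from the f2fs on-disk format.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical region tags, in the same order the macro tests them. */
enum demo_meta_region {
	DEMO_META_CP,
	DEMO_META_SIT,
	DEMO_META_NAT,
	DEMO_META_SSA,
	DEMO_META_NONE		/* address lies in the main area or beyond */
};

/* Hypothetical layout: first block address of each region after CP. */
struct demo_layout {
	uint32_t sit_base_addr;
	uint32_t nat_blkaddr;
	uint32_t ssa_blkaddr;
	uint32_t main_blkaddr;
};

static enum demo_meta_region demo_classify(const struct demo_layout *l,
					   uint32_t blkaddr)
{
	/* The if/else chain is only meaningful if boundaries ascend. */
	assert(l->sit_base_addr <= l->nat_blkaddr &&
	       l->nat_blkaddr <= l->ssa_blkaddr &&
	       l->ssa_blkaddr <= l->main_blkaddr);

	if (blkaddr < l->sit_base_addr)
		return DEMO_META_CP;
	else if (blkaddr < l->nat_blkaddr)
		return DEMO_META_SIT;
	else if (blkaddr < l->ssa_blkaddr)
		return DEMO_META_NAT;
	else if (blkaddr < l->main_blkaddr)
		return DEMO_META_SSA;
	return DEMO_META_NONE;
}
```

With made-up boundaries of 512, 1024, 2048 and 4096, address 100 is classified as CP, 512 as SIT, 1500 as NAT and 3000 as SSA. Like the macro, the sketch counts nothing for addresses at or past the main area.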
|
2013-10-22 19:56:10 +08:00
|
|
|
#define stat_inc_seg_type(sbi, curseg) \
|
|
|
|
((sbi)->segment_count[(curseg)->alloc_type]++)
|
|
|
|
#define stat_inc_block_count(sbi, curseg) \
|
|
|
|
((sbi)->block_count[(curseg)->alloc_type]++)
|
2014-12-24 01:16:54 +08:00
|
|
|
#define stat_inc_inplace_blocks(sbi) \
|
|
|
|
(atomic_inc(&(sbi)->inplace_count))
|
2016-12-29 05:55:09 +08:00
|
|
|
#define stat_update_max_atomic_write(inode) \
|
|
|
|
do { \
|
2019-12-05 11:22:39 +08:00
|
|
|
int cur = F2FS_I_SB(inode)->atomic_files; \
|
2016-12-29 05:55:09 +08:00
|
|
|
int max = atomic_read(&F2FS_I_SB(inode)->max_aw_cnt); \
|
|
|
|
if (cur > max) \
|
|
|
|
atomic_set(&F2FS_I_SB(inode)->max_aw_cnt, cur); \
|
|
|
|
} while (0)
|
2017-03-22 17:23:45 +08:00
|
|
|
#define stat_inc_volatile_write(inode) \
|
|
|
|
(atomic_inc(&F2FS_I_SB(inode)->vw_cnt))
|
|
|
|
#define stat_dec_volatile_write(inode) \
|
|
|
|
(atomic_dec(&F2FS_I_SB(inode)->vw_cnt))
|
|
|
|
#define stat_update_max_volatile_write(inode) \
|
|
|
|
do { \
|
|
|
|
int cur = atomic_read(&F2FS_I_SB(inode)->vw_cnt); \
|
|
|
|
int max = atomic_read(&F2FS_I_SB(inode)->max_vw_cnt); \
|
|
|
|
if (cur > max) \
|
|
|
|
atomic_set(&F2FS_I_SB(inode)->max_vw_cnt, cur); \
|
|
|
|
} while (0)
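The two `stat_update_max_*` macros above keep a high-water mark by reading the current maximum and then setting it, which is two separate atomic operations. For comparison only, here is a user-space sketch of the same idea as a single compare-and-swap loop using C11 `<stdatomic.h>`. The `demo_update_max` name is hypothetical, and this is not how the kernel macros are written.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical helper: raise *max to cur if cur is larger.
 * On a failed compare-exchange, 'seen' is refreshed with the value
 * currently stored, so the loop re-checks against the latest maximum.
 */
static void demo_update_max(atomic_int *max, int cur)
{
	int seen;

	assert(max);
	seen = atomic_load(max);
	while (cur > seen) {
		if (atomic_compare_exchange_weak(max, &seen, cur))
			break;
	}
}
```

For statistics counters an occasional lost update may well be acceptable, which would explain the simpler read-then-set form. The loop variant trades a little more code for a maximum that never moves backwards under concurrent updates.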
|
2014-12-23 07:37:39 +08:00
|
|
|
#define stat_inc_seg_count(sbi, type, gc_type) \
|
2012-11-28 12:37:31 +08:00
|
|
|
do { \
|
2013-07-12 14:47:11 +08:00
|
|
|
struct f2fs_stat_info *si = F2FS_STAT(sbi); \
|
2017-04-09 07:11:36 +08:00
|
|
|
si->tot_segs++; \
|
|
|
|
if ((type) == SUM_TYPE_DATA) { \
|
2012-11-28 12:37:31 +08:00
|
|
|
si->data_segs++; \
|
2014-12-23 07:37:39 +08:00
|
|
|
si->bg_data_segs += ((gc_type) == BG_GC) ? 1 : 0; \
|
|
|
|
} else { \
|
2012-11-28 12:37:31 +08:00
|
|
|
si->node_segs++; \
|
2014-12-23 07:37:39 +08:00
|
|
|
si->bg_node_segs += ((gc_type) == BG_GC) ? 1 : 0; \
|
|
|
|
} \
|
2012-11-28 12:37:31 +08:00
|
|
|
} while (0)
|
|
|
|
|
|
|
|
#define stat_inc_tot_blk_count(si, blks) \
|
2017-04-09 07:11:36 +08:00
|
|
|
((si)->tot_blks += (blks))
|
2012-11-28 12:37:31 +08:00
|
|
|
|
2014-12-23 07:37:39 +08:00
|
|
|
#define stat_inc_data_blk_count(sbi, blks, gc_type) \
|
2012-11-28 12:37:31 +08:00
|
|
|
do { \
|
2013-07-12 14:47:11 +08:00
|
|
|
struct f2fs_stat_info *si = F2FS_STAT(sbi); \
|
2012-11-28 12:37:31 +08:00
|
|
|
stat_inc_tot_blk_count(si, blks); \
|
|
|
|
si->data_blks += (blks); \
|
2017-04-09 07:11:36 +08:00
|
|
|
si->bg_data_blks += ((gc_type) == BG_GC) ? (blks) : 0; \
|
2012-11-28 12:37:31 +08:00
|
|
|
} while (0)
|
|
|
|
|
2014-12-23 07:37:39 +08:00
|
|
|
#define stat_inc_node_blk_count(sbi, blks, gc_type) \
|
2012-11-28 12:37:31 +08:00
|
|
|
do { \
|
2013-07-12 14:47:11 +08:00
|
|
|
struct f2fs_stat_info *si = F2FS_STAT(sbi); \
|
2012-11-28 12:37:31 +08:00
|
|
|
stat_inc_tot_blk_count(si, blks); \
|
|
|
|
si->node_blks += (blks); \
|
2017-04-09 07:11:36 +08:00
|
|
|
si->bg_node_blks += ((gc_type) == BG_GC) ? (blks) : 0; \
|
2012-11-28 12:37:31 +08:00
|
|
|
} while (0)
|
|
|
|
|
2017-01-31 02:55:18 +08:00
|
|
|
int f2fs_build_stats(struct f2fs_sb_info *sbi);
|
|
|
|
void f2fs_destroy_stats(struct f2fs_sb_info *sbi);
|
2019-01-04 21:26:18 +08:00
|
|
|
void __init f2fs_create_root_stats(void);
|
2013-01-15 18:58:47 +08:00
|
|
|
void f2fs_destroy_root_stats(void);
|
2020-01-23 02:51:16 +08:00
|
|
|
void f2fs_update_sit_info(struct f2fs_sb_info *sbi);
|
2012-11-28 12:37:31 +08:00
|
|
|
#else
|
2017-04-20 01:38:33 +08:00
|
|
|
#define stat_inc_cp_count(si) do { } while (0)
|
|
|
|
#define stat_inc_bg_cp_count(si) do { } while (0)
|
|
|
|
#define stat_inc_call_count(si) do { } while (0)
|
|
|
|
#define stat_inc_bggc_count(si) do { } while (0)
|
2018-09-29 18:31:28 +08:00
|
|
|
#define stat_io_skip_bggc_count(sbi) do { } while (0)
|
|
|
|
#define stat_other_skip_bggc_count(sbi) do { } while (0)
|
2017-04-20 01:38:33 +08:00
|
|
|
#define stat_inc_dirty_inode(sbi, type) do { } while (0)
|
|
|
|
#define stat_dec_dirty_inode(sbi, type) do { } while (0)
|
2020-01-23 02:51:16 +08:00
|
|
|
#define stat_inc_total_hit(sbi) do { } while (0)
|
|
|
|
#define stat_inc_rbtree_node_hit(sbi) do { } while (0)
|
2017-04-20 01:38:33 +08:00
|
|
|
#define stat_inc_largest_node_hit(sbi) do { } while (0)
|
|
|
|
#define stat_inc_cached_node_hit(sbi) do { } while (0)
|
|
|
|
#define stat_inc_inline_xattr(inode) do { } while (0)
|
|
|
|
#define stat_dec_inline_xattr(inode) do { } while (0)
|
|
|
|
#define stat_inc_inline_inode(inode) do { } while (0)
|
|
|
|
#define stat_dec_inline_inode(inode) do { } while (0)
|
|
|
|
#define stat_inc_inline_dir(inode) do { } while (0)
|
|
|
|
#define stat_dec_inline_dir(inode) do { } while (0)
|
2019-11-01 18:07:14 +08:00
|
|
|
#define stat_inc_compr_inode(inode) do { } while (0)
|
|
|
|
#define stat_dec_compr_inode(inode) do { } while (0)
|
|
|
|
#define stat_add_compr_blocks(inode, blocks) do { } while (0)
|
|
|
|
#define stat_sub_compr_blocks(inode, blocks) do { } while (0)
|
2017-04-20 01:38:33 +08:00
|
|
|
#define stat_update_max_atomic_write(inode) do { } while (0)
|
|
|
|
#define stat_inc_volatile_write(inode) do { } while (0)
|
|
|
|
#define stat_dec_volatile_write(inode) do { } while (0)
|
|
|
|
#define stat_update_max_volatile_write(inode) do { } while (0)
|
2018-09-29 18:31:27 +08:00
|
|
|
#define stat_inc_meta_count(sbi, blkaddr) do { } while (0)
|
2017-04-20 01:38:33 +08:00
|
|
|
#define stat_inc_seg_type(sbi, curseg) do { } while (0)
|
|
|
|
#define stat_inc_block_count(sbi, curseg) do { } while (0)
|
|
|
|
#define stat_inc_inplace_blocks(sbi) do { } while (0)
|
|
|
|
#define stat_inc_seg_count(sbi, type, gc_type) do { } while (0)
|
|
|
|
#define stat_inc_tot_blk_count(si, blks) do { } while (0)
|
|
|
|
#define stat_inc_data_blk_count(sbi, blks, gc_type) do { } while (0)
|
|
|
|
#define stat_inc_node_blk_count(sbi, blks, gc_type) do { } while (0)
|
2012-11-28 12:37:31 +08:00
|
|
|
|
|
|
|
static inline int f2fs_build_stats(struct f2fs_sb_info *sbi) { return 0; }
|
|
|
|
static inline void f2fs_destroy_stats(struct f2fs_sb_info *sbi) { }
|
2019-01-04 21:26:18 +08:00
|
|
|
static inline void __init f2fs_create_root_stats(void) { }
|
2013-01-15 18:58:47 +08:00
|
|
|
static inline void f2fs_destroy_root_stats(void) { }
|
2020-05-09 19:21:35 +08:00
|
|
|
static inline void f2fs_update_sit_info(struct f2fs_sb_info *sbi) {}
|
2012-11-28 12:37:31 +08:00
|
|
|
#endif
|
|
|
|
|
|
|
|
extern const struct file_operations f2fs_dir_operations;
|
|
|
|
extern const struct file_operations f2fs_file_operations;
|
|
|
|
extern const struct inode_operations f2fs_file_inode_operations;
|
|
|
|
extern const struct address_space_operations f2fs_dblock_aops;
|
|
|
|
extern const struct address_space_operations f2fs_node_aops;
|
|
|
|
extern const struct address_space_operations f2fs_meta_aops;
|
|
|
|
extern const struct inode_operations f2fs_dir_inode_operations;
|
|
|
|
extern const struct inode_operations f2fs_symlink_inode_operations;
|
2015-04-30 06:10:53 +08:00
|
|
|
extern const struct inode_operations f2fs_encrypted_symlink_inode_operations;
|
2012-11-28 12:37:31 +08:00
|
|
|
extern const struct inode_operations f2fs_special_inode_operations;
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds the "f2fs_" prefix to all non-static symbols in order to:
a) avoid conflicts with other generic kernel symbols;
b) indicate that a function is f2fs-specific rather than generic.
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
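The naming rule described in this commit message can be illustrated with a minimal sketch. Everything here is hypothetical (the "myfs_" prefix, the type codes, and the helper names are not taken from f2fs): a file-local helper is marked static and may keep a generic name, while a symbol visible to the linker carries the filesystem prefix.

```c
/* Hypothetical directory-entry type codes, for illustration only. */
enum myfs_file_type {
	MYFS_FT_UNKNOWN = 0,
	MYFS_FT_REG_FILE = 1,
	MYFS_FT_DIR = 2
};

struct myfs_dir_entry {
	unsigned char file_type;
};

/*
 * File-local helper: 'static' keeps it out of the global symbol table,
 * so its generic name cannot clash with another translation unit.
 */
static int type_is_known(unsigned char t)
{
	return t == MYFS_FT_REG_FILE || t == MYFS_FT_DIR;
}

/*
 * Non-static symbol: visible to the linker, so it carries the
 * filesystem prefix instead of a generic name like get_de_type().
 */
unsigned char myfs_get_de_type(const struct myfs_dir_entry *de)
{
	return type_is_known(de->file_type) ? de->file_type : MYFS_FT_UNKNOWN;
}
```

With this convention a reader can tell at the call site that myfs_get_de_type() belongs to the filesystem, and a second filesystem defining its own get_de_type() equivalent would not cause a link-time conflict.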
2018-05-30 00:20:41 +08:00
|
|
|
extern struct kmem_cache *f2fs_inode_entry_slab;
|
2013-11-10 23:13:16 +08:00
|
|
|
|
2013-11-10 23:13:19 +08:00
|
|
|
/*
|
|
|
|
* inline.c
|
|
|
|
*/
|
2017-01-31 02:55:18 +08:00
|
|
|
bool f2fs_may_inline_data(struct inode *inode);
|
|
|
|
bool f2fs_may_inline_dentry(struct inode *inode);
|
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_do_read_inline_data(struct page *page, struct page *ipage);
|
|
|
|
void f2fs_truncate_inline_inode(struct inode *inode,
|
|
|
|
struct page *ipage, u64 from);
|
2017-01-31 02:55:18 +08:00
|
|
|
int f2fs_read_inline_data(struct inode *inode, struct page *page);
|
|
|
|
int f2fs_convert_inline_page(struct dnode_of_data *dn, struct page *page);
|
|
|
|
int f2fs_convert_inline_inode(struct inode *inode);
|
2019-12-10 11:03:05 +08:00
|
|
|
int f2fs_try_convert_inline_dir(struct inode *dir, struct dentry *dentry);
|
2017-01-31 02:55:18 +08:00
|
|
|
int f2fs_write_inline_data(struct inode *inode, struct page *page);
|
2020-07-06 18:23:36 +08:00
|
|
|
int f2fs_recover_inline_data(struct inode *inode, struct page *npage);
|
2018-05-30 00:20:41 +08:00
|
|
|
struct f2fs_dir_entry *f2fs_find_in_inline_dir(struct inode *dir,
|
f2fs: rework filename handling
Rework f2fs's handling of filenames to use a new 'struct f2fs_filename'.
Similar to 'struct ext4_filename', this stores the usr_fname, disk_name,
dirhash, crypto_buf, and casefolded name. Some of these names can be
NULL in some cases. 'struct f2fs_filename' differs from
'struct fscrypt_name' mainly in that the casefolded name is included.
For user-initiated directory operations like lookup() and create(),
initialize the f2fs_filename by translating the corresponding
fscrypt_name, then computing the dirhash and casefolded name if needed.
This makes the dirhash and casefolded name be cached for each syscall,
so we don't have to recompute them repeatedly. (Previously, f2fs
computed the dirhash once per directory level, and the casefolded name
once per directory block.) This improves performance.
This rework also makes it much easier to correctly handle all
combinations of normal, encrypted, casefolded, and encrypted+casefolded
directories. (The fourth isn't supported yet but is being worked on.)
The only other cases where an f2fs_filename gets initialized are for two
filesystem-internal operations: (1) when converting an inline directory
to a regular one, we grab the needed disk_name and hash from an existing
f2fs_dir_entry; and (2) when roll-forward recovering a new dentry, we
grab the needed disk_name from f2fs_inode::i_name and compute the hash.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-05-07 15:59:04 +08:00
|
|
|
const struct f2fs_filename *fname,
|
|
|
|
struct page **res_page);
|
2018-05-30 00:20:41 +08:00
|
|
|
int f2fs_make_empty_inline_dir(struct inode *inode, struct inode *parent,
|
2017-01-31 02:55:18 +08:00
|
|
|
struct page *ipage);
|
2020-05-07 15:59:04 +08:00
|
|
|
int f2fs_add_inline_entry(struct inode *dir, const struct f2fs_filename *fname,
|
2017-01-31 02:55:18 +08:00
|
|
|
struct inode *inode, nid_t ino, umode_t mode);
|
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_delete_inline_entry(struct f2fs_dir_entry *dentry,
|
|
|
|
struct page *page, struct inode *dir,
|
|
|
|
struct inode *inode);
|
2017-01-31 02:55:18 +08:00
|
|
|
bool f2fs_empty_inline_dir(struct inode *dir);
|
|
|
|
int f2fs_read_inline_dir(struct file *file, struct dir_context *ctx,
|
|
|
|
struct fscrypt_str *fstr);
|
|
|
|
int f2fs_inline_data_fiemap(struct inode *inode,
|
|
|
|
struct fiemap_extent_info *fieinfo,
|
|
|
|
__u64 start, __u64 len);
|
2015-04-21 04:57:51 +08:00
|
|
|
|
2015-06-20 03:01:21 +08:00
|
|
|
/*
|
|
|
|
* shrinker.c
|
|
|
|
*/
|
2017-01-31 02:55:18 +08:00
|
|
|
unsigned long f2fs_shrink_count(struct shrinker *shrink,
|
|
|
|
struct shrink_control *sc);
|
|
|
|
unsigned long f2fs_shrink_scan(struct shrinker *shrink,
|
|
|
|
struct shrink_control *sc);
|
|
|
|
void f2fs_join_shrinker(struct f2fs_sb_info *sbi);
|
|
|
|
void f2fs_leave_shrinker(struct f2fs_sb_info *sbi);
|
2015-06-20 03:01:21 +08:00
|
|
|
|
2015-07-08 17:59:36 +08:00
|
|
|
/*
|
|
|
|
* extent_cache.c
|
|
|
|
*/
|
2018-10-04 11:18:30 +08:00
|
|
|
struct rb_entry *f2fs_lookup_rb_tree(struct rb_root_cached *root,
|
2017-04-14 23:24:55 +08:00
|
|
|
struct rb_entry *cached_re, unsigned int ofs);
|
2020-08-04 21:14:48 +08:00
|
|
|
struct rb_node **f2fs_lookup_rb_tree_ext(struct f2fs_sb_info *sbi,
|
|
|
|
struct rb_root_cached *root,
|
|
|
|
struct rb_node **parent,
|
|
|
|
unsigned long long key, bool *left_most);
|
2018-05-30 00:20:41 +08:00
|
|
|
struct rb_node **f2fs_lookup_rb_tree_for_insert(struct f2fs_sb_info *sbi,
|
2018-10-04 11:18:30 +08:00
|
|
|
struct rb_root_cached *root,
|
|
|
|
struct rb_node **parent,
|
|
|
|
unsigned int ofs, bool *leftmost);
|
|
|
|
struct rb_entry *f2fs_lookup_rb_tree_ret(struct rb_root_cached *root,
|
2017-04-14 23:24:55 +08:00
|
|
|
struct rb_entry *cached_re, unsigned int ofs,
|
|
|
|
struct rb_entry **prev_entry, struct rb_entry **next_entry,
|
|
|
|
struct rb_node ***insert_p, struct rb_node **insert_parent,
|
2018-10-04 11:18:30 +08:00
|
|
|
bool force, bool *leftmost);
|
2018-05-30 00:20:41 +08:00
|
|
|
bool f2fs_check_rb_tree_consistence(struct f2fs_sb_info *sbi,
|
2020-08-04 21:14:48 +08:00
|
|
|
struct rb_root_cached *root, bool check_key);
|
2017-01-31 02:55:18 +08:00
|
|
|
unsigned int f2fs_shrink_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink);
|
2020-06-28 10:58:17 +08:00
|
|
|
void f2fs_init_extent_tree(struct inode *inode, struct page *ipage);
|
2017-01-31 02:55:18 +08:00
|
|
|
void f2fs_drop_extent_tree(struct inode *inode);
|
|
|
|
unsigned int f2fs_destroy_extent_node(struct inode *inode);
|
|
|
|
void f2fs_destroy_extent_tree(struct inode *inode);
|
|
|
|
bool f2fs_lookup_extent_cache(struct inode *inode, pgoff_t pgofs,
|
|
|
|
struct extent_info *ei);
|
|
|
|
void f2fs_update_extent_cache(struct dnode_of_data *dn);
|
f2fs: update extent tree in batches
This patch introduces a new helper, f2fs_update_extent_tree_range, which
updates the extent mapping for a specified range.
The main idea is:
1) punch all mapping info in the extent node(s) that fall within the
   specified range;
2) try to merge the new extent mapping with an adjacent node or, failing
   that, insert the mapping into the extent tree as a new node.
To measure the benefit, I added a function that reads the time stamp
counter, as below:
uint64_t rdtsc(void)
{
	uint32_t lo, hi;
	__asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
	return (uint64_t)hi << 32 | lo;
}
My test environment is: Ubuntu, Intel i7-3770, 16G memory, 256G Micron SSD.
truncation path: update extent cache from truncate_data_blocks_range
non-truncation path: update extent cache from other paths
total: all update paths
a) Removing a 128MB file which has one extent node mapping the whole range
of the file:
1. dd if=/dev/zero of=/mnt/f2fs/128M bs=1M count=128
2. sync
3. rm /mnt/f2fs/128M
Before:
                  total        count      average
truncation:       7651022      32768      233.49
Patched:
                  total        count      average
truncation:       3321         33         100.64
b) fsstress:
fsstress -d /mnt/f2fs -l 5 -n 100 -p 20
Test runs: 5.
Before:
                  total        count      average
truncation:       5812480.6    20911.6    277.95
non-truncation:   7783845.6    13440.8    579.12
total:            13596326.2   34352.4    395.79
Patched:
                  total        count      average
truncation:       1281283.0    3041.6     421.25
non-truncation:   7355844.4    13662.8    538.38
total:            8637127.4    16704.4    517.06
1) For the updates in the truncation path:
- updating in batches clearly reduces both the total tsc and the update
count;
- however, a single batched update punches multiple extent nodes in a
loop and so executes more operations, which is why the average tsc
increases significantly.
2) For the updates in the non-truncation path:
- there is a small improvement, because when only the head or tail of an
extent node needs updating, the new interface updates the info in the
extent node directly, rather than removing the original extent node and
then inserting the updated one into the cache as a new node.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
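The two-step idea in this commit message (punch the range, then merge or insert) can be sketched in user space. This is only an illustration under stated assumptions: it uses a small sorted array instead of the kernel's rb-tree, and every name in it (struct ext, struct ext_list, punch_range, update_range, MAX_EXTENTS) is hypothetical rather than taken from f2fs.

```c
#include <stddef.h>
#include <string.h>

#define MAX_EXTENTS 16	/* arbitrary capacity for this sketch */

struct ext {
	unsigned int fofs;	/* first file offset (in blocks) covered */
	unsigned int blk;	/* block address mapped at fofs */
	unsigned int len;	/* number of contiguous blocks */
};

struct ext_list {
	struct ext e[MAX_EXTENTS];	/* sorted by fofs, non-overlapping */
	size_t n;
};

static void insert_at(struct ext_list *l, size_t i, struct ext x)
{
	memmove(&l->e[i + 1], &l->e[i], (l->n - i) * sizeof(x));
	l->e[i] = x;
	l->n++;
}

static void remove_at(struct ext_list *l, size_t i)
{
	memmove(&l->e[i], &l->e[i + 1], (l->n - i - 1) * sizeof(l->e[0]));
	l->n--;
}

/* Step 1: punch [fofs, fofs + len); returns the index where a new extent belongs. */
static size_t punch_range(struct ext_list *l, unsigned int fofs, unsigned int len)
{
	unsigned int end = fofs + len;
	size_t i = 0;

	while (i < l->n) {
		struct ext *c = &l->e[i];
		unsigned int c_end = c->fofs + c->len;

		if (c_end <= fofs) {		/* entirely left of the range */
			i++;
		} else if (c->fofs >= end) {	/* entirely right of the range */
			break;
		} else if (c->fofs < fofs && c_end > end) {	/* range splits it */
			struct ext tail = { end, c->blk + (end - c->fofs), c_end - end };

			c->len = fofs - c->fofs;
			insert_at(l, i + 1, tail);
			return i + 1;
		} else if (c->fofs < fofs) {	/* keep the head only */
			c->len = fofs - c->fofs;
			i++;
		} else if (c_end > end) {	/* keep the tail only */
			c->blk += end - c->fofs;
			c->fofs = end;
			c->len = c_end - end;
			break;
		} else {			/* fully covered: drop it */
			remove_at(l, i);
		}
	}
	return i;
}

/* Step 2: merge the new mapping with a neighbour, or insert it as a new extent. */
int update_range(struct ext_list *l, unsigned int fofs, unsigned int blk,
		 unsigned int len, int mapped)
{
	struct ext *prev, *next;
	size_t i;

	if (len == 0)
		return 0;
	if (l->n + 2 > MAX_EXTENTS)	/* worst case: one split plus one insert */
		return -1;

	i = punch_range(l, fofs, len);
	if (!mapped)			/* truncation path: punching is enough */
		return 0;

	prev = i > 0 ? &l->e[i - 1] : NULL;
	next = i < l->n ? &l->e[i] : NULL;

	if (prev && prev->fofs + prev->len == fofs && prev->blk + prev->len == blk) {
		prev->len += len;	/* back merge */
		if (next && next->fofs == fofs + len && next->blk == blk + len) {
			prev->len += next->len;
			remove_at(l, i);
		}
	} else if (next && next->fofs == fofs + len && next->blk == blk + len) {
		next->fofs = fofs;	/* front merge */
		next->blk = blk;
		next->len += len;
	} else {
		insert_at(l, i, (struct ext){ fofs, blk, len });
	}
	return 0;
}
```

A single call with mapped set to 0 covers a whole truncated range in one pass, which matches the drop in update count reported for the truncation path above; the per-call cost is higher because the loop may touch several extents.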
2015-08-26 20:34:48 +08:00
|
|
|
void f2fs_update_extent_cache_range(struct dnode_of_data *dn,
|
2017-01-31 02:55:18 +08:00
|
|
|
pgoff_t fofs, block_t blkaddr, unsigned int len);
|
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_init_extent_cache_info(struct f2fs_sb_info *sbi);
|
|
|
|
int __init f2fs_create_extent_cache(void);
|
|
|
|
void f2fs_destroy_extent_cache(void);
|
2015-07-08 17:59:36 +08:00
|
|
|
|
2017-06-14 17:39:47 +08:00
|
|
|
/*
|
|
|
|
* sysfs.c
|
|
|
|
*/
|
2021-08-03 12:22:45 +08:00
|
|
|
#define MIN_RA_MUL 2
|
|
|
|
#define MAX_RA_MUL 256
|
|
|
|
|
2017-07-27 02:24:13 +08:00
|
|
|
int __init f2fs_init_sysfs(void);
|
|
|
|
void f2fs_exit_sysfs(void);
|
|
|
|
int f2fs_register_sysfs(struct f2fs_sb_info *sbi);
|
|
|
|
void f2fs_unregister_sysfs(struct f2fs_sb_info *sbi);
|
2017-06-14 17:39:47 +08:00
|
|
|
|
f2fs: add fs-verity support
Add fs-verity support to f2fs. fs-verity is a filesystem feature that
enables transparent integrity protection and authentication of read-only
files. It uses a dm-verity like mechanism at the file level: a Merkle
tree is used to verify any block in the file in log(filesize) time. It
is implemented mainly by helper functions in fs/verity/. See
Documentation/filesystems/fsverity.rst for the full documentation.
The f2fs support for fs-verity consists of:
- Adding a filesystem feature flag and an inode flag for fs-verity.
- Implementing the fsverity_operations to support enabling verity on an
inode and reading/writing the verity metadata.
- Updating ->readpages() to verify data as it's read from verity files
and to support reading verity metadata pages.
- Updating ->write_begin(), ->write_end(), and ->writepages() to support
writing verity metadata pages.
- Calling the fs-verity hooks for ->open(), ->setattr(), and ->ioctl().
Like ext4, f2fs stores the verity metadata (Merkle tree and
fsverity_descriptor) past the end of the file, starting at the first 64K
boundary beyond i_size. This approach works because (a) verity files
are readonly, and (b) pages fully beyond i_size aren't visible to
userspace but can be read/written internally by f2fs with only some
relatively small changes to f2fs. Extended attributes cannot be used
because (a) f2fs limits the total size of an inode's xattr entries to
4096 bytes, which wouldn't be enough for even a single Merkle tree
block, and (b) f2fs encryption doesn't encrypt xattrs, yet the verity
metadata *must* be encrypted when the file is because it contains hashes
of the plaintext data.
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
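The placement rule for the verity metadata can be written as a small calculation. The sketch below assumes round-up semantics (a size that is already a multiple of 64K maps to itself); the macro and function names are hypothetical and chosen only for this illustration.

```c
#include <stdint.h>

/*
 * Assumed alignment: the commit message says the metadata starts at the
 * first 64K boundary beyond i_size.
 */
#define VERITY_METADATA_ALIGN 65536ULL

/* Hypothetical helper: round i_size up to the next 64K boundary. */
static uint64_t verity_metadata_pos(uint64_t i_size)
{
	return (i_size + VERITY_METADATA_ALIGN - 1) & ~(VERITY_METADATA_ALIGN - 1);
}
```

For example, a 1-byte file would place its metadata at offset 65536, and a file of 65537 bytes at offset 131072, so the metadata never shares a 64K region with file data that userspace can see.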
2019-07-23 00:26:24 +08:00
|
|
|
/* verity.c */
|
|
|
|
extern const struct fsverity_operations f2fs_verityops;
|
|
|
|
|
2015-04-21 04:57:51 +08:00
|
|
|
/*
|
|
|
|
* crypto support
|
|
|
|
*/
|
2017-09-06 07:54:24 +08:00
|
|
|
static inline bool f2fs_encrypted_file(struct inode *inode)
|
|
|
|
{
|
2018-12-12 17:50:11 +08:00
|
|
|
return IS_ENCRYPTED(inode) && S_ISREG(inode->i_mode);
|
2017-09-06 07:54:24 +08:00
|
|
|
}
|
|
|
|
|
2015-04-21 04:57:51 +08:00
|
|
|
static inline void f2fs_set_encrypted_inode(struct inode *inode)
|
|
|
|
{
|
2018-12-12 17:50:12 +08:00
|
|
|
#ifdef CONFIG_FS_ENCRYPTION
|
2015-04-21 04:57:51 +08:00
|
|
|
file_set_encrypt(inode);
|
2018-10-07 19:06:15 +08:00
|
|
|
f2fs_set_inode_flags(inode);
|
2015-04-21 04:57:51 +08:00
|
|
|
#endif
|
|
|
|
}
|
|
|
|
|
f2fs: refactor read path to allow multiple postprocessing steps
Currently f2fs's ->readpage() and ->readpages() assume that either the
data undergoes no postprocessing, or decryption only. But with
fs-verity, there will be an additional authenticity verification step,
and it may be needed either by itself, or combined with decryption.
To support this, store a 'struct bio_post_read_ctx' in ->bi_private
which contains a work struct, a bitmask of postprocessing steps that are
enabled, and an indicator of the current step. The bio completion
routine, if there was no I/O error, enqueues the first postprocessing
step. When that completes, it continues to the next step. Pages that
fail any postprocessing step have PageError set. Once all steps have
completed, pages without PageError set are set Uptodate, and all pages
are unlocked.
Also replace f2fs_encrypted_file() with a new function
f2fs_post_read_required() in places like direct I/O and garbage
collection that really should be testing whether the file needs special
I/O processing, not whether it is encrypted specifically.
This may also be useful for other future f2fs features such as
compression.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
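
The step sequencing described in this message (a bitmask of enabled postprocessing steps plus an indicator of the current step) can be sketched in plain C as below. This is only an illustrative userspace sketch: the names `post_read_ctx`, `STEP_DECRYPT`, `STEP_VERITY` and `post_read_next_step()` are hypothetical and are not claimed to match the kernel's `struct bio_post_read_ctx`.

```c
#include <assert.h>

/* Hypothetical step identifiers, in the order they would run. */
enum post_read_step {
	STEP_DECRYPT = 0,
	STEP_VERITY  = 1,
	STEP_COUNT
};

/* Hypothetical context: which steps are enabled and where we are. */
struct post_read_ctx {
	unsigned int enabled_steps;	/* bitmask of (1u << step) */
	int cur_step;			/* next step to consider */
};

/*
 * Return the next enabled step at or after ctx->cur_step and advance
 * past it, or return -1 once every enabled step has been handed out.
 */
static int post_read_next_step(struct post_read_ctx *ctx)
{
	assert(ctx->cur_step >= 0);

	while (ctx->cur_step < STEP_COUNT) {
		int step = ctx->cur_step++;

		if (ctx->enabled_steps & (1u << step))
			return step;
	}
	return -1;
}
```

A completion routine in this sketch would call `post_read_next_step()` once per finished step and mark the pages up to date only when it returns -1 and no page has failed.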
2018-04-19 02:09:48 +08:00
|
|
|
/*
|
|
|
|
* Returns true if the reads of the inode's data need to undergo some
|
|
|
|
* postprocessing step, like decryption or authenticity verification.
|
|
|
|
*/
|
|
|
|
static inline bool f2fs_post_read_required(struct inode *inode)
|
2015-04-21 04:57:51 +08:00
|
|
|
{
|
f2fs: support data compression
This patch adds data compression support to f2fs.
- A new term, cluster, is defined as the basic unit of compression; a file can
be divided into multiple clusters logically. One cluster includes 4 << n
(n >= 0) logical pages, the compression size is also the cluster size, and
each cluster can be compressed or not.
- In the cluster metadata layout, one special flag indicates whether a cluster
is a compressed one or a normal one. For a compressed cluster, the following
metadata maps the cluster to [1, (4 << n) - 1] physical blocks, where f2fs
stores data including the compress header and the compressed data.
- In order to eliminate write amplification during overwrite, F2FS only
supports compression on write-once files; data can be compressed only when
all logical blocks in the file are valid and the cluster compress ratio is
lower than the specified threshold.
- To enable compression on regular inode, there are three ways:
* chattr +c file
* chattr +c dir; touch dir/file
* mount w/ -o compress_extension=ext; touch file.ext
Compress metadata layout:
                             [Dnode Structure]
             +-----------------------------------------------+
             | cluster 1 | cluster 2 | ......... | cluster N |
             +-----------------------------------------------+
             .           .                       .           .
       .                       .                .                      .
  .         Compressed Cluster       .        .        Normal Cluster            .
+----------+---------+---------+---------+  +---------+---------+---------+---------+
|compr flag| block 1 | block 2 | block 3 |  | block 1 | block 2 | block 3 | block 4 |
+----------+---------+---------+---------+  +---------+---------+---------+---------+
           .                             .
         .                                           .
       .                                                           .
      +-------------+-------------+----------+----------------------------+
      | data length | data chksum | reserved |      compressed data       |
      +-------------+-------------+----------+----------------------------+
Changelog:
20190326:
- fix error handling of read_end_io().
- remove unneeded comments in f2fs_encrypt_one_page().
20190327:
- fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
- don't jump into loop directly to avoid uninitialized variables.
- add TODO tag in error path of f2fs_write_cache_pages().
20190328:
- fix wrong merge condition in f2fs_read_multi_pages().
- check compressed file in f2fs_post_read_required().
20190401
- allow overwrite on non-compressed cluster.
- check cluster meta before writing compressed data.
20190402
- don't preallocate blocks for compressed file.
- add lz4 compress algorithm
- process multiple post read works in one workqueue
Currently f2fs processes post read work in multiple workqueues, which
shows low performance due to the scheduling overhead of multiple
workqueues executing in order.
20190921
- compress: support buffered overwrite
C: compress cluster flag
V: valid block address
N: NEW_ADDR
One cluster contains 4 blocks
before overwrite after overwrite
- VVVV -> CVNN
- CVNN -> VVVV
- CVNN -> CVNN
- CVNN -> CVVV
- CVVV -> CVNN
- CVVV -> CVVV
20191029
- add kconfig F2FS_FS_COMPRESSION to isolate compression related
codes, add kconfig F2FS_FS_{LZO,LZ4} to cover backend algorithm.
note that: will remove lzo backend if Jaegeuk agreed that too.
- update codes according to Eric's comments.
20191101
- apply fixes from Jaegeuk
20191113
- apply fixes from Jaegeuk
- split workqueue for fsverity
20191216
- apply fixes from Jaegeuk
20200117
- fix to avoid NULL pointer dereference
[Jaegeuk Kim]
- add tracepoint for f2fs_{,de}compress_pages()
- fix many bugs and add some compression stats
- fix overwrite/mmap bugs
- address 32bit build error, reported by Geert.
- bug fixes when handling errors and i_compressed_blocks
Reported-by: <noreply@ellerman.id.au>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
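
The cluster arithmetic in this message can be illustrated with a small userspace sketch. It assumes the upper bound of the physical-block range is meant as (4 << n) - 1, which matches the layout diagram (one flag slot plus three blocks for a four-block cluster). The helper names below are hypothetical and are not taken from the f2fs sources.

```c
#include <assert.h>

/* Number of logical pages in one cluster: 4 << n, with n >= 0. */
static unsigned int cluster_pages(unsigned int n)
{
	assert(n < 28);		/* keep the shift well inside 32 bits */
	return 4u << n;
}

/* Index of the cluster that contains logical page page_index. */
static unsigned long cluster_index(unsigned long page_index, unsigned int n)
{
	return page_index / cluster_pages(n);
}

/* Position of logical page page_index inside its cluster. */
static unsigned int offset_in_cluster(unsigned long page_index, unsigned int n)
{
	return (unsigned int)(page_index % cluster_pages(n));
}

/*
 * Largest number of physical blocks a compressed cluster may occupy.
 * Written with explicit parentheses because in C the expression
 * 4 << n - 1 would parse as 4 << (n - 1).
 */
static unsigned int max_compressed_blocks(unsigned int n)
{
	return cluster_pages(n) - 1;
}
```

With n = 0, logical page 9 falls into cluster 2 at offset 1, and a compressed cluster stores at most 3 physical blocks.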
2019-11-01 18:07:14 +08:00
|
|
|
return f2fs_encrypted_file(inode) || fsverity_active(inode) ||
|
|
|
|
f2fs_compressed_file(inode);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* compress.c
|
|
|
|
*/
|
|
|
|
#ifdef CONFIG_F2FS_FS_COMPRESSION
|
|
|
|
bool f2fs_is_compressed_page(struct page *page);
|
|
|
|
struct page *f2fs_compress_control_page(struct page *page);
|
|
|
|
int f2fs_prepare_compress_overwrite(struct inode *inode,
|
|
|
|
struct page **pagep, pgoff_t index, void **fsdata);
|
|
|
|
bool f2fs_compress_write_end(struct inode *inode, void *fsdata,
|
|
|
|
pgoff_t index, unsigned copied);
|
2020-03-18 16:22:59 +08:00
|
|
|
int f2fs_truncate_partial_cluster(struct inode *inode, u64 from, bool lock);
|
2019-11-01 18:07:14 +08:00
|
|
|
void f2fs_compress_write_end_io(struct bio *bio, struct page *page);
|
|
|
|
bool f2fs_is_compress_backend_ready(struct inode *inode);
|
f2fs: introduce mempool for {,de}compress intermediate page allocation
If the compression feature is on and free memory is scarce, the page
refault ratio is higher than before. The root cause is:
- The {,de}compression flow needs to allocate intermediate pages to store
compressed data in a cluster, so during their allocation the vm may reclaim
mmaped pages.
- If the reclaimed pages belong to a compressed cluster, refaulting them
may cause more intermediate page allocations, which results in
reclaiming more mmaped pages.
So this patch introduces a mempool for intermediate page allocation
in order to avoid a high refault ratio. By default, the number of
preallocated pages in the pool is 512; users can change the number by
assigning the 'num_compress_pages' parameter during module initialization.
Ma Feng found warnings in the original patch and fixed like below.
Fix the following sparse warning:
fs/f2fs/compress.c:501:5: warning: symbol 'num_compress_pages' was not declared.
Should it be static?
fs/f2fs/compress.c:530:6: warning: symbol 'f2fs_compress_free_page' was not
declared. Should it be static?
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Ma Feng <mafeng.ma@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
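
The idea of a preallocated pool for intermediate pages can be sketched in userspace C as below. This is a simplified illustration and not the kernel mempool API: the type `page_pool`, the helpers `pool_init()`, `pool_alloc()`, `pool_free()` and `pool_destroy()`, and the 4096-byte page size are all assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>

#define PAGE_BYTES 4096		/* assumed page size for this sketch */

/* Hypothetical fixed-capacity reserve of preallocated pages. */
struct page_pool {
	void **slots;		/* stack of currently free pages */
	size_t capacity;	/* number of pages reserved at init */
	size_t free_count;	/* number of pages currently in slots */
};

/* Release every page still held in the reserve, then the reserve itself. */
static void pool_destroy(struct page_pool *p)
{
	for (size_t i = 0; i < p->free_count; i++)
		free(p->slots[i]);
	free(p->slots);
	p->slots = NULL;
	p->capacity = p->free_count = 0;
}

/* Preallocate n pages up front; returns 0 on success, -1 on failure. */
static int pool_init(struct page_pool *p, size_t n)
{
	assert(n > 0);
	p->slots = calloc(n, sizeof(*p->slots));
	if (!p->slots)
		return -1;
	p->capacity = n;
	p->free_count = 0;
	for (size_t i = 0; i < n; i++) {
		void *page = malloc(PAGE_BYTES);

		if (!page) {
			pool_destroy(p);
			return -1;
		}
		p->slots[p->free_count++] = page;
	}
	return 0;
}

/* Hand out a reserved page without touching the general allocator. */
static void *pool_alloc(struct page_pool *p)
{
	if (p->free_count == 0)
		return NULL;
	return p->slots[--p->free_count];
}

/* Return a page to the reserve, or free it if the reserve is full. */
static void pool_free(struct page_pool *p, void *page)
{
	if (p->free_count < p->capacity)
		p->slots[p->free_count++] = page;
	else
		free(page);
}
```

Because the pages are obtained once at initialization, allocations on the hot path in this sketch never ask the general allocator for memory, which is the property the message relies on to avoid triggering further reclaim.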
2020-04-08 19:56:05 +08:00
|
|
|
int f2fs_init_compress_mempool(void);
|
|
|
|
void f2fs_destroy_compress_mempool(void);
|
2021-05-20 19:51:50 +08:00
|
|
|
void f2fs_decompress_cluster(struct decompress_io_ctx *dic);
|
|
|
|
void f2fs_end_read_compressed_page(struct page *page, bool failed,
|
|
|
|
block_t blkaddr);
|
2019-11-01 18:07:14 +08:00
|
|
|
bool f2fs_cluster_is_empty(struct compress_ctx *cc);
|
|
|
|
bool f2fs_cluster_can_merge_page(struct compress_ctx *cc, pgoff_t index);
|
2021-10-23 11:08:00 +08:00
|
|
|
bool f2fs_all_cluster_page_loaded(struct compress_ctx *cc, struct pagevec *pvec,
|
|
|
|
int index, int nr_pages);
|
2021-08-06 08:02:50 +08:00
|
|
|
bool f2fs_sanity_check_cluster(struct dnode_of_data *dn);
|
2019-11-01 18:07:14 +08:00
|
|
|
void f2fs_compress_ctx_add_page(struct compress_ctx *cc, struct page *page);
|
|
|
|
int f2fs_write_multi_pages(struct compress_ctx *cc,
|
|
|
|
int *submitted,
|
|
|
|
struct writeback_control *wbc,
|
|
|
|
enum iostat_type io_type);
|
|
|
|
int f2fs_is_compressed_cluster(struct inode *inode, pgoff_t index);
|
2021-08-04 10:23:48 +08:00
|
|
|
void f2fs_update_extent_tree_range_compressed(struct inode *inode,
|
|
|
|
pgoff_t fofs, block_t blkaddr, unsigned int llen,
|
|
|
|
unsigned int c_len);
|
2019-11-01 18:07:14 +08:00
|
|
|
int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret,
|
|
|
|
unsigned nr_pages, sector_t *last_block_in_bio,
|
2020-02-18 18:21:35 +08:00
|
|
|
bool is_readahead, bool for_write);
|
2019-11-01 18:07:14 +08:00
|
|
|
struct decompress_io_ctx *f2fs_alloc_dic(struct compress_ctx *cc);
|
2021-01-05 14:33:02 +08:00
|
|
|
void f2fs_decompress_end_io(struct decompress_io_ctx *dic, bool failed);
|
|
|
|
void f2fs_put_page_dic(struct page *page);
|
2021-08-04 10:23:48 +08:00
|
|
|
unsigned int f2fs_cluster_blocks_are_contiguous(struct dnode_of_data *dn);
|
2019-11-01 18:07:14 +08:00
|
|
|
int f2fs_init_compress_ctx(struct compress_ctx *cc);
|
2021-05-10 17:30:32 +08:00
|
|
|
void f2fs_destroy_compress_ctx(struct compress_ctx *cc, bool reuse);
|
2019-11-01 18:07:14 +08:00
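The cluster arithmetic described in the commit message above can be sketched in plain C. This is an illustration only, not the kernel code: `cluster_pages()` and `max_compressed_blocks()` are hypothetical helper names, and `n` is the exponent from the `4 << n` expression in the message.

```c
#include <assert.h>

/* Hypothetical helper: logical pages in one cluster, i.e. 4 << n with n >= 0. */
static unsigned int cluster_pages(unsigned int n)
{
	assert(n < 30);	/* keep 4 << n inside a 32-bit unsigned int */
	return 4u << n;
}

/*
 * Hypothetical helper: upper bound on the physical blocks a compressed
 * cluster may map to. One slot of the cluster holds the compress flag,
 * so the range from the commit message is [1, (4 << n) - 1].
 */
static unsigned int max_compressed_blocks(unsigned int n)
{
	return cluster_pages(n) - 1;
}
```

With `n = 0` this matches the diagram in the commit message: a four-slot cluster holding the compress flag plus up to three data blocks.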
|
|
|
void f2fs_init_compress_info(struct f2fs_sb_info *sbi);
|
2021-05-20 19:51:50 +08:00
|
|
|
int f2fs_init_compress_inode(struct f2fs_sb_info *sbi);
|
|
|
|
void f2fs_destroy_compress_inode(struct f2fs_sb_info *sbi);
|
2020-09-14 17:05:13 +08:00
|
|
|
int f2fs_init_page_array_cache(struct f2fs_sb_info *sbi);
|
|
|
|
void f2fs_destroy_page_array_cache(struct f2fs_sb_info *sbi);
|
2020-09-14 17:05:14 +08:00
|
|
|
int __init f2fs_init_compress_cache(void);
|
|
|
|
void f2fs_destroy_compress_cache(void);
|
2021-05-20 19:51:50 +08:00
|
|
|
struct address_space *COMPRESS_MAPPING(struct f2fs_sb_info *sbi);
|
|
|
|
void f2fs_invalidate_compress_page(struct f2fs_sb_info *sbi, block_t blkaddr);
|
|
|
|
void f2fs_cache_compressed_page(struct f2fs_sb_info *sbi, struct page *page,
|
|
|
|
nid_t ino, block_t blkaddr);
|
|
|
|
bool f2fs_load_compressed_page(struct f2fs_sb_info *sbi, struct page *page,
|
|
|
|
block_t blkaddr);
|
|
|
|
void f2fs_invalidate_compress_pages(struct f2fs_sb_info *sbi, nid_t ino);
|
2021-03-15 16:12:33 +08:00
|
|
|
#define inc_compr_inode_stat(inode) \
|
|
|
|
do { \
|
|
|
|
struct f2fs_sb_info *sbi = F2FS_I_SB(inode); \
|
|
|
|
sbi->compr_new_inode++; \
|
|
|
|
} while (0)
|
|
|
|
#define add_compr_block_stat(inode, blocks) \
|
|
|
|
do { \
|
|
|
|
struct f2fs_sb_info *sbi = F2FS_I_SB(inode); \
|
|
|
|
int diff = F2FS_I(inode)->i_cluster_size - (blocks); \
|
|
|
|
sbi->compr_written_block += blocks; \
|
|
|
|
sbi->compr_saved_block += diff; \
|
|
|
|
} while (0)
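The accounting done by `add_compr_block_stat()` can be modelled outside the kernel as follows. This is a minimal sketch under stated assumptions: `struct compr_stats` and `account_cluster()` are hypothetical stand-ins for the `sbi` counters and the macro, and the cluster size is passed in directly instead of being read from the inode.

```c
#include <assert.h>

/* Hypothetical stand-in for the two sbi counters the macro updates. */
struct compr_stats {
	unsigned long long written_block;	/* blocks actually written */
	unsigned long long saved_block;		/* blocks saved by compression */
};

/*
 * Hypothetical model of the macro: a cluster of cluster_size blocks was
 * written out using only `blocks` physical blocks, so the difference is
 * counted as saved.
 */
static void account_cluster(struct compr_stats *st,
			    unsigned int cluster_size, unsigned int blocks)
{
	assert(blocks <= cluster_size);	/* a cluster never grows when compressed */
	st->written_block += blocks;
	st->saved_block += cluster_size - blocks;
}
```

For a four-block cluster that compresses into three blocks, the sketch records three written blocks and one saved block.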
|
2019-11-01 18:07:14 +08:00
|
|
|
#else
|
|
|
|
static inline bool f2fs_is_compressed_page(struct page *page) { return false; }
|
|
|
|
static inline bool f2fs_is_compress_backend_ready(struct inode *inode)
|
|
|
|
{
|
|
|
|
if (!f2fs_compressed_file(inode))
|
|
|
|
return true;
|
|
|
|
/* compression is not supported in this build */
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
static inline struct page *f2fs_compress_control_page(struct page *page)
|
|
|
|
{
|
|
|
|
WARN_ON_ONCE(1);
|
|
|
|
return ERR_PTR(-EINVAL);
|
|
|
|
}
|
f2fs: introduce mempool for {,de}compress intermediate page allocation
If compression feature is on, in scenario of no enough free memory,
page refault ratio is higher than before, the root cause is:
- {,de}compression flow needs to allocate intermediate pages to store
compressed data in cluster, so during their allocation, vm may reclaim
mmaped pages.
- if above reclaimed pages belong to compressed cluster, during its
refault, it may cause more intermediate pages allocation, result in
reclaiming more mmaped pages.
So this patch introduces a mempool for intermediate page allocation,
in order to avoid high refault ratio, by default, number of
preallocated page in pool is 512, user can change the number by
assigning 'num_compress_pages' parameter during module initialization.
Ma Feng found warnings in the original patch and fixed like below.
Fix the following sparse warning:
fs/f2fs/compress.c:501:5: warning: symbol 'num_compress_pages' was not declared.
Should it be static?
fs/f2fs/compress.c:530:6: warning: symbol 'f2fs_compress_free_page' was not
declared. Should it be static?
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Ma Feng <mafeng.ma@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-04-08 19:56:05 +08:00
|
|
|
static inline int f2fs_init_compress_mempool(void) { return 0; }
|
|
|
|
static inline void f2fs_destroy_compress_mempool(void) { }
|
2021-05-20 19:51:50 +08:00
|
|
|
static inline void f2fs_decompress_cluster(struct decompress_io_ctx *dic) { }
|
|
|
|
static inline void f2fs_end_read_compressed_page(struct page *page,
|
|
|
|
bool failed, block_t blkaddr)
|
2021-01-05 14:33:02 +08:00
|
|
|
{
|
|
|
|
WARN_ON_ONCE(1);
|
|
|
|
}
|
|
|
|
static inline void f2fs_put_page_dic(struct page *page)
|
|
|
|
{
|
|
|
|
WARN_ON_ONCE(1);
|
|
|
|
}
|
2021-08-04 10:23:48 +08:00
|
|
|
static inline unsigned int f2fs_cluster_blocks_are_contiguous(struct dnode_of_data *dn) { return 0; }
|
2021-08-06 08:02:50 +08:00
|
|
|
static inline bool f2fs_sanity_check_cluster(struct dnode_of_data *dn) { return false; }
|
2021-05-20 19:51:50 +08:00
|
|
|
static inline int f2fs_init_compress_inode(struct f2fs_sb_info *sbi) { return 0; }
|
|
|
|
static inline void f2fs_destroy_compress_inode(struct f2fs_sb_info *sbi) { }
|
2020-09-14 17:05:13 +08:00
|
|
|
static inline int f2fs_init_page_array_cache(struct f2fs_sb_info *sbi) { return 0; }
|
|
|
|
static inline void f2fs_destroy_page_array_cache(struct f2fs_sb_info *sbi) { }
|
2020-09-14 17:05:14 +08:00
|
|
|
static inline int __init f2fs_init_compress_cache(void) { return 0; }
|
|
|
|
static inline void f2fs_destroy_compress_cache(void) { }
|
2021-05-20 19:51:50 +08:00
|
|
|
static inline void f2fs_invalidate_compress_page(struct f2fs_sb_info *sbi,
|
|
|
|
block_t blkaddr) { }
|
|
|
|
static inline void f2fs_cache_compressed_page(struct f2fs_sb_info *sbi,
|
|
|
|
struct page *page, nid_t ino, block_t blkaddr) { }
|
|
|
|
static inline bool f2fs_load_compressed_page(struct f2fs_sb_info *sbi,
|
|
|
|
struct page *page, block_t blkaddr) { return false; }
|
|
|
|
static inline void f2fs_invalidate_compress_pages(struct f2fs_sb_info *sbi,
|
|
|
|
nid_t ino) { }
|
2021-03-15 16:12:33 +08:00
|
|
|
#define inc_compr_inode_stat(inode) do { } while (0)
|
2021-08-04 10:23:48 +08:00
|
|
|
static inline void f2fs_update_extent_tree_range_compressed(struct inode *inode,
|
|
|
|
pgoff_t fofs, block_t blkaddr, unsigned int llen,
|
|
|
|
unsigned int c_len) { }
|
2019-11-01 18:07:14 +08:00
|
|
|
#endif
|
|
|
|
|
|
|
|
static inline void set_compress_context(struct inode *inode)
|
|
|
|
{
|
|
|
|
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
|
|
|
|
|
|
|
|
F2FS_I(inode)->i_compress_algorithm =
|
|
|
|
F2FS_OPTION(sbi).compress_algorithm;
|
|
|
|
F2FS_I(inode)->i_log_cluster_size =
|
|
|
|
F2FS_OPTION(sbi).compress_log_size;
|
2020-11-26 18:32:09 +08:00
|
|
|
F2FS_I(inode)->i_compress_flag =
|
|
|
|
F2FS_OPTION(sbi).compress_chksum ?
|
|
|
|
1 << COMPRESS_CHKSUM : 0;
|
2019-11-01 18:07:14 +08:00
|
|
|
F2FS_I(inode)->i_cluster_size =
|
|
|
|
1 << F2FS_I(inode)->i_log_cluster_size;
|
2021-07-10 08:21:41 +08:00
|
|
|
if ((F2FS_I(inode)->i_compress_algorithm == COMPRESS_LZ4 ||
|
|
|
|
F2FS_I(inode)->i_compress_algorithm == COMPRESS_ZSTD) &&
|
2021-01-22 17:46:43 +08:00
|
|
|
F2FS_OPTION(sbi).compress_level)
|
|
|
|
F2FS_I(inode)->i_compress_flag |=
|
|
|
|
F2FS_OPTION(sbi).compress_level <<
|
|
|
|
COMPRESS_LEVEL_OFFSET;
|
2019-11-01 18:07:14 +08:00
|
|
|
F2FS_I(inode)->i_flags |= F2FS_COMPR_FL;
|
|
|
|
set_inode_flag(inode, FI_COMPRESSED_FILE);
|
|
|
|
stat_inc_compr_inode(inode);
|
2021-03-15 16:12:33 +08:00
|
|
|
inc_compr_inode_stat(inode);
|
2020-03-18 19:40:45 +08:00
|
|
|
f2fs_mark_inode_dirty_sync(inode, true);
|
2019-11-01 18:07:14 +08:00
|
|
|
}
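The way `set_compress_context()` builds `i_compress_flag` can be sketched as a standalone function. This is an illustration, not the kernel implementation: `pack_compress_flag()` is a hypothetical name, and the `EX_` constants are assumed values chosen for the example, so the real definitions of `COMPRESS_CHKSUM` and `COMPRESS_LEVEL_OFFSET` should be checked in the header.

```c
#include <assert.h>

/* Assumed values for illustration only; not taken from the header. */
#define EX_COMPRESS_CHKSUM		0
#define EX_COMPRESS_LEVEL_OFFSET	8

/*
 * Hypothetical model of the flag packing: the checksum option sets one
 * bit, and a non-zero level is stored above the offset, but only for
 * algorithms that accept a level (LZ4 and ZSTD in the code above).
 */
static unsigned int pack_compress_flag(int chksum, unsigned int level,
				       int algo_has_level)
{
	unsigned int flag = chksum ? 1u << EX_COMPRESS_CHKSUM : 0;

	assert(level <= 0xffu);	/* illustrative bound: level fits in one byte */
	if (algo_has_level && level)
		flag |= level << EX_COMPRESS_LEVEL_OFFSET;
	return flag;
}
```

Under these assumed constants, enabling the checksum with level 3 on an algorithm that takes a level yields `0x301`, while the same options on an algorithm without levels yield `0x1`.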
|
|
|
|
|
2020-09-08 10:44:11 +08:00
|
|
|
static inline bool f2fs_disable_compressed_file(struct inode *inode)
|
2019-11-01 18:07:14 +08:00
|
|
|
{
|
|
|
|
struct f2fs_inode_info *fi = F2FS_I(inode);
|
|
|
|
|
|
|
|
if (!f2fs_compressed_file(inode))
|
2020-09-08 10:44:11 +08:00
|
|
|
return true;
|
2021-10-27 12:16:00 +08:00
|
|
|
if (S_ISREG(inode->i_mode) && F2FS_HAS_BLOCKS(inode))
|
2020-09-08 10:44:11 +08:00
|
|
|
return false;
|
2019-11-01 18:07:14 +08:00
|
|
|
|
|
|
|
fi->i_flags &= ~F2FS_COMPR_FL;
|
|
|
|
stat_dec_compr_inode(inode);
|
2020-03-10 20:50:07 +08:00
|
|
|
clear_inode_flag(inode, FI_COMPRESSED_FILE);
|
2020-03-18 19:40:45 +08:00
|
|
|
f2fs_mark_inode_dirty_sync(inode, true);
|
2020-09-08 10:44:11 +08:00
|
|
|
return true;
|
2015-04-21 04:57:51 +08:00
|
|
|
}
|
|
|
|
|
2018-02-06 12:31:17 +08:00
|
|
|
#define F2FS_FEATURE_FUNCS(name, flagname) \
|
2018-10-24 18:34:26 +08:00
|
|
|
static inline int f2fs_sb_has_##name(struct f2fs_sb_info *sbi) \
|
2018-02-06 12:31:17 +08:00
|
|
|
{ \
|
2018-10-24 18:34:26 +08:00
|
|
|
return F2FS_HAS_FEATURE(sbi, F2FS_FEATURE_##flagname); \
|
2016-06-14 00:47:48 +08:00
|
|
|
}
|
|
|
|
|
2018-02-06 12:31:17 +08:00
|
|
|
F2FS_FEATURE_FUNCS(encrypt, ENCRYPT);
|
|
|
|
F2FS_FEATURE_FUNCS(blkzoned, BLKZONED);
|
|
|
|
F2FS_FEATURE_FUNCS(extra_attr, EXTRA_ATTR);
|
|
|
|
F2FS_FEATURE_FUNCS(project_quota, PRJQUOTA);
|
|
|
|
F2FS_FEATURE_FUNCS(inode_chksum, INODE_CHKSUM);
|
|
|
|
F2FS_FEATURE_FUNCS(flexible_inline_xattr, FLEXIBLE_INLINE_XATTR);
|
|
|
|
F2FS_FEATURE_FUNCS(quota_ino, QUOTA_INO);
|
|
|
|
F2FS_FEATURE_FUNCS(inode_crtime, INODE_CRTIME);
|
2018-03-15 18:51:41 +08:00
|
|
|
F2FS_FEATURE_FUNCS(lost_found, LOST_FOUND);
|
f2fs: add fs-verity support
Add fs-verity support to f2fs. fs-verity is a filesystem feature that
enables transparent integrity protection and authentication of read-only
files. It uses a dm-verity like mechanism at the file level: a Merkle
tree is used to verify any block in the file in log(filesize) time. It
is implemented mainly by helper functions in fs/verity/. See
Documentation/filesystems/fsverity.rst for the full documentation.
The f2fs support for fs-verity consists of:
- Adding a filesystem feature flag and an inode flag for fs-verity.
- Implementing the fsverity_operations to support enabling verity on an
inode and reading/writing the verity metadata.
- Updating ->readpages() to verify data as it's read from verity files
and to support reading verity metadata pages.
- Updating ->write_begin(), ->write_end(), and ->writepages() to support
writing verity metadata pages.
- Calling the fs-verity hooks for ->open(), ->setattr(), and ->ioctl().
Like ext4, f2fs stores the verity metadata (Merkle tree and
fsverity_descriptor) past the end of the file, starting at the first 64K
boundary beyond i_size. This approach works because (a) verity files
are readonly, and (b) pages fully beyond i_size aren't visible to
userspace but can be read/written internally by f2fs with only some
relatively small changes to f2fs. Extended attributes cannot be used
because (a) f2fs limits the total size of an inode's xattr entries to
4096 bytes, which wouldn't be enough for even a single Merkle tree
block, and (b) f2fs encryption doesn't encrypt xattrs, yet the verity
metadata *must* be encrypted when the file is because it contains hashes
of the plaintext data.
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-07-23 00:26:24 +08:00
|
|
|
F2FS_FEATURE_FUNCS(verity, VERITY);
|
2018-09-28 20:25:56 +08:00
|
|
|
F2FS_FEATURE_FUNCS(sb_chksum, SB_CHKSUM);
|
2019-07-24 07:05:28 +08:00
|
|
|
F2FS_FEATURE_FUNCS(casefold, CASEFOLD);
|
2019-11-01 18:07:14 +08:00
|
|
|
F2FS_FEATURE_FUNCS(compression, COMPRESSION);
|
2021-05-21 16:32:53 +08:00
|
|
|
F2FS_FEATURE_FUNCS(readonly, RO);
|
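The F2FS_FEATURE_FUNCS() pattern above relies on preprocessor token pasting: each invocation expands to one small inline predicate that tests a single feature bit. The following is a minimal userspace sketch of that technique, not the kernel code itself. The `demo_` names, the `struct demo_sb_info` layout, and the bit values are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits; values are chosen for illustration only. */
#define DEMO_FEATURE_ENCRYPT	0x0001u
#define DEMO_FEATURE_BLKZONED	0x0002u
#define DEMO_FEATURE_RO		0x4000u

/* Hypothetical stand-in for the superblock info structure. */
struct demo_sb_info {
	uint32_t feature;	/* bitmask of enabled features */
};

#define DEMO_HAS_FEATURE(sbi, mask)	(((sbi)->feature & (mask)) != 0)

/*
 * name is pasted into the function name, flagname into the bit name,
 * so one invocation emits one predicate.
 */
#define DEMO_FEATURE_FUNCS(name, flagname)				\
static inline bool demo_sb_has_##name(const struct demo_sb_info *sbi)	\
{									\
	return DEMO_HAS_FEATURE(sbi, DEMO_FEATURE_##flagname);		\
}

DEMO_FEATURE_FUNCS(encrypt, ENCRYPT)
DEMO_FEATURE_FUNCS(blkzoned, BLKZONED)
DEMO_FEATURE_FUNCS(readonly, RO)
```

A caller would then write `demo_sb_has_readonly(&sbi)` instead of spelling out the mask test, which keeps call sites short and makes a misspelled feature name a compile error.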
2018-01-25 14:54:42 +08:00
|
|
|
|
2021-06-16 06:39:04 +08:00
|
|
|
static inline bool f2fs_may_extent_tree(struct inode *inode)
|
|
|
|
{
|
|
|
|
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
|
|
|
|
|
|
|
|
if (!test_opt(sbi, EXTENT_CACHE) ||
|
|
|
|
is_inode_flag_set(inode, FI_NO_EXTENT) ||
|
|
|
|
(is_inode_flag_set(inode, FI_COMPRESSED_FILE) &&
|
|
|
|
!f2fs_sb_has_readonly(sbi)))
|
|
|
|
return false;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* for recovered files during mount do not create extents
|
|
|
|
* if shrinker is not registered.
|
|
|
|
*/
|
|
|
|
if (list_empty(&sbi->s_list))
|
|
|
|
return false;
|
|
|
|
|
|
|
|
return S_ISREG(inode->i_mode);
|
|
|
|
}
|
|
|
|
|
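The checks in f2fs_may_extent_tree() can be read as a short chain of early exits. The sketch below restates that chain over plain booleans so each condition is visible. It is an illustration under assumptions: `struct demo_extent_inputs` and its field names are hypothetical, and each field stands in for the corresponding kernel query (mount option, inode flag, superblock feature, shrinker list state, file type).

```c
#include <stdbool.h>

/* Hypothetical flattened view of the inputs the real helper queries. */
struct demo_extent_inputs {
	bool opt_extent_cache;		/* EXTENT_CACHE mount option set */
	bool no_extent_flag;		/* inode carries FI_NO_EXTENT */
	bool compressed;		/* inode carries FI_COMPRESSED_FILE */
	bool sb_readonly;		/* superblock has the readonly feature */
	bool shrinker_registered;	/* sbi->s_list is not empty */
	bool regular_file;		/* S_ISREG(inode->i_mode) */
};

static inline bool demo_may_extent_tree(const struct demo_extent_inputs *in)
{
	/* Compressed files qualify only on a read-only filesystem image. */
	if (!in->opt_extent_cache || in->no_extent_flag ||
	    (in->compressed && !in->sb_readonly))
		return false;

	/* Files recovered during mount: no shrinker yet, so no extents. */
	if (!in->shrinker_registered)
		return false;

	return in->regular_file;
}
```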
2016-10-28 16:45:05 +08:00
|
|
|
#ifdef CONFIG_BLK_DEV_ZONED
|
2019-03-16 08:13:07 +08:00
|
|
|
static inline bool f2fs_blkz_is_seq(struct f2fs_sb_info *sbi, int devi,
|
|
|
|
block_t blkaddr)
|
2016-10-28 16:45:05 +08:00
|
|
|
{
|
|
|
|
unsigned int zno = blkaddr >> sbi->log_blocks_per_blkz;
|
|
|
|
|
2019-03-16 08:13:07 +08:00
|
|
|
return test_bit(zno, FDEV(devi).blkz_seq);
|
2016-10-28 16:45:05 +08:00
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
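f2fs_blkz_is_seq() maps a block address to a zone number with a right shift and then looks that zone up in a per-device bitmap. The following userspace sketch shows the same arithmetic, assuming a power-of-two zone size. `demo_test_bit()` is a hypothetical stand-in for the kernel's test_bit(), and the other `demo_` names are likewise invented for illustration.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)

/* Userspace stand-in for the kernel's test_bit(). */
static inline bool demo_test_bit(unsigned int nr, const unsigned long *map)
{
	return (map[nr / DEMO_BITS_PER_LONG] >> (nr % DEMO_BITS_PER_LONG)) & 1UL;
}

/*
 * With 2^log_blocks_per_zone blocks per zone, shifting the block address
 * right by log_blocks_per_zone yields the zone index.
 */
static inline bool demo_blkz_is_seq(const unsigned long *seq_map,
				    unsigned int log_blocks_per_zone,
				    uint32_t blkaddr)
{
	unsigned int zno = blkaddr >> log_blocks_per_zone;

	return demo_test_bit(zno, seq_map);
}
```

For example, with 16 blocks per zone (`log_blocks_per_zone == 4`), block addresses 32 through 47 all fall in zone 2.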
f2fs: fix to avoid NULL pointer dereference on se->discard_map
https://bugzilla.kernel.org/show_bug.cgi?id=200951
There is a NULL pointer dereference issue reported in bugzilla:
Hi,
in the setup there is a SATA SSD connected to a SATA-to-USB bridge.
The disc is "Samsung SSD 850 PRO 256G" which supports TRIM.
There are four partitions:
sda1: FAT /boot
sda2: F2FS /
sda3: F2FS /home
sda4: F2FS
The bridge is ASMT1153e which uses the "uas" driver.
There is no TRIM pass-through, so, when mounting it reports:
mounting with "discard" option, but the device does not support discard
The USB host is USB3.0 and UASP capable. It is the one on RK3399.
Given this everything works fine, except there is no TRIM support.
In order to enable TRIM a new UDEV rule is added [1]:
/etc/udev/rules.d/10-sata-bridge-trim.rules:
ACTION=="add|change", ATTRS{idVendor}=="174c", ATTRS{idProduct}=="55aa", SUBSYSTEM=="scsi_disk", ATTR{provisioning_mode}="unmap"
After reboot any F2FS write hangs forever and dmesg reports:
Unable to handle kernel NULL pointer dereference
Also tested on a x86_64 system: works fine even with TRIM enabled.
same disc
same bridge
different usb host controller
different cpu architecture
not root filesystem
Regards,
Vicenç.
[1] Post #5 in https://bbs.archlinux.org/viewtopic.php?id=236280
Unable to handle kernel NULL pointer dereference at virtual address 000000000000003e
Mem abort info:
ESR = 0x96000004
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000626e3122
[000000000000003e] pgd=0000000000000000
Internal error: Oops: 96000004 [#1] SMP
Modules linked in: overlay snd_soc_hdmi_codec rc_cec dw_hdmi_i2s_audio dw_hdmi_cec snd_soc_simple_card snd_soc_simple_card_utils snd_soc_rockchip_i2s rockchip_rga snd_soc_rockchip_pcm rockchipdrm videobuf2_dma_sg v4l2_mem2mem rtc_rk808 videobuf2_memops analogix_dp videobuf2_v4l2 videobuf2_common dw_hdmi dw_wdt cec rc_core videodev drm_kms_helper media drm rockchip_thermal rockchip_saradc realtek drm_panel_orientation_quirks syscopyarea sysfillrect sysimgblt fb_sys_fops dwmac_rk stmmac_platform stmmac pwm_bl squashfs loop crypto_user gpio_keys hid_kensington
CPU: 5 PID: 957 Comm: nvim Not tainted 4.19.0-rc1-1-ARCH #1
Hardware name: Sapphire-RK3399 Board (DT)
pstate: 00000005 (nzcv daif -PAN -UAO)
pc : update_sit_entry+0x304/0x4b0
lr : update_sit_entry+0x108/0x4b0
sp : ffff00000ca13bd0
x29: ffff00000ca13bd0 x28: 000000000000003e
x27: 0000000000000020 x26: 0000000000080000
x25: 0000000000000048 x24: ffff8000ebb85cf8
x23: 0000000000000253 x22: 00000000ffffffff
x21: 00000000000535f2 x20: 00000000ffffffdf
x19: ffff8000eb9e6800 x18: ffff8000eb9e6be8
x17: 0000000007ce6926 x16: 000000001c83ffa8
x15: 0000000000000000 x14: ffff8000f602df90
x13: 0000000000000006 x12: 0000000000000040
x11: 0000000000000228 x10: 0000000000000000
x9 : 0000000000000000 x8 : 0000000000000000
x7 : 00000000000535f2 x6 : ffff8000ebff3440
x5 : ffff8000ebff3440 x4 : ffff8000ebe3a6c8
x3 : 00000000ffffffff x2 : 0000000000000020
x1 : 0000000000000000 x0 : ffff8000eb9e5800
Process nvim (pid: 957, stack limit = 0x0000000063a78320)
Call trace:
update_sit_entry+0x304/0x4b0
f2fs_invalidate_blocks+0x98/0x140
truncate_node+0x90/0x400
f2fs_remove_inode_page+0xe8/0x340
f2fs_evict_inode+0x2b0/0x408
evict+0xe0/0x1e0
iput+0x160/0x260
do_unlinkat+0x214/0x298
__arm64_sys_unlinkat+0x3c/0x68
el0_svc_handler+0x94/0x118
el0_svc+0x8/0xc
Code: f9400800 b9488400 36080140 f9400f01 (387c4820)
---[ end trace a0f21a307118c477 ]---
The reason is that the discard flag on the block queue can be enabled via
UDEV after mount, but f2fs initializes se->discard_map during mount only
if this flag is already set. Once the flag is set after mount, f2fs may
dereference a NULL se->discard_map.
This patch makes the following changes to fix the issue:
- initialize and update se->discard_map all the time.
- don't clear DISCARD option if device has no QUEUE_FLAG_DISCARD flag
during mount.
- don't issue small discard on zoned block device.
- introduce some functions to enhance the readability.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Tested-by: Vicente Bergas <vicencb@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-04 03:52:17 +08:00
|
|
|
static inline bool f2fs_hw_should_discard(struct f2fs_sb_info *sbi)
|
2016-06-14 00:47:48 +08:00
|
|
|
{
|
2018-10-24 18:34:26 +08:00
|
|
|
return f2fs_sb_has_blkzoned(sbi);
|
2018-09-04 03:52:17 +08:00
|
|
|
}
|
2016-10-28 16:45:03 +08:00
|
|
|
|
2019-03-16 08:13:08 +08:00
|
|
|
static inline bool f2fs_bdev_support_discard(struct block_device *bdev)
|
|
|
|
{
|
|
|
|
return blk_queue_discard(bdev_get_queue(bdev)) ||
|
|
|
|
bdev_is_zoned(bdev);
|
|
|
|
}
|
|
|
|
|
2018-09-04 03:52:17 +08:00
|
|
|
static inline bool f2fs_hw_support_discard(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
2019-03-16 08:13:08 +08:00
|
|
|
int i;
|
|
|
|
|
|
|
|
if (!f2fs_is_multi_device(sbi))
|
|
|
|
return f2fs_bdev_support_discard(sbi->sb->s_bdev);
|
|
|
|
|
|
|
|
for (i = 0; i < sbi->s_ndevs; i++)
|
|
|
|
if (f2fs_bdev_support_discard(FDEV(i).bdev))
|
|
|
|
return true;
|
|
|
|
return false;
|
2018-09-04 03:52:17 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static inline bool f2fs_realtime_discard_enable(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
return (test_opt(sbi, DISCARD) && f2fs_hw_support_discard(sbi)) ||
|
|
|
|
f2fs_hw_should_discard(sbi);
|
2016-06-14 00:47:48 +08:00
|
|
|
}
|
|
|
|
|
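The three discard helpers above combine into one decision: real-time discard is on when the user asked for it and the hardware can do it, or when the filesystem is zoned, in which case discard is required regardless of the mount option. A minimal sketch of that truth table follows. `struct demo_discard_state` and its fields are hypothetical and replace the kernel's mount-option, block-device, and feature queries.

```c
#include <stdbool.h>

/* Hypothetical flattened inputs to the discard decision. */
struct demo_discard_state {
	bool opt_discard;	/* "discard" mount option set */
	bool hw_support;	/* at least one device accepts discard */
	bool blkzoned;		/* filesystem has the blkzoned feature */
};

static inline bool
demo_realtime_discard_enable(const struct demo_discard_state *s)
{
	/* Zoned filesystems discard regardless of the mount option. */
	return (s->opt_discard && s->hw_support) || s->blkzoned;
}
```

Note that the mount option alone is not enough: without hardware support the first term is false, which mirrors the case in which DISCARD stays set on a device that does not support discard.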
2019-04-22 20:22:36 +08:00
|
|
|
static inline bool f2fs_hw_is_readonly(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
|
|
|
|
if (!f2fs_is_multi_device(sbi))
|
|
|
|
return bdev_read_only(sbi->sb->s_bdev);
|
|
|
|
|
|
|
|
for (i = 0; i < sbi->s_ndevs; i++)
|
|
|
|
if (bdev_read_only(FDEV(i).bdev))
|
|
|
|
return true;
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
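f2fs_hw_is_readonly() and f2fs_hw_support_discard() share one shape: answer directly for a single device, otherwise scan all devices and return true on the first match. The sketch below shows that any-of scan in isolation. It assumes, for simplicity, that the single-device case is represented as an array of length one, and `struct demo_dev` is a hypothetical stand-in for the per-device state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device state; the kernel asks the block device instead. */
struct demo_dev {
	bool read_only;
};

/* True if any device in the set is read-only. */
static inline bool demo_hw_is_readonly(const struct demo_dev *devs,
				       size_t ndevs)
{
	size_t i;

	for (i = 0; i < ndevs; i++)
		if (devs[i].read_only)
			return true;
	return false;
}
```

The any-of rule means a single read-only member makes the whole multi-device filesystem count as read-only hardware.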
2020-02-14 17:44:12 +08:00
|
|
|
static inline bool f2fs_lfs_mode(struct f2fs_sb_info *sbi)
|
2016-06-14 00:47:48 +08:00
|
|
|
{
|
2020-02-14 17:44:12 +08:00
|
|
|
return F2FS_OPTION(sbi).fs_mode == FS_MODE_LFS;
|
2016-06-14 00:47:48 +08:00
|
|
|
}
|
|
|
|
|
f2fs: support data compression
This patch tries to support compression in f2fs.
- New term named cluster is defined as basic unit of compression, file can
be divided into multiple clusters logically. One cluster includes 4 << n
(n >= 0) logical pages, compression size is also cluster size, each of
cluster can be compressed or not.
- In cluster metadata layout, one special flag is used to indicate cluster
is compressed one or normal one, for compressed cluster, following metadata
maps cluster to [1, 4 << n - 1] physical blocks, in where f2fs stores
data including compress header and compressed data.
- In order to eliminate write amplification during overwrite, F2FS only
support compression on write-once file, data can be compressed only when
all logical blocks in file are valid and cluster compress ratio is lower
than specified threshold.
- To enable compression on regular inode, there are three ways:
* chattr +c file
* chattr +c dir; touch dir/file
* mount w/ -o compress_extension=ext; touch file.ext
Compress metadata layout:
[Dnode Structure]
+-----------------------------------------------+
| cluster 1 | cluster 2 | ......... | cluster N |
+-----------------------------------------------+
. . . .
. . . .
. Compressed Cluster . . Normal Cluster .
+----------+---------+---------+---------+ +---------+---------+---------+---------+
|compr flag| block 1 | block 2 | block 3 | | block 1 | block 2 | block 3 | block 4 |
+----------+---------+---------+---------+ +---------+---------+---------+---------+
. .
. .
. .
+-------------+-------------+----------+----------------------------+
| data length | data chksum | reserved | compressed data |
+-------------+-------------+----------+----------------------------+
Changelog:
20190326:
- fix error handling of read_end_io().
- remove unneeded comments in f2fs_encrypt_one_page().
20190327:
- fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
- don't jump into loop directly to avoid uninitialized variables.
- add TODO tag in error path of f2fs_write_cache_pages().
20190328:
- fix wrong merge condition in f2fs_read_multi_pages().
- check compressed file in f2fs_post_read_required().
20190401
- allow overwrite on non-compressed cluster.
- check cluster meta before writing compressed data.
20190402
- don't preallocate blocks for compressed file.
- add lz4 compress algorithm
- process multiple post read works in one workqueue
Currently f2fs processes post read work in multiple workqueues, which
shows low performance due to the scheduling overhead of multiple
workqueues executing in order.
20190921
- compress: support buffered overwrite
C: compress cluster flag
V: valid block address
N: NEW_ADDR
One cluster contains 4 blocks
before overwrite after overwrite
- VVVV -> CVNN
- CVNN -> VVVV
- CVNN -> CVNN
- CVNN -> CVVV
- CVVV -> CVNN
- CVVV -> CVVV
20191029
- add kconfig F2FS_FS_COMPRESSION to isolate compression-related
code, add kconfig F2FS_FS_{LZO,LZ4} to cover the backend algorithms.
Note: the lzo backend will be removed if Jaegeuk agrees.
- update code according to Eric's comments.
20191101
- apply fixes from Jaegeuk
20191113
- apply fixes from Jaegeuk
- split workqueue for fsverity
20191216
- apply fixes from Jaegeuk
20200117
- fix to avoid NULL pointer dereference
[Jaegeuk Kim]
- add tracepoint for f2fs_{,de}compress_pages()
- fix many bugs and add some compression stats
- fix overwrite/mmap bugs
- address 32bit build error, reported by Geert.
- bug fixes when handling errors and i_compressed_blocks
Reported-by: <noreply@ellerman.id.au>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-11-01 18:07:14 +08:00
|
|
|
static inline bool f2fs_may_compress(struct inode *inode)
|
|
|
|
{
|
|
|
|
if (IS_SWAPFILE(inode) || f2fs_is_pinned_file(inode) ||
|
|
|
|
f2fs_is_atomic_file(inode) ||
|
|
|
|
f2fs_is_volatile_file(inode))
|
|
|
|
return false;
|
|
|
|
return S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode);
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline void f2fs_i_compr_blocks_update(struct inode *inode,
|
|
|
|
u64 blocks, bool add)
|
|
|
|
{
|
|
|
|
int diff = F2FS_I(inode)->i_cluster_size - blocks;
|
2020-09-08 10:44:10 +08:00
|
|
|
struct f2fs_inode_info *fi = F2FS_I(inode);
|
2019-11-01 18:07:14 +08:00
|
|
|
|
2020-03-06 15:36:09 +08:00
|
|
|
/* don't update i_compr_blocks if saved blocks were released */
|
2020-09-08 10:44:10 +08:00
|
|
|
if (!add && !atomic_read(&fi->i_compr_blocks))
|
2020-03-06 15:36:09 +08:00
|
|
|
return;
|
|
|
|
|
2019-11-01 18:07:14 +08:00
|
|
|
if (add) {
|
2020-09-08 10:44:10 +08:00
|
|
|
atomic_add(diff, &fi->i_compr_blocks);
|
2019-11-01 18:07:14 +08:00
|
|
|
stat_add_compr_blocks(inode, diff);
|
|
|
|
} else {
|
2020-09-08 10:44:10 +08:00
|
|
|
atomic_sub(diff, &fi->i_compr_blocks);
|
2019-11-01 18:07:14 +08:00
|
|
|
stat_sub_compr_blocks(inode, diff);
|
|
|
|
}
|
|
|
|
f2fs_mark_inode_dirty_sync(inode, true);
|
|
|
|
}
|
|
|
|
|
2018-09-27 18:34:52 +08:00
|
|
|
static inline int block_unaligned_IO(struct inode *inode,
|
|
|
|
struct kiocb *iocb, struct iov_iter *iter)
|
2018-03-08 18:34:38 +08:00
|
|
|
{
|
2018-09-27 18:34:52 +08:00
|
|
|
unsigned int i_blkbits = READ_ONCE(inode->i_blkbits);
|
|
|
|
unsigned int blocksize_mask = (1 << i_blkbits) - 1;
|
|
|
|
loff_t offset = iocb->ki_pos;
|
|
|
|
unsigned long align = offset | iov_iter_alignment(iter);
|
|
|
|
|
|
|
|
return align & blocksize_mask;
|
|
|
|
}
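The alignment test in block_unaligned_IO() can be tried outside the kernel with a small stand-alone sketch. The function name unaligned_bits() is hypothetical, and the mem_align parameter merely stands in for the value the kernel obtains from iov_iter_alignment(iter); the mask arithmetic itself mirrors the code above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical user-space analogue of the check above.
 * Returns non-zero when either the file offset or the buffer alignment
 * value has bits set below the block size (1 << blkbits).
 */
static inline uint64_t unaligned_bits(uint64_t offset, uint64_t mem_align,
				      unsigned int blkbits)
{
	uint64_t mask;

	assert(blkbits < 64);
	mask = (UINT64_C(1) << blkbits) - 1;	/* low bits of a block */
	return (offset | mem_align) & mask;
}
```

OR-ing the two values first means one mask operation is enough: any low bit set in either input survives the OR and is caught by the mask.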
|
|
|
|
|
2021-09-01 14:39:20 +08:00
|
|
|
static inline bool f2fs_allow_multi_device_dio(struct f2fs_sb_info *sbi,
|
|
|
|
int flag)
|
|
|
|
{
|
|
|
|
if (!f2fs_is_multi_device(sbi))
|
|
|
|
return false;
|
|
|
|
if (flag != F2FS_GET_BLOCK_DIO)
|
|
|
|
return false;
|
|
|
|
return sbi->aligned_blksize;
|
|
|
|
}
|
|
|
|
|
2018-09-27 18:34:52 +08:00
|
|
|
static inline bool f2fs_force_buffered_io(struct inode *inode,
|
|
|
|
struct kiocb *iocb, struct iov_iter *iter)
|
|
|
|
{
|
|
|
|
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
|
|
|
|
int rw = iov_iter_rw(iter);
|
|
|
|
|
|
|
|
if (f2fs_post_read_required(inode))
|
|
|
|
return true;
|
2021-09-01 14:39:20 +08:00
|
|
|
|
|
|
|
/* disallow direct IO if any of devices has unaligned blksize */
|
|
|
|
if (f2fs_is_multi_device(sbi) && !sbi->aligned_blksize)
|
2018-09-27 18:34:52 +08:00
|
|
|
return true;
|
|
|
|
/*
|
|
|
|
* for blkzoned device, fallback direct IO to buffered IO, so
|
|
|
|
* all IOs can be serialized by log-structured write.
|
|
|
|
*/
|
2018-10-24 18:34:26 +08:00
|
|
|
if (f2fs_sb_has_blkzoned(sbi))
|
2018-09-27 18:34:52 +08:00
|
|
|
return true;
|
2020-02-14 17:44:12 +08:00
|
|
|
if (f2fs_lfs_mode(sbi) && (rw == WRITE)) {
|
2019-08-28 17:33:37 +08:00
|
|
|
if (block_unaligned_IO(inode, iocb, iter))
|
|
|
|
return true;
|
|
|
|
if (F2FS_IO_ALIGNED(sbi))
|
|
|
|
return true;
|
|
|
|
}
|
2021-02-27 20:02:29 +08:00
|
|
|
if (is_sbi_flag_set(F2FS_I_SB(inode), SBI_CP_DISABLED))
|
2018-08-21 10:21:43 +08:00
|
|
|
return true;
|
|
|
|
|
2018-09-27 18:34:52 +08:00
|
|
|
return false;
|
2018-03-08 18:34:38 +08:00
|
|
|
}
|
|
|
|
|
2021-01-05 14:33:02 +08:00
|
|
|
static inline bool f2fs_need_verity(const struct inode *inode, pgoff_t idx)
|
|
|
|
{
|
|
|
|
return fsverity_active(inode) &&
|
|
|
|
idx < DIV_ROUND_UP(inode->i_size, PAGE_SIZE);
|
|
|
|
}
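The index bound in f2fs_need_verity() is a round-up division of the file size by the page size. A minimal sketch of just that comparison follows; the name index_within_isize() and the explicit page_size parameter are assumptions of this example, and the fsverity_active() half of the condition is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round-up integer division, as DIV_ROUND_UP does in the kernel. */
#define SKETCH_DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

/*
 * Hypothetical helper: true when page index idx lies inside the pages
 * that cover i_size bytes of file data.
 */
static inline bool index_within_isize(uint64_t idx, uint64_t i_size,
				      uint64_t page_size)
{
	assert(page_size != 0);
	return idx < SKETCH_DIV_ROUND_UP(i_size, page_size);
}
```

A file of 4097 bytes with 4096-byte pages spans two pages, so indexes 0 and 1 pass the check and index 2 does not.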
|
|
|
|
|
2018-06-22 04:46:23 +08:00
|
|
|
#ifdef CONFIG_F2FS_FAULT_INJECTION
|
2018-08-08 17:36:41 +08:00
|
|
|
extern void f2fs_build_fault_attr(struct f2fs_sb_info *sbi, unsigned int rate,
|
|
|
|
unsigned int type);
|
2018-06-22 04:46:23 +08:00
|
|
|
#else
|
2018-08-08 17:36:41 +08:00
|
|
|
#define f2fs_build_fault_attr(sbi, rate, type) do { } while (0)
|
2018-06-22 04:46:23 +08:00
|
|
|
#endif
|
|
|
|
|
f2fs: guarantee journalled quota data by checkpoint
For journalled quota mode, let checkpoint flush dquot dirty data
and quota file data to guarantee persistence of all quota sysfiles in the
last checkpoint; this way, we can avoid corrupting quota sysfiles
when encountering SPO.
The implementation is as below:
1. add a global state SBI_QUOTA_NEED_FLUSH to indicate that there are
cached dquot metadata changes in the quota subsystem, and a later
checkpoint should:
a) flush dquot metadata into the quota file.
b) flush the quota file to storage to keep file usage consistent.
2. add a global state SBI_QUOTA_NEED_REPAIR to indicate that a quota
operation failed due to -EIO or -ENOSPC, so later,
a) checkpoint will skip syncing dquot metadata.
b) CP_QUOTA_NEED_FSCK_FLAG will be set in the last cp pack to give a
hint for fsck repairing.
3. add a global state SBI_QUOTA_SKIP_FLUSH: in checkpoint, if quota
data updating is very heavy, it may cause a hung task in block_operation().
To avoid this, if our retry count exceeds the threshold, just skip
flushing and retry in the next checkpoint().
Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid warnings and set fsck flag]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-20 20:05:00 +08:00
|
|
|
static inline bool is_journalled_quota(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
#ifdef CONFIG_QUOTA
|
2018-10-24 18:34:26 +08:00
|
|
|
if (f2fs_sb_has_quota_ino(sbi))
|
2018-09-20 20:05:00 +08:00
|
|
|
return true;
|
|
|
|
if (F2FS_OPTION(sbi).s_qf_names[USRQUOTA] ||
|
|
|
|
F2FS_OPTION(sbi).s_qf_names[GRPQUOTA] ||
|
|
|
|
F2FS_OPTION(sbi).s_qf_names[PRJQUOTA])
|
|
|
|
return true;
|
|
|
|
#endif
|
|
|
|
return false;
|
|
|
|
}
|
2019-02-16 11:04:38 +08:00
|
|
|
|
f2fs: introduce discard_unit mount option
As James Z reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=213877
[1.] One-line summary of the problem:
Mount multiple SMR block devices exceed certain number cause system non-response
[2.] Full description of the problem/report:
Created some F2FS on SMR devices (mkfs.f2fs -m), then mounted in sequence. Each device is the same Model: HGST HSH721414AL (Size 14TB).
Empirically, found that when the amount of SMR device * 1.5Gb > System RAM, the system ran out of memory and hung. No dmesg output. For example, 24 SMR Disk need 24*1.5GB = 36GB. A system with 32G RAM can only mount 21 devices, the 22nd device will be a reproducible cause of system hang.
The number of SMR devices with other FS mounted on this system does not interfere with the result above.
[3.] Keywords (i.e., modules, networking, kernel):
F2FS, SMR, Memory
[4.] Kernel information
[4.1.] Kernel version (uname -a):
Linux 5.13.4-200.fc34.x86_64 #1 SMP Tue Jul 20 20:27:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
[4.2.] Kernel .config file:
Default Fedora 34 with f2fs-tools-1.14.0-2.fc34.x86_64
[5.] Most recent kernel version which did not have the bug:
None
[6.] Output of Oops.. message (if applicable) with symbolic information
resolved (see Documentation/admin-guide/oops-tracing.rst)
None
[7.] A small shell script or example program which triggers the
problem (if possible)
mount /dev/sdX /mnt/0X
[8.] Memory consumption
With 24 * 14T SMR Block device with F2FS
free -g
       total   used   free   shared   buff/cache   available
Mem:      46     36      0        0           10          10
Swap:      0      0      0
With 3 * 14T SMR Block device with F2FS
free -g
       total   used   free   shared   buff/cache   available
Mem:       7      5      0        0            1           1
Swap:      7      0      7
The root cause is, there are three bitmaps:
- cur_valid_map
- ckpt_valid_map
- discard_map
and each of them costs ~500MB of memory. {cur, ckpt}_valid_map are
necessary, but discard_map is optional, since this bitmap is only
useful on a mountpoint where small discard is enabled.
For blkzoned devices such as SMR or ZNS devices, f2fs will only issue
a discard for a section (zone) when all blocks of that section are invalid,
so for such devices we don't need small discard functionality at all.
This patch introduces a new mount option "discard_unit=block|segment|
section" to support issuing discards with a different basic unit, aligned
to block, segment or section, so that the user can specify
"discard_unit=segment" or "discard_unit=section" to disable small
discard functionality.
Note that this mount option cannot be changed by remount() because
related metadata needs to be initialized during mount().
In order to save memory, use "discard_unit=section" for blkzoned
devices by default.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
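The memory figures in the discard_unit description can be sanity-checked with a rough calculation. The sketch below assumes one bit per block and a 4096-byte block size; both are assumptions of this example rather than statements from the commit message, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for a bitmap holding one bit per block (assumed layout). */
static inline uint64_t bitmap_bytes(uint64_t device_bytes, uint64_t block_size)
{
	uint64_t blocks;

	assert(block_size != 0);
	blocks = device_bytes / block_size;
	return (blocks + 7) / 8;	/* round up to whole bytes */
}

/* Total cost when nr_bitmaps such bitmaps are kept in memory. */
static inline uint64_t total_bitmap_bytes(uint64_t device_bytes,
					  uint64_t block_size,
					  unsigned int nr_bitmaps)
{
	return bitmap_bytes(device_bytes, block_size) * nr_bitmaps;
}
```

For a 14 TB (14 * 10^12 byte) device this gives about 427 MB per bitmap and about 1.28 GB for three, which is in the same range as the ~500MB per bitmap and 1.5GB per device quoted in the report. Keeping only two bitmaps would drop the total to roughly 854 MB under the same assumptions.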
2021-08-03 08:15:43 +08:00
|
|
|
static inline bool f2fs_block_unit_discard(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
return F2FS_OPTION(sbi).discard_unit == DISCARD_UNIT_BLOCK;
|
|
|
|
}
|
|
|
|
|
2019-06-20 11:36:14 +08:00
|
|
|
#define EFSBADCRC EBADMSG /* Bad CRC detected */
|
|
|
|
#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
|
|
|
|
|
2019-04-02 18:52:20 +08:00
|
|
|
#endif /* _LINUX_F2FS_H */
|