// SPDX-License-Identifier: GPL-2.0
/**
 * include/linux/f2fs_fs.h
 *
 * Copyright (c) 2012 Samsung Electronics Co., Ltd.
 * http://www.samsung.com/
 */
#ifndef _LINUX_F2FS_FS_H
#define _LINUX_F2FS_FS_H

#include <linux/pagemap.h>
#include <linux/types.h>

#define F2FS_SUPER_OFFSET		1024	/* byte-size offset */
#define F2FS_MIN_LOG_SECTOR_SIZE	9	/* 9 bits for 512 bytes */
#define F2FS_MAX_LOG_SECTOR_SIZE	12	/* 12 bits for 4096 bytes */
#define F2FS_LOG_SECTORS_PER_BLOCK	3	/* log number for sector/blk */
#define F2FS_BLKSIZE			4096	/* support only 4KB block */
#define F2FS_BLKSIZE_BITS		12	/* bits for F2FS_BLKSIZE */
#define F2FS_MAX_EXTENSION		64	/* # of extension entries */
#define F2FS_EXTENSION_LEN		8	/* max size of extension */
#define F2FS_BLK_ALIGN(x)	(((x) + F2FS_BLKSIZE - 1) >> F2FS_BLKSIZE_BITS)

#define NULL_ADDR		((block_t)0)	/* used as block_t addresses */
#define NEW_ADDR		((block_t)-1)	/* used as block_t addresses */
#define COMPRESS_ADDR		((block_t)-2)	/* used as compressed data flag */

#define F2FS_BYTES_TO_BLK(bytes)	((bytes) >> F2FS_BLKSIZE_BITS)
#define F2FS_BLK_TO_BYTES(blk)		((blk) << F2FS_BLKSIZE_BITS)
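
/*
 * Illustrative sketch only (not part of the on-disk format): the block
 * macros above are plain shifts by F2FS_BLKSIZE_BITS. Note that, despite
 * its name, F2FS_BLK_ALIGN() yields a block count rounded up, not an
 * aligned byte value. The sketch_* helpers are hypothetical names used
 * for demonstration, and the constants are copied so it builds standalone.
 */
```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the definitions above so the sketch stands alone. */
#define F2FS_BLKSIZE			4096
#define F2FS_BLKSIZE_BITS		12
#define F2FS_BLK_ALIGN(x)	(((x) + F2FS_BLKSIZE - 1) >> F2FS_BLKSIZE_BITS)
#define F2FS_BYTES_TO_BLK(bytes)	((bytes) >> F2FS_BLKSIZE_BITS)
#define F2FS_BLK_TO_BYTES(blk)		((blk) << F2FS_BLKSIZE_BITS)

/* The shift-based macros are only correct if the two constants agree. */
static_assert((1 << F2FS_BLKSIZE_BITS) == F2FS_BLKSIZE,
	      "F2FS_BLKSIZE must equal 1 << F2FS_BLKSIZE_BITS");

/* Hypothetical helper: number of blocks needed to hold `bytes` bytes. */
static inline uint64_t sketch_blocks_needed(uint64_t bytes)
{
	return F2FS_BLK_ALIGN(bytes);
}

/* Hypothetical helper: index of the block containing byte offset `off`. */
static inline uint64_t sketch_block_of_offset(uint64_t off)
{
	return F2FS_BYTES_TO_BLK(off);
}

/* Hypothetical helper: byte offset at which block `blk` starts. */
static inline uint64_t sketch_block_start(uint64_t blk)
{
	return F2FS_BLK_TO_BYTES(blk);
}
```
/*
 * For instance, a 4097-byte payload needs two blocks, while byte offset
 * 4095 still falls in block 0.
 */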

/* 0, 1(node nid), 2(meta nid) are reserved node id */
#define F2FS_RESERVED_NODE_NUM		3

#define F2FS_ROOT_INO(sbi)	((sbi)->root_ino_num)
#define F2FS_NODE_INO(sbi)	((sbi)->node_ino_num)
#define F2FS_META_INO(sbi)	((sbi)->meta_ino_num)
#define F2FS_COMPRESS_INO(sbi)	(NM_I(sbi)->max_nid)

#define F2FS_MAX_QUOTAS		3

#define F2FS_ENC_UTF8_12_1	1

#define F2FS_IO_SIZE(sbi)	(1 << F2FS_OPTION(sbi).write_io_size_bits) /* Blocks */
#define F2FS_IO_SIZE_KB(sbi)	(1 << (F2FS_OPTION(sbi).write_io_size_bits + 2)) /* KB */
#define F2FS_IO_SIZE_BYTES(sbi)	(1 << (F2FS_OPTION(sbi).write_io_size_bits + 12)) /* B */
#define F2FS_IO_SIZE_BITS(sbi)	(F2FS_OPTION(sbi).write_io_size_bits) /* power of 2 */
#define F2FS_IO_SIZE_MASK(sbi)	(F2FS_IO_SIZE(sbi) - 1)
#define F2FS_IO_ALIGNED(sbi)	(F2FS_IO_SIZE(sbi) > 1)
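
/*
 * Illustrative sketch only: the F2FS_IO_SIZE* macros above all derive
 * from one exponent, write_io_size_bits. The helpers below take that
 * exponent as a plain argument instead of going through F2FS_OPTION(sbi),
 * which is defined elsewhere; the sketch_* names are hypothetical.
 */
```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_BLKSIZE_BITS	12	/* same value as F2FS_BLKSIZE_BITS */

/* One 4 KB block is 1 << 2 KB, which is where the "+ 2" comes from. */
static_assert((1U << (SKETCH_BLKSIZE_BITS - 10)) == 4,
	      "a block must be 4 KB for the KB shift to hold");

/* I/O unit in blocks, mirroring F2FS_IO_SIZE(). */
static inline uint32_t sketch_io_size(unsigned int bits)
{
	return 1U << bits;
}

/* I/O unit in KB, mirroring F2FS_IO_SIZE_KB(). */
static inline uint32_t sketch_io_size_kb(unsigned int bits)
{
	return 1U << (bits + 2);
}

/* I/O unit in bytes, mirroring F2FS_IO_SIZE_BYTES(). */
static inline uint32_t sketch_io_size_bytes(unsigned int bits)
{
	return 1U << (bits + SKETCH_BLKSIZE_BITS);
}

/* Low-bit mask for block indexes, mirroring F2FS_IO_SIZE_MASK(). */
static inline uint32_t sketch_io_size_mask(unsigned int bits)
{
	return sketch_io_size(bits) - 1;
}

/* Non-zero when the I/O unit spans more than one block. */
static inline int sketch_io_aligned(unsigned int bits)
{
	return sketch_io_size(bits) > 1;
}
```
/*
 * With bits == 3 the I/O unit is 8 blocks (32 KB, 32768 bytes) and the
 * mask is 7; with bits == 0 the unit is one block and alignment is off.
 */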

/* This flag is used by node and meta inodes, and by recovery */
#define GFP_F2FS_ZERO		(GFP_NOFS | __GFP_ZERO)

/*
 * For further optimization on multi-head logs, on-disk layout supports maximum
 * 16 logs by default. The number, 16, is expected to cover all the cases
 * sufficiently. The implementation currently uses no more than 6 logs.
 * Half the logs are used for nodes, and the other half are used for data.
 */
#define MAX_ACTIVE_LOGS	16
#define MAX_ACTIVE_NODE_LOGS	8
#define MAX_ACTIVE_DATA_LOGS	8

#define VERSION_LEN	256
#define MAX_VOLUME_NAME		512
#define MAX_PATH_LEN		64
#define MAX_DEVICES		8

/*
 * For superblock
 */
struct f2fs_device {
	__u8 path[MAX_PATH_LEN];
	__le32 total_segments;
} __packed;

/* reason of stop_checkpoint */
enum stop_cp_reason {
	STOP_CP_REASON_SHUTDOWN,
	STOP_CP_REASON_FAULT_INJECT,
	STOP_CP_REASON_META_PAGE,
	STOP_CP_REASON_WRITE_FAIL,
	STOP_CP_REASON_CORRUPTED_SUMMARY,
	STOP_CP_REASON_UPDATE_INODE,
	STOP_CP_REASON_FLUSH_FAIL,
	STOP_CP_REASON_MAX,
};

#define	MAX_STOP_REASON			32

/* detail reason for EFSCORRUPTED */
enum f2fs_error {
	ERROR_CORRUPTED_CLUSTER,
	ERROR_FAIL_DECOMPRESSION,
	ERROR_INVALID_BLKADDR,
	ERROR_CORRUPTED_DIRENT,
	ERROR_CORRUPTED_INODE,
	ERROR_INCONSISTENT_SUMMARY,
	ERROR_INCONSISTENT_FOOTER,
	ERROR_INCONSISTENT_SUM_TYPE,
	ERROR_CORRUPTED_JOURNAL,
	ERROR_INCONSISTENT_NODE_COUNT,
	ERROR_INCONSISTENT_BLOCK_COUNT,
	ERROR_INVALID_CURSEG,
	ERROR_INCONSISTENT_SIT,
	ERROR_CORRUPTED_VERITY_XATTR,
	ERROR_CORRUPTED_XATTR,
	ERROR_MAX,
};

#define MAX_F2FS_ERRORS			16

struct f2fs_super_block {
	__le32 magic;			/* Magic Number */
	__le16 major_ver;		/* Major Version */
	__le16 minor_ver;		/* Minor Version */
	__le32 log_sectorsize;		/* log2 sector size in bytes */
	__le32 log_sectors_per_block;	/* log2 # of sectors per block */
	__le32 log_blocksize;		/* log2 block size in bytes */
	__le32 log_blocks_per_seg;	/* log2 # of blocks per segment */
	__le32 segs_per_sec;		/* # of segments per section */
	__le32 secs_per_zone;		/* # of sections per zone */
	__le32 checksum_offset;		/* checksum offset inside super block */
	__le64 block_count;		/* total # of user blocks */
	__le32 section_count;		/* total # of sections */
	__le32 segment_count;		/* total # of segments */
	__le32 segment_count_ckpt;	/* # of segments for checkpoint */
	__le32 segment_count_sit;	/* # of segments for SIT */
	__le32 segment_count_nat;	/* # of segments for NAT */
	__le32 segment_count_ssa;	/* # of segments for SSA */
	__le32 segment_count_main;	/* # of segments for main area */
	__le32 segment0_blkaddr;	/* start block address of segment 0 */
	__le32 cp_blkaddr;		/* start block address of checkpoint */
	__le32 sit_blkaddr;		/* start block address of SIT */
	__le32 nat_blkaddr;		/* start block address of NAT */
	__le32 ssa_blkaddr;		/* start block address of SSA */
	__le32 main_blkaddr;		/* start block address of main area */
	__le32 root_ino;		/* root inode number */
	__le32 node_ino;		/* node inode number */
	__le32 meta_ino;		/* meta inode number */
	__u8 uuid[16];			/* 128-bit uuid for volume */
	__le16 volume_name[MAX_VOLUME_NAME];	/* volume name */
	__le32 extension_count;		/* # of extensions below */
	__u8 extension_list[F2FS_MAX_EXTENSION][F2FS_EXTENSION_LEN];/* extension array */
	__le32 cp_payload;
	__u8 version[VERSION_LEN];	/* the kernel version */
	__u8 init_version[VERSION_LEN];	/* the initial kernel version */
	__le32 feature;			/* defined features */
	__u8 encryption_level;		/* versioning level for encryption */
	__u8 encrypt_pw_salt[16];	/* Salt used for string2key algorithm */
	struct f2fs_device devs[MAX_DEVICES];	/* device list */
	__le32 qf_ino[F2FS_MAX_QUOTAS];	/* quota inode numbers */
	__u8 hot_ext_count;		/* # of hot file extension */
	__le16 s_encoding;		/* Filename charset encoding */
	__le16 s_encoding_flags;	/* Filename charset encoding flags */
	__u8 s_stop_reason[MAX_STOP_REASON];	/* stop checkpoint reason */
	__u8 s_errors[MAX_F2FS_ERRORS];		/* reasons for image corruption */
	__u8 reserved[258];		/* valid reserved region */
	__le32 crc;			/* checksum of superblock */
} __packed;
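
/*
 * Illustrative sketch only: the three log2 geometry fields of the
 * superblock are related, since a block is made of whole sectors. The
 * check below assumes a consistent image satisfies
 * log_sectorsize + log_sectors_per_block == log_blocksize and that the
 * block size is the fixed 4 KB stated above; the real mount-time
 * validation lives elsewhere and may check more. The function name is
 * hypothetical and the arguments are expected in CPU byte order.
 */
```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the definitions near the top of this header. */
#define F2FS_MIN_LOG_SECTOR_SIZE	9
#define F2FS_MAX_LOG_SECTOR_SIZE	12
#define F2FS_BLKSIZE_BITS		12

/* The smallest sector (512 B) needs 3 extra bits to reach a 4 KB block. */
static_assert(F2FS_BLKSIZE_BITS - F2FS_MIN_LOG_SECTOR_SIZE == 3,
	      "512-byte sectors imply 8 sectors per block");

/* Returns 1 when the geometry fields agree with each other, else 0. */
static inline int sketch_sb_geometry_ok(uint32_t log_sectorsize,
					uint32_t log_sectors_per_block,
					uint32_t log_blocksize)
{
	if (log_blocksize != F2FS_BLKSIZE_BITS)
		return 0;
	if (log_sectorsize < F2FS_MIN_LOG_SECTOR_SIZE ||
	    log_sectorsize > F2FS_MAX_LOG_SECTOR_SIZE)
		return 0;
	return log_sectorsize + log_sectors_per_block == log_blocksize;
}
```
/*
 * A caller would first convert the __le32 fields with le32_to_cpu() and
 * then pass them in, e.g. (9, 3, 12) for 512-byte sectors.
 */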

/*
 * For checkpoint
 */
#define CP_RESIZEFS_FLAG		0x00004000
#define CP_DISABLED_QUICK_FLAG		0x00002000
#define CP_DISABLED_FLAG		0x00001000
#define CP_QUOTA_NEED_FSCK_FLAG		0x00000800
#define CP_LARGE_NAT_BITMAP_FLAG	0x00000400
#define CP_NOCRC_RECOVERY_FLAG	0x00000200
#define CP_TRIMMED_FLAG		0x00000100
#define CP_NAT_BITS_FLAG	0x00000080
#define CP_CRC_RECOVERY_FLAG	0x00000040
#define CP_FASTBOOT_FLAG	0x00000020
#define CP_FSCK_FLAG		0x00000010
#define CP_ERROR_FLAG		0x00000008
#define CP_COMPACT_SUM_FLAG	0x00000004
#define CP_ORPHAN_PRESENT_FLAG	0x00000002
#define CP_UMOUNT_FLAG		0x00000001
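
/*
 * Illustrative sketch only: the CP_*_FLAG values are single bits that
 * are OR-ed into the little-endian ckpt_flags field of the checkpoint.
 * The helpers below are hypothetical userspace stand-ins; in-kernel code
 * would use le32_to_cpu() rather than decoding bytes by hand.
 */
```c
#include <assert.h>
#include <stdint.h>

/* A few flag values copied from the list above. */
#define CP_DISABLED_FLAG	0x00001000
#define CP_COMPACT_SUM_FLAG	0x00000004
#define CP_ORPHAN_PRESENT_FLAG	0x00000002
#define CP_UMOUNT_FLAG		0x00000001

/* The copied flags occupy distinct bits, so they can be combined. */
static_assert((CP_DISABLED_FLAG | CP_COMPACT_SUM_FLAG |
	       CP_ORPHAN_PRESENT_FLAG | CP_UMOUNT_FLAG) == 0x1007,
	      "checkpoint flags must not overlap");

/* Hypothetical helper: decode a little-endian 32-bit on-disk field. */
static inline uint32_t sketch_le32(const uint8_t b[4])
{
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Hypothetical helper: non-zero when `flag` is set in `ckpt_flags`. */
static inline int sketch_cp_flag_set(uint32_t ckpt_flags, uint32_t flag)
{
	return (ckpt_flags & flag) != 0;
}
```
/*
 * The raw bytes 05 10 00 00 decode to 0x1005, i.e. a clean unmount with
 * compact summaries on a checkpoint-disabled volume and no orphans.
 */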

#define F2FS_CP_PACKS		2	/* # of checkpoint packs */

struct f2fs_checkpoint {
	__le64 checkpoint_ver;		/* checkpoint block version number */
	__le64 user_block_count;	/* # of user blocks */
	__le64 valid_block_count;	/* # of valid blocks in main area */
	__le32 rsvd_segment_count;	/* # of reserved segments for gc */
	__le32 overprov_segment_count;	/* # of overprovision segments */
	__le32 free_segment_count;	/* # of free segments in main area */

	/* information of current node segments */
	__le32 cur_node_segno[MAX_ACTIVE_NODE_LOGS];
	__le16 cur_node_blkoff[MAX_ACTIVE_NODE_LOGS];
	/* information of current data segments */
	__le32 cur_data_segno[MAX_ACTIVE_DATA_LOGS];
	__le16 cur_data_blkoff[MAX_ACTIVE_DATA_LOGS];
	__le32 ckpt_flags;		/* Flags : umount and journal_present */
	__le32 cp_pack_total_block_count;	/* total # of one cp pack */
	__le32 cp_pack_start_sum;	/* start block number of data summary */
	__le32 valid_node_count;	/* Total number of valid nodes */
	__le32 valid_inode_count;	/* Total number of valid inodes */
	__le32 next_free_nid;		/* Next free node number */
	__le32 sit_ver_bitmap_bytesize;	/* Default value 64 */
	__le32 nat_ver_bitmap_bytesize; /* Default value 256 */
	__le32 checksum_offset;		/* checksum offset inside cp block */
	__le64 elapsed_time;		/* mounted time */
	/* allocation type of current segment */
	unsigned char alloc_type[MAX_ACTIVE_LOGS];

	/* SIT and NAT version bitmap */
	unsigned char sit_nat_version_bitmap[];
} __packed;

#define CP_CHKSUM_OFFSET	4092	/* default chksum offset in checkpoint */
#define CP_MIN_CHKSUM_OFFSET						\
	(offsetof(struct f2fs_checkpoint, sit_nat_version_bitmap))

/*
 * For orphan inode management
 */
#define F2FS_ORPHANS_PER_BLOCK	1020

#define GET_ORPHAN_BLOCKS(n)	(((n) + F2FS_ORPHANS_PER_BLOCK - 1) / \
					F2FS_ORPHANS_PER_BLOCK)

struct f2fs_orphan_block {
	__le32 ino[F2FS_ORPHANS_PER_BLOCK];	/* inode numbers */
	__le32 reserved;	/* reserved */
	__le16 blk_addr;	/* block index in current CP */
	__le16 blk_count;	/* Number of orphan inode blocks in CP */
	__le32 entry_count;	/* Total number of orphan nodes in current CP */
	__le32 check_sum;	/* CRC32 for orphan inode block */
} __packed;
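
/*
 * Illustrative sketch only: 1020 inode numbers plus the 16 bytes of
 * trailing bookkeeping fill exactly one 4 KB block, and
 * GET_ORPHAN_BLOCKS() is a ceiling division by that capacity. The
 * sketch_orphan_block type is a hypothetical userspace mirror that uses
 * fixed-width integers in place of __le32/__le16 and the GCC/Clang
 * packed attribute in place of __packed.
 */
```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the definitions above. */
#define F2FS_BLKSIZE		4096
#define F2FS_ORPHANS_PER_BLOCK	1020
#define GET_ORPHAN_BLOCKS(n)	(((n) + F2FS_ORPHANS_PER_BLOCK - 1) / \
					F2FS_ORPHANS_PER_BLOCK)

/* Userspace mirror of struct f2fs_orphan_block (layout only). */
struct sketch_orphan_block {
	uint32_t ino[F2FS_ORPHANS_PER_BLOCK];	/* inode numbers */
	uint32_t reserved;
	uint16_t blk_addr;
	uint16_t blk_count;
	uint32_t entry_count;
	uint32_t check_sum;
} __attribute__((packed));

/* 1020 * 4 + 4 + 2 + 2 + 4 + 4 == 4096 */
static_assert(sizeof(struct sketch_orphan_block) == F2FS_BLKSIZE,
	      "an orphan block must fill exactly one f2fs block");
```
/*
 * So 1020 orphan inodes still fit in one block, and the 1021st forces a
 * second one.
 */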

/*
 * For NODE structure
 */
struct f2fs_extent {
	__le32 fofs;		/* start file offset of the extent */
	__le32 blk;		/* start block address of the extent */
	__le32 len;		/* length of the extent */
} __packed;

#define F2FS_NAME_LEN		255
/* 200 bytes for inline xattrs by default */
#define DEFAULT_INLINE_XATTR_ADDRS	50
#define DEF_ADDRS_PER_INODE	923	/* Address Pointers in an Inode */
#define CUR_ADDRS_PER_INODE(inode)	(DEF_ADDRS_PER_INODE - \
					get_extra_isize(inode))
#define DEF_NIDS_PER_INODE	5	/* Node IDs in an Inode */
#define ADDRS_PER_INODE(inode)	addrs_per_inode(inode)
#define DEF_ADDRS_PER_BLOCK	1018	/* Address Pointers in a Direct Block */
#define ADDRS_PER_BLOCK(inode)	addrs_per_block(inode)
#define NIDS_PER_BLOCK		1018	/* Node IDs in an Indirect Block */

#define ADDRS_PER_PAGE(page, inode)	\
	(IS_INODE(page) ? ADDRS_PER_INODE(inode) : ADDRS_PER_BLOCK(inode))

#define	NODE_DIR1_BLOCK		(DEF_ADDRS_PER_INODE + 1)
#define	NODE_DIR2_BLOCK		(DEF_ADDRS_PER_INODE + 2)
#define	NODE_IND1_BLOCK		(DEF_ADDRS_PER_INODE + 3)
#define	NODE_IND2_BLOCK		(DEF_ADDRS_PER_INODE + 4)
#define	NODE_DIND_BLOCK		(DEF_ADDRS_PER_INODE + 5)
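
/*
 * Illustrative sketch only: two bits of arithmetic implied by the
 * constants above. First, inline xattr space is counted in 4-byte
 * address slots, so 50 slots give the 200 bytes mentioned in the
 * comment. Second, the five NODE_*_BLOCK offsets sit directly after the
 * in-inode address slots; the sketch assumes they map onto the five
 * i_nid[] entries in declaration order. Both helper names are
 * hypothetical.
 */
```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the definitions above. */
#define DEFAULT_INLINE_XATTR_ADDRS	50
#define DEF_ADDRS_PER_INODE		923
#define DEF_NIDS_PER_INODE		5
#define NODE_DIR1_BLOCK		(DEF_ADDRS_PER_INODE + 1)
#define NODE_DIND_BLOCK		(DEF_ADDRS_PER_INODE + 5)

/* There is exactly one NODE_*_BLOCK offset per i_nid[] entry. */
static_assert(NODE_DIND_BLOCK - NODE_DIR1_BLOCK + 1 == DEF_NIDS_PER_INODE,
	      "node offsets and i_nid[] entries must match up");

/* Hypothetical helper: bytes covered by `addrs` 4-byte address slots. */
static inline unsigned int sketch_inline_xattr_bytes(unsigned int addrs)
{
	return addrs * (unsigned int)sizeof(uint32_t);
}

/*
 * Hypothetical helper: index into i_nid[] for a NODE_*_BLOCK offset,
 * or -1 when the offset does not name one of the five node slots.
 */
static inline int sketch_nid_index(unsigned int node_ofs)
{
	if (node_ofs < NODE_DIR1_BLOCK || node_ofs > NODE_DIND_BLOCK)
		return -1;
	return (int)(node_ofs - NODE_DIR1_BLOCK);
}
```
/*
 * Under these assumptions NODE_DIR1_BLOCK selects slot 0 and
 * NODE_DIND_BLOCK selects slot 4.
 */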

#define F2FS_INLINE_XATTR	0x01	/* file inline xattr flag */
#define F2FS_INLINE_DATA	0x02	/* file inline data flag */
#define F2FS_INLINE_DENTRY	0x04	/* file inline dentry flag */
#define F2FS_DATA_EXIST		0x08	/* file inline data exist flag */
#define F2FS_INLINE_DOTS	0x10	/* file having implicit dot dentries */
#define F2FS_EXTRA_ATTR		0x20	/* file having extra attribute */
#define F2FS_PIN_FILE		0x40	/* file should not be gced */
#define F2FS_COMPRESS_RELEASED	0x80	/* file released compressed blocks */

struct f2fs_inode {
	__le16 i_mode;			/* file mode */
	__u8 i_advise;			/* file hints */
	__u8 i_inline;			/* file inline flags */
	__le32 i_uid;			/* user ID */
	__le32 i_gid;			/* group ID */
	__le32 i_links;			/* links count */
	__le64 i_size;			/* file size in bytes */
	__le64 i_blocks;		/* file size in blocks */
	__le64 i_atime;			/* access time */
	__le64 i_ctime;			/* change time */
	__le64 i_mtime;			/* modification time */
	__le32 i_atime_nsec;		/* access time in nano scale */
	__le32 i_ctime_nsec;		/* change time in nano scale */
	__le32 i_mtime_nsec;		/* modification time in nano scale */
	__le32 i_generation;		/* file version (for NFS) */
	union {
		__le32 i_current_depth;	/* only for directory depth */
		__le16 i_gc_failures;	/*
					 * # of gc failures on pinned file.
					 * only for regular files.
					 */
	};
	__le32 i_xattr_nid;		/* nid to save xattr */
	__le32 i_flags;			/* file attributes */
	__le32 i_pino;			/* parent inode number */
	__le32 i_namelen;		/* file name length */
	__u8 i_name[F2FS_NAME_LEN];	/* file name for SPOR */
	__u8 i_dir_level;		/* dentry_level for large dir */

	struct f2fs_extent i_ext;	/* caching a largest extent */

	union {
		struct {
			__le16 i_extra_isize;	/* extra inode attribute size */
			__le16 i_inline_xattr_size;	/* inline xattr size, unit: 4 bytes */
			__le32 i_projid;	/* project id */
			__le32 i_inode_checksum;/* inode meta checksum */
			__le64 i_crtime;	/* creation time */
			__le32 i_crtime_nsec;	/* creation time in nano scale */
			__le64 i_compr_blocks;	/* # of compressed blocks */
			__u8 i_compress_algorithm;	/* compress algorithm */
			__u8 i_log_cluster_size;	/* log of cluster size */
			__le16 i_compress_flag;		/* compress flag */
						/* 0 bit: chksum flag
						 * [10,15] bits: compress level
						 */
			__le32 i_extra_end[0];	/* for attribute size calculation */
		} __packed;
		__le32 i_addr[DEF_ADDRS_PER_INODE];	/* Pointers to data blocks */
	};
	__le32 i_nid[DEF_NIDS_PER_INODE];	/* direct(2), indirect(2),
						double_indirect(1) node id */
} __packed;

struct direct_node {
	__le32 addr[DEF_ADDRS_PER_BLOCK];	/* array of data block address */
} __packed;

struct indirect_node {
	__le32 nid[NIDS_PER_BLOCK];	/* array of node IDs */
} __packed;

enum {
	COLD_BIT_SHIFT = 0,
	FSYNC_BIT_SHIFT,
	DENT_BIT_SHIFT,
	OFFSET_BIT_SHIFT
};

#define OFFSET_BIT_MASK		(0x07)	/* (0x01 << OFFSET_BIT_SHIFT) - 1 */

struct node_footer {
	__le32 nid;		/* node id */
	__le32 ino;		/* inode number */
	__le32 flag;		/* include cold/fsync/dentry marks and offset */
	__le64 cp_ver;		/* checkpoint version */
	__le32 next_blkaddr;	/* next node page block address */
} __packed;
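
/*
 * Illustrative sketch only: how the footer's flag word can be split
 * into its parts. It assumes, as the enum and the OFFSET_BIT_MASK
 * comment suggest, that the three low bits hold the cold/fsync/dentry
 * marks and that the node offset is stored in the bits above them. The
 * sketch_* helpers are hypothetical and operate on a CPU-endian value.
 */
```c
#include <assert.h>
#include <stdint.h>

/* Copied from the definitions above. */
enum {
	COLD_BIT_SHIFT = 0,
	FSYNC_BIT_SHIFT,
	DENT_BIT_SHIFT,
	OFFSET_BIT_SHIFT
};

#define OFFSET_BIT_MASK		(0x07)

/* The mask covers exactly the bits below OFFSET_BIT_SHIFT. */
static_assert(OFFSET_BIT_MASK == (1 << OFFSET_BIT_SHIFT) - 1,
	      "mask must match the mark bits");

/* Hypothetical helper: value (0 or 1) of one mark bit. */
static inline int sketch_footer_mark(uint32_t flag, int shift)
{
	return (int)((flag >> shift) & 1U);
}

/* Hypothetical helper: node offset stored above the mark bits. */
static inline uint32_t sketch_footer_offset(uint32_t flag)
{
	return flag >> OFFSET_BIT_SHIFT;
}

/* Hypothetical helper: combine an offset with a set of mark bits. */
static inline uint32_t sketch_footer_pack(uint32_t ofs, uint32_t marks)
{
	return (ofs << OFFSET_BIT_SHIFT) | (marks & OFFSET_BIT_MASK);
}
```
/*
 * Packing offset 5 with only the fsync mark set gives 42 (0x2a), which
 * decodes back to offset 5, fsync = 1, cold = 0, dentry = 0.
 */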

struct f2fs_node {
	/* can be one of three types: inode, direct, and indirect types */
	union {
		struct f2fs_inode i;
		struct direct_node dn;
		struct indirect_node in;
	};
	struct node_footer footer;
} __packed;
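
/*
 * Illustrative sketch only: a node block is a 4072-byte payload followed
 * by the 24-byte footer, which together fill one 4 KB block. The types
 * below are hypothetical userspace mirrors (fixed-width integers instead
 * of __le types, GCC/Clang packed attribute instead of __packed), and
 * the inode member of the union is left out for brevity.
 */
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the definitions above. */
#define F2FS_BLKSIZE		4096
#define DEF_ADDRS_PER_BLOCK	1018
#define NIDS_PER_BLOCK		1018

struct sketch_node_footer {
	uint32_t nid;
	uint32_t ino;
	uint32_t flag;
	uint64_t cp_ver;
	uint32_t next_blkaddr;
} __attribute__((packed));

struct sketch_direct_node {
	uint32_t addr[DEF_ADDRS_PER_BLOCK];
} __attribute__((packed));

struct sketch_indirect_node {
	uint32_t nid[NIDS_PER_BLOCK];
} __attribute__((packed));

struct sketch_node {
	union {
		struct sketch_direct_node dn;
		struct sketch_indirect_node in;
	};
	struct sketch_node_footer footer;
} __attribute__((packed));

/* 4 + 4 + 4 + 8 + 4 == 24 bytes of footer. */
static_assert(sizeof(struct sketch_node_footer) == 24,
	      "footer must be 24 bytes");
/* 1018 * 4 + 24 == 4096. */
static_assert(sizeof(struct sketch_node) == F2FS_BLKSIZE,
	      "a node must fill exactly one f2fs block");
```
/*
 * The footer therefore always starts at byte 4072 of a node block,
 * whichever union member is in use.
 */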

/*
 * For NAT entries
 */
|
mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is a revert of the changes to the
PAGE_CACHE_ALIGN definition: we are going to drop it later.
There are a few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation will
also be addressed in a separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-01 20:29:47 +08:00
|
|
|
#define NAT_ENTRY_PER_BLOCK (PAGE_SIZE / sizeof(struct f2fs_nat_entry))
|
2012-11-02 16:06:26 +08:00
|
|
|
|
|
|
|
struct f2fs_nat_entry {
|
|
|
|
__u8 version; /* latest version of cached nat entry */
|
|
|
|
__le32 ino; /* inode number */
|
|
|
|
__le32 block_addr; /* block address */
|
|
|
|
} __packed;
|
|
|
|
|
|
|
|
struct f2fs_nat_block {
|
|
|
|
struct f2fs_nat_entry entries[NAT_ENTRY_PER_BLOCK];
|
|
|
|
} __packed;
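/*
 * A minimal sketch of the NAT_ENTRY_PER_BLOCK arithmetic, assuming
 * PAGE_SIZE is 4096 and a GCC/Clang-style packed attribute. The sketch_*
 * names are hypothetical, and plain fixed-width integers stand in for the
 * __le32/__u8 types. Mapping a nid to a block index and slot by division
 * and remainder is shown only as an illustration.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096	/* assumption: 4KB pages */

/* Same field order as f2fs_nat_entry; packed, so 1 + 4 + 4 = 9 bytes. */
struct sketch_nat_entry {
	uint8_t version;
	uint32_t ino;
	uint32_t block_addr;
} __attribute__((packed));

#define SKETCH_NAT_ENTRY_PER_BLOCK \
	(SKETCH_PAGE_SIZE / sizeof(struct sketch_nat_entry))

/* Hypothetical: index of the NAT block that would hold this nid. */
static size_t sketch_nat_block_index(uint32_t nid)
{
	return nid / SKETCH_NAT_ENTRY_PER_BLOCK;
}

/* Hypothetical: slot of this nid inside its NAT block. */
static size_t sketch_nat_slot(uint32_t nid)
{
	return nid % SKETCH_NAT_ENTRY_PER_BLOCK;
}
```

/*
 * With 9-byte entries, 455 of them use 4095 bytes, so one byte of a 4KB
 * block stays unused.
 */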
|
|
|
|
|
|
|
|
/*
|
|
|
|
* For SIT entries
|
|
|
|
*
|
|
|
|
 * Each segment is 2MB in size by default, so the validity bitmap for the
 * 512 blocks in a segment occupies 64 bytes (512 bits).
 * This size must not be changed.
|
|
|
|
*/
|
|
|
|
#define SIT_VBLOCK_MAP_SIZE 64
|
2016-04-01 20:29:47 +08:00
|
|
|
#define SIT_ENTRY_PER_BLOCK (PAGE_SIZE / sizeof(struct f2fs_sit_entry))
|
2012-11-02 16:06:26 +08:00
|
|
|
|
2017-04-26 07:28:48 +08:00
|
|
|
/*
|
|
|
|
 * F2FS uses 4 bytes to represent a block address. As a result, the maximum
 * supported disk size is 16 TB, which equals 16 * 1024 * 1024 / 2 segments.
|
|
|
|
*/
|
|
|
|
#define F2FS_MAX_SEGMENT ((16 * 1024 * 1024) / 2)
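/*
 * The arithmetic behind F2FS_MAX_SEGMENT can be sketched as follows,
 * assuming 4KB blocks and 2MB segments as stated elsewhere in this header.
 * The function name is hypothetical.
 */

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: derive the segment limit from a 32-bit block address. */
static uint64_t sketch_max_segments(void)
{
	uint64_t blocks = 1ULL << 32;		/* 4-byte block address */
	uint64_t bytes = blocks * 4096;		/* 4KB per block: 16 TB */
	uint64_t seg_bytes = 2ULL * 1024 * 1024;	/* 2MB per segment */

	return bytes / seg_bytes;
}
```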
|
|
|
|
|
2012-11-02 16:06:26 +08:00
|
|
|
/*
|
|
|
|
* Note that f2fs_sit_entry->vblocks has the following bit-field information.
|
|
|
|
* [15:10] : allocation type such as CURSEG_XXXX_TYPE
|
|
|
|
* [9:0] : valid block count
|
|
|
|
*/
|
|
|
|
#define SIT_VBLOCKS_SHIFT 10
|
|
|
|
#define SIT_VBLOCKS_MASK ((1 << SIT_VBLOCKS_SHIFT) - 1)
|
|
|
|
#define GET_SIT_VBLOCKS(raw_sit) \
|
|
|
|
(le16_to_cpu((raw_sit)->vblocks) & SIT_VBLOCKS_MASK)
|
|
|
|
#define GET_SIT_TYPE(raw_sit) \
|
|
|
|
((le16_to_cpu((raw_sit)->vblocks) & ~SIT_VBLOCKS_MASK) \
|
|
|
|
>> SIT_VBLOCKS_SHIFT)
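/*
 * A minimal sketch of the vblocks bit-field described above, working on a
 * value already converted to CPU byte order (the le16_to_cpu step is left
 * out). sketch_pack_vblocks and the two getters are hypothetical names
 * that mirror GET_SIT_VBLOCKS and GET_SIT_TYPE.
 */

```c
#include <assert.h>
#include <stdint.h>

#define SIT_VBLOCKS_SHIFT	10
#define SIT_VBLOCKS_MASK	((1 << SIT_VBLOCKS_SHIFT) - 1)

/* Hypothetical: bits [15:10] carry the type, bits [9:0] the valid count. */
static uint16_t sketch_pack_vblocks(unsigned int type, unsigned int valid)
{
	return (uint16_t)((type << SIT_VBLOCKS_SHIFT) |
			  (valid & SIT_VBLOCKS_MASK));
}

/* Same masking as GET_SIT_VBLOCKS, on a CPU-order value. */
static unsigned int sketch_get_vblocks(uint16_t vblocks)
{
	return vblocks & SIT_VBLOCKS_MASK;
}

/* Same masking and shift as GET_SIT_TYPE, on a CPU-order value. */
static unsigned int sketch_get_type(uint16_t vblocks)
{
	return (vblocks & ~SIT_VBLOCKS_MASK) >> SIT_VBLOCKS_SHIFT;
}
```

/*
 * Ten bits hold counts up to 1023, which is enough for the 512 blocks of a
 * default 2MB segment.
 */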
|
|
|
|
|
|
|
|
struct f2fs_sit_entry {
|
|
|
|
__le16 vblocks; /* reference above */
|
|
|
|
__u8 valid_map[SIT_VBLOCK_MAP_SIZE]; /* bitmap for valid blocks */
|
|
|
|
__le64 mtime; /* segment age for cleaning */
|
|
|
|
} __packed;
|
|
|
|
|
|
|
|
struct f2fs_sit_block {
|
|
|
|
struct f2fs_sit_entry entries[SIT_ENTRY_PER_BLOCK];
|
|
|
|
} __packed;
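/*
 * A minimal sketch of the SIT_ENTRY_PER_BLOCK arithmetic, assuming
 * PAGE_SIZE is 4096 and a GCC/Clang-style packed attribute. The sketch_*
 * names are hypothetical, and the segment-number-to-block mapping by plain
 * division is an illustration only.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE	4096	/* assumption: 4KB pages */
#define SKETCH_VBLOCK_MAP_SIZE	64

/* Same field order as f2fs_sit_entry; packed, so 2 + 64 + 8 = 74 bytes. */
struct sketch_sit_entry {
	uint16_t vblocks;
	uint8_t valid_map[SKETCH_VBLOCK_MAP_SIZE];
	uint64_t mtime;
} __attribute__((packed));

#define SKETCH_SIT_ENTRY_PER_BLOCK \
	(SKETCH_PAGE_SIZE / sizeof(struct sketch_sit_entry))

/* Hypothetical: index of the SIT block that would hold this segment. */
static size_t sketch_sit_block_index(uint32_t segno)
{
	return segno / SKETCH_SIT_ENTRY_PER_BLOCK;
}
```

/*
 * 55 entries of 74 bytes use 4070 bytes, leaving 26 bytes of a 4KB block
 * unused.
 */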
|
|
|
|
|
|
|
|
/*
|
|
|
|
* For segment summary
|
|
|
|
*
|
|
|
|
 * One summary block contains exactly 512 summary entries, which together
 * describe exactly one 2MB segment by default. The basic units must not be
 * changed.
|
|
|
|
*
|
|
|
|
* NOTE: For initializing fields, you must use set_summary
|
|
|
|
*
|
|
|
|
* - If data page, nid represents dnode's nid
|
|
|
|
* - If node page, nid represents the node page's nid.
|
|
|
|
*
|
|
|
|
 * The ofs_in_node field is used only for data pages. It is the offset,
 * counted from the beginning of the node page, of the data block address.
|
|
|
|
* ex) data_blkaddr = (block_t)(nodepage_start_address + ofs_in_node)
|
|
|
|
*/
|
|
|
|
#define ENTRIES_IN_SUM 512
|
2012-11-28 15:12:41 +08:00
|
|
|
#define SUMMARY_SIZE (7) /* sizeof(struct summary) */
|
|
|
|
#define SUM_FOOTER_SIZE (5) /* sizeof(struct summary_footer) */
|
2012-11-02 16:06:26 +08:00
|
|
|
#define SUM_ENTRY_SIZE (SUMMARY_SIZE * ENTRIES_IN_SUM)
|
|
|
|
|
|
|
|
/* a summary entry for a 4KB-sized block in a segment */
|
|
|
|
struct f2fs_summary {
|
|
|
|
__le32 nid; /* parent node id */
|
|
|
|
union {
|
|
|
|
__u8 reserved[3];
|
|
|
|
struct {
|
|
|
|
__u8 version; /* node version number */
|
|
|
|
__le16 ofs_in_node; /* block index in parent node */
|
|
|
|
} __packed;
|
|
|
|
};
|
|
|
|
} __packed;
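/*
 * A minimal sketch of the ofs_in_node lookup described in the comment
 * above, assuming the parent is a direct node whose page starts with an
 * array of little-endian 32-bit block addresses. Both helper names are
 * hypothetical. An inode page stores other fields before its address
 * array, so this sketch does not cover that case.
 */

```c
#include <assert.h>
#include <stdint.h>

/* Decode a little-endian 32-bit value regardless of host byte order. */
static uint32_t sketch_load_le32(const uint8_t *p)
{
	return (uint32_t)p[0] |
	       ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[3] << 24);
}

/* Hypothetical: read the data block address at index ofs_in_node. */
static uint32_t sketch_data_blkaddr(const uint8_t *node_page,
				    uint16_t ofs_in_node)
{
	return sketch_load_le32(node_page + 4u * ofs_in_node);
}
```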
|
|
|
|
|
|
|
|
/* summary block type, node or data, is stored to the summary_footer */
|
|
|
|
#define SUM_TYPE_NODE (1)
|
|
|
|
#define SUM_TYPE_DATA (0)
|
|
|
|
|
|
|
|
struct summary_footer {
|
|
|
|
unsigned char entry_type; /* SUM_TYPE_XXX */
|
2016-01-28 19:40:26 +08:00
|
|
|
__le32 check_sum; /* summary checksum */
|
2012-11-02 16:06:26 +08:00
|
|
|
} __packed;
|
|
|
|
|
2012-11-28 15:12:41 +08:00
|
|
|
#define SUM_JOURNAL_SIZE (F2FS_BLKSIZE - SUM_FOOTER_SIZE -\
|
2012-11-02 16:06:26 +08:00
|
|
|
SUM_ENTRY_SIZE)
|
|
|
|
#define NAT_JOURNAL_ENTRIES ((SUM_JOURNAL_SIZE - 2) /\
|
|
|
|
sizeof(struct nat_journal_entry))
|
|
|
|
#define NAT_JOURNAL_RESERVED ((SUM_JOURNAL_SIZE - 2) %\
|
|
|
|
sizeof(struct nat_journal_entry))
|
|
|
|
#define SIT_JOURNAL_ENTRIES ((SUM_JOURNAL_SIZE - 2) /\
|
|
|
|
sizeof(struct sit_journal_entry))
|
|
|
|
#define SIT_JOURNAL_RESERVED ((SUM_JOURNAL_SIZE - 2) %\
|
|
|
|
sizeof(struct sit_journal_entry))
|
2016-01-27 09:57:30 +08:00
|
|
|
|
|
|
|
/* The reserved area makes the size of f2fs_extra_info equal to
 * that of nat_journal and sit_journal.
|
|
|
|
*/
|
|
|
|
#define EXTRA_INFO_RESERVED (SUM_JOURNAL_SIZE - 2 - 8)
|
|
|
|
|
2012-11-02 16:06:26 +08:00
|
|
|
/*
|
|
|
|
* frequently updated NAT/SIT entries can be stored in the spare area in
|
|
|
|
* summary blocks
|
|
|
|
*/
|
|
|
|
enum {
|
|
|
|
NAT_JOURNAL = 0,
|
|
|
|
SIT_JOURNAL
|
|
|
|
};
|
|
|
|
|
|
|
|
struct nat_journal_entry {
|
|
|
|
__le32 nid;
|
|
|
|
struct f2fs_nat_entry ne;
|
|
|
|
} __packed;
|
|
|
|
|
|
|
|
struct nat_journal {
|
|
|
|
struct nat_journal_entry entries[NAT_JOURNAL_ENTRIES];
|
|
|
|
__u8 reserved[NAT_JOURNAL_RESERVED];
|
|
|
|
} __packed;
|
|
|
|
|
|
|
|
struct sit_journal_entry {
|
|
|
|
__le32 segno;
|
|
|
|
struct f2fs_sit_entry se;
|
|
|
|
} __packed;
|
|
|
|
|
|
|
|
struct sit_journal {
|
|
|
|
struct sit_journal_entry entries[SIT_JOURNAL_ENTRIES];
|
|
|
|
__u8 reserved[SIT_JOURNAL_RESERVED];
|
|
|
|
} __packed;
|
|
|
|
|
2016-01-27 09:57:30 +08:00
|
|
|
struct f2fs_extra_info {
|
|
|
|
__le64 kbytes_written;
|
|
|
|
__u8 reserved[EXTRA_INFO_RESERVED];
|
|
|
|
} __packed;
|
|
|
|
|
2016-02-14 18:50:40 +08:00
|
|
|
struct f2fs_journal {
|
2012-11-02 16:06:26 +08:00
|
|
|
union {
|
|
|
|
__le16 n_nats;
|
|
|
|
__le16 n_sits;
|
|
|
|
};
|
2016-01-27 09:57:30 +08:00
|
|
|
/* spare area is used by NAT or SIT journals or extra info */
|
2012-11-02 16:06:26 +08:00
|
|
|
union {
|
|
|
|
struct nat_journal nat_j;
|
|
|
|
struct sit_journal sit_j;
|
2016-01-27 09:57:30 +08:00
|
|
|
struct f2fs_extra_info info;
|
2012-11-02 16:06:26 +08:00
|
|
|
};
|
2016-02-14 18:50:40 +08:00
|
|
|
} __packed;
|
|
|
|
|
|
|
|
/* 4KB-sized summary block structure */
|
|
|
|
struct f2fs_summary_block {
|
|
|
|
struct f2fs_summary entries[ENTRIES_IN_SUM];
|
|
|
|
struct f2fs_journal journal;
|
2012-11-02 16:06:26 +08:00
|
|
|
struct summary_footer footer;
|
|
|
|
} __packed;
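/*
 * A minimal sketch that re-derives the summary block layout from the
 * macros above, to show that entries, journal and footer add up to one 4KB
 * block. The SK_* names are hypothetical. The journal entry sizes (13 and
 * 78 bytes) are computed by hand from the packed structs in this header: a
 * 4-byte id plus a 9-byte NAT entry or a 74-byte SIT entry.
 */

```c
#include <assert.h>
#include <stddef.h>

#define SK_BLKSIZE		4096
#define SK_ENTRIES_IN_SUM	512
#define SK_SUMMARY_SIZE		7	/* sizeof(struct f2fs_summary) */
#define SK_SUM_FOOTER_SIZE	5	/* sizeof(struct summary_footer) */
#define SK_SUM_ENTRY_SIZE	(SK_SUMMARY_SIZE * SK_ENTRIES_IN_SUM)
#define SK_SUM_JOURNAL_SIZE	(SK_BLKSIZE - SK_SUM_FOOTER_SIZE - \
				 SK_SUM_ENTRY_SIZE)

#define SK_NAT_JENTRY_SIZE	(4 + 9)		/* nid + f2fs_nat_entry */
#define SK_SIT_JENTRY_SIZE	(4 + 74)	/* segno + f2fs_sit_entry */

/* The "- 2" accounts for the 16-bit n_nats/n_sits counter. */
#define SK_NAT_JOURNAL_ENTRIES	((SK_SUM_JOURNAL_SIZE - 2) / SK_NAT_JENTRY_SIZE)
#define SK_NAT_JOURNAL_RESERVED	((SK_SUM_JOURNAL_SIZE - 2) % SK_NAT_JENTRY_SIZE)
#define SK_SIT_JOURNAL_ENTRIES	((SK_SUM_JOURNAL_SIZE - 2) / SK_SIT_JENTRY_SIZE)
#define SK_SIT_JOURNAL_RESERVED	((SK_SUM_JOURNAL_SIZE - 2) % SK_SIT_JENTRY_SIZE)
#define SK_EXTRA_INFO_RESERVED	(SK_SUM_JOURNAL_SIZE - 2 - 8)

/* Hypothetical: total bytes of entries + journal + footer. */
static size_t sk_summary_block_size(void)
{
	return SK_SUM_ENTRY_SIZE + SK_SUM_JOURNAL_SIZE + SK_SUM_FOOTER_SIZE;
}
```

/*
 * Each union member of the journal fills the same 505 bytes after the
 * counter: 38 * 13 + 11, 6 * 78 + 37, and 8 + 497.
 */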
|
|
|
|
|
|
|
|
/*
|
|
|
|
* For directory operations
|
|
|
|
*/
|
|
|
|
#define F2FS_DOT_HASH 0
|
|
|
|
#define F2FS_DDOT_HASH F2FS_DOT_HASH
|
|
|
|
#define F2FS_MAX_HASH (~((0x3ULL) << 62))
|
|
|
|
#define F2FS_HASH_COL_BIT ((0x1ULL) << 63)
|
|
|
|
|
|
|
|
typedef __le32 f2fs_hash_t;
|
|
|
|
|
|
|
|
/* One directory entry slot covers 8 bytes of a file name */
|
2013-03-03 12:58:05 +08:00
|
|
|
#define F2FS_SLOT_LEN 8
|
|
|
|
#define F2FS_SLOT_LEN_BITS 3
|
2012-12-08 13:54:50 +08:00
|
|
|
|
2017-04-20 08:36:38 +08:00
|
|
|
#define GET_DENTRY_SLOTS(x) (((x) + F2FS_SLOT_LEN - 1) >> F2FS_SLOT_LEN_BITS)
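/*
 * A minimal sketch of the rounding done by GET_DENTRY_SLOTS: a name of x
 * bytes needs ceil(x / 8) slots. The function name is hypothetical; the
 * expression is the same as the macro.
 */

```c
#include <assert.h>

#define F2FS_SLOT_LEN		8
#define F2FS_SLOT_LEN_BITS	3

/* Hypothetical wrapper: number of 8-byte slots for a name of name_len. */
static unsigned int sketch_dentry_slots(unsigned int name_len)
{
	return (name_len + F2FS_SLOT_LEN - 1) >> F2FS_SLOT_LEN_BITS;
}
```

/*
 * A 1-byte and an 8-byte name both fit in one slot, a 9-byte name needs
 * two, and a 255-byte name needs 32.
 */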
|
2012-11-02 16:06:26 +08:00
|
|
|
|
|
|
|
/* MAX level for dir lookup */
|
|
|
|
#define MAX_DIR_HASH_DEPTH 63
|
|
|
|
|
2014-05-28 08:56:09 +08:00
|
|
|
/* MAX buckets in one level of dir */
|
|
|
|
#define MAX_DIR_BUCKETS (1 << ((MAX_DIR_HASH_DEPTH / 2) - 1))
|
|
|
|
|
2015-08-19 19:02:02 +08:00
|
|
|
/*
|
2017-07-19 00:19:05 +08:00
|
|
|
* space utilization of regular dentry and inline dentry (w/o extra reservation)
|
2019-02-15 00:08:25 +08:00
|
|
|
 *              regular dentry       inline dentry (def)   inline dentry (min)
 * bitmap       1 * 27 = 27          1 * 23 = 23           1 * 1 = 1
 * reserved     1 * 3 = 3            1 * 7 = 7             1 * 1 = 1
 * dentry       11 * 214 = 2354      11 * 182 = 2002       11 * 2 = 22
 * filename     8 * 214 = 1712       8 * 182 = 1456        8 * 2 = 16
 * total        4096                 3488                  40
|
2015-08-19 19:02:02 +08:00
|
|
|
*
|
|
|
|
 * Note: an inline dentry has more reserved space than a regular dentry,
 * so converting an inline dentry must be handled carefully.
|
|
|
|
*/
|
|
|
|
#define NR_DENTRY_IN_BLOCK 214 /* the number of dentry in a block */
|
2012-11-02 16:06:26 +08:00
|
|
|
#define SIZE_OF_DIR_ENTRY 11 /* by byte */
|
|
|
|
#define SIZE_OF_DENTRY_BITMAP ((NR_DENTRY_IN_BLOCK + BITS_PER_BYTE - 1) / \
|
|
|
|
BITS_PER_BYTE)
|
|
|
|
#define SIZE_OF_RESERVED (PAGE_SIZE - ((SIZE_OF_DIR_ENTRY + \
|
2013-03-03 12:58:05 +08:00
|
|
|
F2FS_SLOT_LEN) * \
|
2012-11-02 16:06:26 +08:00
|
|
|
NR_DENTRY_IN_BLOCK + SIZE_OF_DENTRY_BITMAP))
|
2019-02-15 00:08:25 +08:00
|
|
|
#define MIN_INLINE_DENTRY_SIZE 40 /* just include '.' and '..' entries */
|
2012-11-02 16:06:26 +08:00
|
|
|
|
2013-03-03 12:58:05 +08:00
|
|
|
/* One directory entry slot representing F2FS_SLOT_LEN-sized file name */
|
2012-11-02 16:06:26 +08:00
|
|
|
struct f2fs_dir_entry {
|
|
|
|
__le32 hash_code; /* hash code of file name */
|
|
|
|
__le32 ino; /* inode number */
|
2019-01-25 15:35:01 +08:00
|
|
|
__le16 name_len; /* length of file name */
|
2012-11-02 16:06:26 +08:00
|
|
|
__u8 file_type; /* file type */
|
|
|
|
} __packed;
|
|
|
|
|
|
|
|
/* 4KB-sized directory entry block */
|
|
|
|
struct f2fs_dentry_block {
|
|
|
|
/* validity bitmap for directory entries in each block */
|
|
|
|
__u8 dentry_bitmap[SIZE_OF_DENTRY_BITMAP];
|
|
|
|
__u8 reserved[SIZE_OF_RESERVED];
|
|
|
|
struct f2fs_dir_entry dentry[NR_DENTRY_IN_BLOCK];
|
2013-03-03 12:58:05 +08:00
|
|
|
__u8 filename[NR_DENTRY_IN_BLOCK][F2FS_SLOT_LEN];
|
2012-11-02 16:06:26 +08:00
|
|
|
} __packed;
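/*
 * A minimal sketch that checks the regular dentry column of the space
 * utilization table against the macros above, assuming PAGE_SIZE is 4096,
 * BITS_PER_BYTE is 8, and a GCC/Clang-style packed attribute. The SK_ and
 * sk_ names are hypothetical stand-ins for the definitions in this header.
 */

```c
#include <assert.h>
#include <stdint.h>

#define SK_PAGE_SIZE		4096	/* assumption: 4KB pages */
#define SK_BITS_PER_BYTE	8
#define SK_SLOT_LEN		8
#define SK_NR_DENTRY		214
#define SK_DIR_ENTRY_SIZE	11
#define SK_BITMAP_SIZE		((SK_NR_DENTRY + SK_BITS_PER_BYTE - 1) / \
				 SK_BITS_PER_BYTE)
#define SK_RESERVED_SIZE	(SK_PAGE_SIZE - ((SK_DIR_ENTRY_SIZE + \
				 SK_SLOT_LEN) * SK_NR_DENTRY + SK_BITMAP_SIZE))

/* Same field order as f2fs_dir_entry; packed, so 4 + 4 + 2 + 1 = 11. */
struct sk_dir_entry {
	uint32_t hash_code;
	uint32_t ino;
	uint16_t name_len;
	uint8_t file_type;
} __attribute__((packed));

/* Same member order as f2fs_dentry_block. */
struct sk_dentry_block {
	uint8_t dentry_bitmap[SK_BITMAP_SIZE];
	uint8_t reserved[SK_RESERVED_SIZE];
	struct sk_dir_entry dentry[SK_NR_DENTRY];
	uint8_t filename[SK_NR_DENTRY][SK_SLOT_LEN];
} __attribute__((packed));
```

/*
 * The four members come to 27 + 3 + 2354 + 1712 = 4096 bytes, matching the
 * "total" row for a regular dentry block.
 */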
|
|
|
|
|
|
|
|
/* file types used in inode_info->flags */
|
|
|
|
enum {
|
|
|
|
F2FS_FT_UNKNOWN,
|
|
|
|
F2FS_FT_REG_FILE,
|
|
|
|
F2FS_FT_DIR,
|
|
|
|
F2FS_FT_CHRDEV,
|
|
|
|
F2FS_FT_BLKDEV,
|
|
|
|
F2FS_FT_FIFO,
|
|
|
|
F2FS_FT_SOCK,
|
|
|
|
F2FS_FT_SYMLINK,
|
|
|
|
F2FS_FT_MAX
|
|
|
|
};
|
|
|
|
|
f2fs: fix to convert inline directory correctly
With the steps below, we will lose some of the dirents:
1) mount f2fs with inline_dentry option
2) echo 1 > /sys/fs/f2fs/sdX/dir_level
3) mkdir dir
4) touch 180 files named [1-180] in dir
5) touch 181 in dir
6) echo 3 > /proc/sys/vm/drop_caches
7) ll dir
ls: cannot access 2: No such file or directory
ls: cannot access 4: No such file or directory
ls: cannot access 5: No such file or directory
ls: cannot access 6: No such file or directory
ls: cannot access 8: No such file or directory
ls: cannot access 9: No such file or directory
...
total 360
drwxr-xr-x 2 root root 4096 Feb 19 15:12 ./
drwxr-xr-x 3 root root 4096 Feb 19 15:11 ../
-rw-r--r-- 1 root root 0 Feb 19 15:12 1
-rw-r--r-- 1 root root 0 Feb 19 15:12 10
-rw-r--r-- 1 root root 0 Feb 19 15:12 100
-????????? ? ? ? ? ? 101
-????????? ? ? ? ? ? 102
-????????? ? ? ? ? ? 103
...
The reason is: when doing the inline dir conversion, we didn't consider
that a directory has a hierarchical hash structure which can be configured
through the sysfs interface 'dir_level'.
By default, dir_level of a directory inode is 0, which means we have one
bucket in the hash table located in the first level, and all dirents will
be hashed into this bucket, so it is no problem for us to simply copy
between the inline dentry page and the converted normal dentry page.
However, if we configure dir_level with a value N (greater than 0), it
will expand the bucket number of the first level hash table by 2^N - 1, and
it hashes dirents into different buckets according to their hash value. If
we still move all dirents to the first bucket, the inline dirents are
located incorrectly; the result is that, although we can iterate over all
dirents through ->readdir, we can't stat some of them in ->lookup, which is
based on hash table searching.
This patch fixes this issue by rehashing dirents into the correct position
when converting an inline directory.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 18:29:18 +08:00
|
|
|
#define S_SHIFT 12
|
|
|
|
|
2017-07-26 00:01:41 +08:00
|
|
|
#define F2FS_DEF_PROJID 0 /* default project ID */
|
|
|
|
|
2012-11-02 16:06:26 +08:00
|
|
|
#endif /* _LINUX_F2FS_FS_H */
|