When getting EIO while handling orphan inodes, we can get some dirty node
pages. Then, f2fs_write_node_pages() called by iput(node_inode) will try
to flush node pages. But in this case, we should prevent to do that, since
we will try again from the start.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch introduces spinlock to protect updating process of ckpt_flags
field in struct f2fs_checkpoint, it avoids incorrectly updating in race
condition.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: add __is_set_ckpt_flags likewise __set_ckpt_flags]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Previously, we used cp_version only to detect recoverable dnodes.
In order to avoid same garbage cp_version, we needed to truncate the next
dnode during checkpoint, resulting in additional discard or data write.
If we can distinguish this by using crc in addition to cp_version, we can
remove this overhead.
There is backward compatibility concern where it changes node_footer layout.
So, this patch introduces a new checkpoint flag, CP_CRC_RECOVERY_FLAG, to
detect new layout. New layout will be activated only when this flag is set.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch adds to support IO error injection for testing IO error
tolerance of f2fs.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Make inline_dentry as default mount option to improve space usage and
IO performance in scenario of numerous small directory.
It adds noinline_dentry mount option, instead.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
LKP reported -36.3% regression of fsmark.files_per_sec due to this patch.
I've confirmed that fxmark [1] has also slight regression for DWAL.
[1] https://github.com/sslab-gatech/fxmark
This reverts commit ec795418c4.
Pull vfs updates from Al Viro:
"Assorted cleanups and fixes.
Probably the most interesting part long-term is ->d_init() - that will
have a bunch of followups in (at least) ceph and lustre, but we'll
need to sort the barrier-related rules before it can get used for
really non-trivial stuff.
Another fun thing is the merge of ->d_iput() callers (dentry_iput()
and dentry_unlink_inode()) and a bunch of ->d_compare() ones (all
except the one in __d_lookup_lru())"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits)
fs/dcache.c: avoid soft-lockup in dput()
vfs: new d_init method
vfs: Update lookup_dcache() comment
bdev: get rid of ->bd_inodes
Remove last traces of ->sync_page
new helper: d_same_name()
dentry_cmp(): use lockless_dereference() instead of smp_read_barrier_depends()
vfs: clean up documentation
vfs: document ->d_real()
vfs: merge .d_select_inode() into .d_real()
unify dentry_iput() and dentry_unlink_inode()
binfmt_misc: ->s_root is not going anywhere
drop redundant ->owner initializations
ufs: get rid of redundant checks
orangefs: constify inode_operations
missed comment updates from ->direct_IO() prototype change
file_inode(f)->i_mapping is f->f_mapping
trim fsnotify hooks a bit
9p: new helper - v9fs_parent_fid()
debugfs: ->d_parent is never NULL or negative
...
Datas in file can be operated by GC and DIO simultaneously, so we will
face race case as below:
For write case:
Thread A Thread B
- generic_file_direct_write
- invalidate_inode_pages2_range
- f2fs_direct_IO
- do_blockdev_direct_IO
- do_direct_IO
- get_more_blocks
- f2fs_gc
- do_garbage_collect
- gc_data_segment
- move_data_page
- do_write_data_page
migrate data block to new block address
- dio_bio_submit
update user data to old block address
For read case:
Thread A Thread B
- generic_file_direct_write
- invalidate_inode_pages2_range
- f2fs_direct_IO
- do_blockdev_direct_IO
- do_direct_IO
- get_more_blocks
- f2fs_balance_fs
- f2fs_gc
- do_garbage_collect
- gc_data_segment
- move_data_page
- do_write_data_page
migrate data block to new block address
- write_checkpoint
- do_checkpoint
- clear_prefree_segments
- f2fs_issue_discard
discard old block adress
- dio_bio_submit
update user buffer from obsolete block address
In order to fix this, for one file, we should let DIO and GC getting exclusion
against with each other.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As manual described, f_bfree indicates total free blocks in fs, in f2fs, it
includes two parts: visible free blocks and over-provision blocks. This
patch corrrects the calculation.
fsblkcnt_t f_bfree; /* free blocks in fs */
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This mount option is to enable original log-structured filesystem forcefully.
So, there should be no random writes for main area.
Especially, this supports host-managed SMR device.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
There is a data race between allocate_data_block() and f2fs_sbumit_page_mbio(),
which incur unnecessary reversed bio submission.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch removes writepages lock.
We can improve multi-threading performance.
tiobench, 32 threads, 4KB write per fsync on SSD
Before: 25.88 MB/s
After: 28.03 MB/s
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
it's not needed for file_operations of inodes located on fs defined
in the hosting module and for file_operations that go into procfs.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
init_f2fs_fs does:
1) f2fs_build_trace_ios
2) init_inodecache
3) create_node_manager_caches
4) create_segment_manager_caches
5) create_checkpoint_caches
6) create_extent_cache
7) kset_create_and_add
8) kobject_init_and_add
9) register_shrinker
10) register_filesystem
11) f2fs_create_root_stats
12) proc_mkdir
exit_f2fs_fs should do cleanup in the reverse order
to make the code more clear.
Signed-off-by: Tiezhu Yang <kernelpatch@126.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch introduces a new struct f2fs_fault_info and a global f2fs_fault
to save fault injection status. Fault injection entries are created in
/sys/fs/f2fs/fault_injection/ during initializing f2fs module.
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch allows fscrypto to handle a second key prefix given by filesystem.
The main reason is to provide backward compatibility, since previously f2fs
used "f2fs:" as a crypto prefix instead of "fscrypt:".
Later, ext4 should also provide key_prefix() to give "ext4:".
One concern decribed by Ted would be kinda double check overhead of prefixes.
In x86, for example, validate_user_key consumes 8 ms after boot-up, which turns
out derive_key_aes() consumed most of the time to load specific crypto module.
After such the cold miss, it shows almost zero latencies, which treats as a
negligible overhead.
Note that request_key() detects wrong prefix in prior to derive_key_aes() even.
Cc: Ted Tso <tytso@mit.edu>
Cc: stable@vger.kernel.org # v4.6
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch adds a sbi flag, SBI_NEED_SB_WRITE, which indicates it needs to
recover superblock when (re)mounting as RW. This is set only when f2fs is
mounted as RO.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
When one of superblocks is missing, f2fs recovers it with the valid one.
But, even if f2fs is mounted as RO, we'd better notify that too.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Pull f2fs fixes from Jaegeuk Kim.
* tag 'f2fs-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
f2fs: retrieve IO write stat from the right place
f2fs crypto: fix corrupted symlink in encrypted case
f2fs: cover large section in sanity check of super