This patch tries to make below macros calculating max inline size,
inline dentry field size considerring reserving size-changeable
space:
- MAX_INLINE_DATA
- NR_INLINE_DENTRY
- INLINE_DENTRY_BITMAP_SIZE
- INLINE_RESERVED_SIZE
Then, when inline_{data,dentry} options is enabled, it allows us to
reserve inline space with different size flexibly for adding newly
introduced inode attribute.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In this round, we've added new features such as disk quota and statx, and
modified internal bio management flow to merge more IOs depending on block
types. We've also made internal threads freezeable for Android battery life.
In addition to them, there are some patches to avoid lock contention as well
as a couple of deadlock conditions.
= Enhancement
- support usrquota, grpquota, and statx
- manage DATA/NODE typed bios separately to serialize more IOs
- modify f2fs_lock_op/wio_mutex to avoid lock contention
- prevent lock contention in migratepage
= Bug fix
- miss to load written inode flag
- fix worst case victim selection in GC
- freezeable GC and discard threads for Android battery life
- sanitize f2fs metadata to deal with security hole
- clean up sysfs-related code and docs
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAllj6fMACgkQQBSofoJI
UNJ6Ng/+PqdGV/b6KroYIXI/scFx/1t87/0W+rY9tyLr1jX7nIHn9KLPjeDdvdlk
5vEeZ/dGfW8wSI+ESzscvKberG2QlOPwJRyTB4jWR+bLatwzg7YjEblz+RX4/wfJ
jKjnR7M//gRdhHdqA0xXrqguAjPbcEDK2RiVbhioMjWbZ/77j0IjcRokjMYdEf0m
cJc2oMXFtlo+DJ1h9/8BmwQPTI9FfVdgbkPFTTJzV0ydQnBdxcAigrzwYZhPOVv0
n2M1dKOiQewB4OADMuepZLFqJheItlgG9wlvEjGq7zTd5epHXRIqhM6h9GikQVb9
YKAkajlKfWcwEXaEcVXtsMHC9x69Yf8xxOSQ1VrhypSUNbaynC9LDsErJx6yrF3P
XC5baiqXsd/btg7tfrHJjk3gI+ck97d6TrTfUVR91X+1Tpkz7cyB226WxFKbyOG3
EYCFVMbrIN2CaHHt1xWIT2zCfX5w9ycp8kFjY6jPi0OOZrKXpFw+1AwwTu9kn4xJ
iuUc8pmc0/FyPqokmLef4Qp/RRM83+f+nzW/y//lkEf3nMn6qlHzNI1RAxXnBvGV
DMXzuJDcJcHGcSDr7mWyKkm6gYcak/E4DdQLQqJ6VCt6KCdCEXP/XDlig5ey5ODY
uGEr1QhXIpiYAON45HUi3gmytB3J3ZdzzpsG1PEco4+hjSuFhyE=
=N4GZ
-----END PGP SIGNATURE-----
Merge tag 'for-f2fs-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this round, we've added new features such as disk quota and statx,
and modified internal bio management flow to merge more IOs depending
on block types. We've also made internal threads freezeable for
Android battery life. In addition to them, there are some patches to
avoid lock contention as well as a couple of deadlock conditions.
Enhancements:
- support usrquota, grpquota, and statx
- manage DATA/NODE typed bios separately to serialize more IOs
- modify f2fs_lock_op/wio_mutex to avoid lock contention
- prevent lock contention in migratepage
Bug fixes:
- fix missing load of written inode flag
- fix worst case victim selection in GC
- freezeable GC and discard threads for Android battery life
- sanitize f2fs metadata to deal with security hole
- clean up sysfs-related code and docs"
* tag 'for-f2fs-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (59 commits)
f2fs: support plain user/group quota
f2fs: avoid deadlock caused by lock order of page and lock_op
f2fs: use spin_{,un}lock_irq{save,restore}
f2fs: relax migratepage for atomic written page
f2fs: don't count inode block in in-memory inode.i_blocks
Revert "f2fs: fix to clean previous mount option when remount_fs"
f2fs: do not set LOST_PINO for renamed dir
f2fs: do not set LOST_PINO for newly created dir
f2fs: skip ->writepages for {mete,node}_inode during recovery
f2fs: introduce __check_sit_bitmap
f2fs: stop gc/discard thread in prior during umount
f2fs: introduce reserved_blocks in sysfs
f2fs: avoid redundant f2fs_flush after remount
f2fs: report # of free inodes more precisely
f2fs: add ioctl to do gc with target block address
f2fs: don't need to check encrypted inode for partial truncation
f2fs: measure inode.i_blocks as generic filesystem
f2fs: set CP_TRIMMED_FLAG correctly
f2fs: require key for truncate(2) of encrypted file
f2fs: move sysfs code from super.c to fs/f2fs/sysfs.c
...
This patch adds to support plain user/group quota.
Change Note by Jaegeuk Kim.
- Use f2fs page cache for quota files in order to consider garbage collection.
so, quota files are not tolerable for sudden power-cuts, so user needs to do
quotacheck.
- setattr() calls dquot_transfer which will transfer inode->i_blocks.
We can't reclaim that during f2fs_evict_inode(). So, we need to count
node blocks as well in order to match i_blocks with dquot's space.
Note that, Chao wrote a patch to count inode->i_blocks without inode block.
(f2fs: don't count inode block in in-memory inode.i_blocks)
- in f2fs_remount, we need to make RW in prior to dquot_resume.
- handle fault_injection case during f2fs_quota_off_umount
- TODO: Project quota
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In order to avoid lock contention for atomic written pages, we'd better give
EBUSY in f2fs_migrate_page when mode is asynchronous. We expect it will be
released soon as transaction commits.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Skip ->writepages in prior to ->writepage for {meta,node}_inode during
recovery, hence unneeded loop in ->writepages can be avoided.
Moreover, check SBI_POR_DOING earlier while writebacking pages.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Currently in F2FS, page faults and operations that truncate the pagecahe
or data blocks, are completely unsynchronized. This can result in page
fault faulting in a page into a range that we are changing after
truncating, and thus we can end up with a page mapped to disk blocks that
will be shortly freed. Filesystem corruption will shortly follow.
This patch fixes the problem by creating new rw semaphore i_mmap_sem in
f2fs_inode_info and grab it for functions removing blocks from extent tree
and for read over page faults. The mechanism is similar to that in ext4.
Signed-off-by: Qiuyang Sun <sunqiuyang@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Replace bi_error with a new bi_status to allow for a clear conversion.
Note that device mapper overloaded bi_error with a private value, which
we'll have to keep arround at least for now and thus propagate to a
proper blk_status_t value.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Serialize data/node IOs by using fifo list instead of mutex lock,
it will help to enhance concurrency of f2fs, meanwhile keeping LFS
IO semantics.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Currently, if we do get_node_of_data before f2fs_lock_op, there may be dead lock
as follows, where process A would be in infinite loop, and B will NOT be awaked.
Process A(cp): Process B:
f2fs_lock_all(sbi)
get_dnode_of_data <---- lock dn.node_page
flush_nodes f2fs_lock_op
So, this patch adds f2fs_trylock_op to avoid f2fs_lock_op done by IPU.
Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Split DATA/NODE type bio cache according to different temperature,
so write IOs with the same temperature can be merged in corresponding
bio cache as much as possible, otherwise, different temperature write
IOs submitting into one bio cache will always cause split of bio.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Merged IO flow doesn't need to care about read IOs.
f2fs_submit_merged_bio -> f2fs_submit_merged_write
f2fs_submit_merged_bios -> f2fs_submit_merged_writes
f2fs_submit_merged_bio_cond -> f2fs_submit_merged_write_cond
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Commit b685d3d65a "block: treat REQ_FUA and REQ_PREFLUSH as
synchronous" removed REQ_SYNC flag from WRITE_{FUA|PREFLUSH|...}
definitions. generic_make_request_checks() however strips REQ_FUA and
REQ_PREFLUSH flags from a bio when the storage doesn't report volatile
write cache and thus write effectively becomes asynchronous which can
lead to performance regressions.
Fix the problem by making sure all bios which are synchronous are
properly marked with REQ_SYNC.
Fixes: b685d3d65a
Cc: stable@vger.kernel.org # 4.9+
CC: Jaegeuk Kim <jaegeuk@kernel.org>
CC: linux-f2fs-devel@lists.sourceforge.net
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
We don't need to rewrite the page under cp_rwsem and dnode locks.
Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If a page is cold, NOT atomit written and need_ipu now, there is
a high probability that IPU should be adapted. For IPU, we try to
check extent tree to get the block index first, instead of reading
the dnode page, where may lead to an useless dnode IO, since no need to
update the dnode index for IPU.
Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch introduces encrypt_one_page which encrypts one data page before
submit_bio, and change the use of need_inplace_update.
Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch also reverts d0db7703ac ("f2fs: do SSR in higher priority").
This patch fixes out of free segments caused by many small file creation by
1) mkfs -s 1 2G
2) mount
3) untar
- preoduce 60000 small files burstly
4) sync
- flush node pages
- flush imeta
Here, when we do f2fs_balance_fs, we missed # of imeta blocks, resulting in
skipping to check has_not_enough_free_secs.
Another test is done by
1) mkfs -s 12 2G
2) mount
3) untar
- preoduce 60000 small files burstly
4) sync
- flush node pages
- flush imeta
In this case, this patch also fixes wrong block allocation under large section
size.
Reported-by: William Brana <wbrana@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch introduces an ASYNC IPU policy.
Under senario of large # of async updating(e.g. log writing in Android),
disk would be seriously fragmented, and higher frequent gc would be triggered.
This patch uses IPU to rewrite the async update writting, since async is
NOT sensitive to io latency.
Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
For IPU writes, there won't be any udpates in dnode page since we
will reuse old block address instead of allocating new one, so we
don't need to lock cp_rwsem during IPU IO submitting.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Callers are to unlock the page on failure after 86531d6b.
Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch tries to split in-place-update bios from sequential bios.
Suggested-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If two threads try to flush dirty pages in different inodes respectively,
f2fs_write_data_pages() will produce WRITE and WRITE_SYNC one at a time,
resulting in a lot of 4KB seperated IOs.
So, this patch gives higher priority to WB_SYNC_ALL IOs and gathers write
IOs with a big WRITE_SYNC'ed bio.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
It would better split small and large IOs separately in order to get more
consecutive big writes.
The default threshold is set to 64KB, but configurable by sysfs/min_hot_blocks.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch allow write data to normal file when writting
new checkpoint.
We relax three limitations for write_begin path:
1. data allocation
2. node allocation
3. variables in checkpoint
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
For f2fs_read_data_pages, the f2fs_mpage_readpages gets "page == NULL",
so that, the prefetchw(&page->flags) is operated on NULL.
Fixes: f1e8866016 ("f2fs: expose f2fs_mpage_readpages")
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Fix up affected files that include this signal functionality via sched.h.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We should only account F2FS_{CP_}WB_DATA IOs for write path, fix it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Previously kernel message can show that in which function we do the
injection, but unfortunately, most of the caller are the same, for
tracking more information of injection path, it needs to show upper
caller's name. This patch supports that ability.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Similar as f2fs_write_inode, f2fs_write_inline_data just
mark inode page dirty, so it's no need to write inline data
under read lock of cp_rwsem.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
When more than one data blocks are allocated, the F2FS_MAP_UNWRITTEN/MAPPED
flags will be overlapped by F2FS_MAP_NEW at the later times.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If the cached bio has the last page's index, then we need to submit it.
Otherwise, we don't need to submit it and can wait for further IO merges.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
A dead loop can be triggered in f2fs_fiemap() using the test case
as below:
...
fd = open();
fallocate(fd, 0, 0, 4294967296);
ioctl(fd, FS_IOC_FIEMAP, fiemap_buf);
...
It's caused by an overflow in __get_data_block():
...
bh->b_size = map.m_len << inode->i_blkbits;
...
map.m_len is an unsigned int, and bh->b_size is a size_t which is 64 bits
on 64 bits archtecture, type conversion from an unsigned int to a size_t
will result in an overflow.
In the above-mentioned case, bh->b_size will be zero, and f2fs_fiemap()
will call get_data_block() at block 0 again an again.
Fix this by adding a force conversion before left shift.
Signed-off-by: Wei Fang <fangwei1@huawei.com>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Sheng Yong reports needless preallocation if write(small_buffer, large_size)
is called.
In that case, f2fs preallocates large_size, but vfs returns early due to
small_buffer size. Let's detect it before preallocation phase in f2fs.
Reported-by: Sheng Yong <shengyong1@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch introduces a new flag to indicate inode status of doing atomic
write committing, so that, we can keep atomic write status for inode
during atomic committing, then we can skip GCing pages of atomic write inode,
that avoids random GCed datas being mixed with current transaction, so
isolation of transaction can be kept.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch implements IO alignment by filling dummy blocks in DATA and NODE
write bios. If we can guarantee, for example, 32KB or 64KB for such the IOs,
we can eliminate underlying dummy page problem which FTL conducts in order to
close MLC or TLC partial written pages.
Note that,
- it requires "-o mode=lfs".
- IO size should be power of 2, not exceed BIO_MAX_PAGES, 256.
- read IO is still 4KB.
- do checkpoint at fsync, if dummy NODE page was written.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If the range we write cover the whole valid data in the last page,
we do not need to read it.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
[Jaegeuk Kim: nullify the remaining area (fix: xfstests/f2fs/001)]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
needed for both ext4 and xfs dax changes to use iomap for DAX. It
also includes the fscrypt branch which is needed for ubifs encryption
work as well as ext4 encryption and fscrypt cleanups.
Lots of cleanups and bug fixes, especially making sure ext4 is robust
against maliciously corrupted file systems --- especially maliciously
corrupted xattr blocks and a maliciously corrupted superblock. Also
fix ext4 support for 64k block sizes so it works well on ppcle. Fixed
mbcache so we don't miss some common xattr blocks that can be merged.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAlhQQVEACgkQ8vlZVpUN
gaN9TQgAoCD+V4kJjMCFhiV8u6QR3hqD6bOZbggo5wJf4CHglWkmrbAmc3jANOgH
CKsXDRRjxuDjPXf1ukB1i4M7ArLYjkbbzKdsu7lismoJLS+w8uwUKSNdep+LYMjD
alxUcf5DCzLlUmdOdW4yE22L+CwRfqfs8IpBvKmJb7DrAKiwJVA340ys6daBGuu1
63xYx0QIyPzq0xjqLb6TVf88HUI4NiGVXmlm2wcrnYd5966hEZd/SztOZTVCVWOf
Z0Z0fGQ1WJzmaBB9+YV3aBi+BObOx4m2PUprIa531+iEW02E+ot5Xd4vVQFoV/r4
NX3XtoBrT1XlKagy2sJLMBoCavqrKw==
=j4KP
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"This merge request includes the dax-4.0-iomap-pmd branch which is
needed for both ext4 and xfs dax changes to use iomap for DAX. It also
includes the fscrypt branch which is needed for ubifs encryption work
as well as ext4 encryption and fscrypt cleanups.
Lots of cleanups and bug fixes, especially making sure ext4 is robust
against maliciously corrupted file systems --- especially maliciously
corrupted xattr blocks and a maliciously corrupted superblock. Also
fix ext4 support for 64k block sizes so it works well on ppcle. Fixed
mbcache so we don't miss some common xattr blocks that can be merged"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (86 commits)
dax: Fix sleep in atomic contex in grab_mapping_entry()
fscrypt: Rename FS_WRITE_PATH_FL to FS_CTX_HAS_BOUNCE_BUFFER_FL
fscrypt: Delay bounce page pool allocation until needed
fscrypt: Cleanup page locking requirements for fscrypt_{decrypt,encrypt}_page()
fscrypt: Cleanup fscrypt_{decrypt,encrypt}_page()
fscrypt: Never allocate fscrypt_ctx on in-place encryption
fscrypt: Use correct index in decrypt path.
fscrypt: move the policy flags and encryption mode definitions to uapi header
fscrypt: move non-public structures and constants to fscrypt_private.h
fscrypt: unexport fscrypt_initialize()
fscrypt: rename get_crypt_info() to fscrypt_get_crypt_info()
fscrypto: move ioctl processing more fully into common code
fscrypto: remove unneeded Kconfig dependencies
MAINTAINERS: fscrypto: recommend linux-fsdevel for fscrypto patches
ext4: do not perform data journaling when data is encrypted
ext4: return -ENOMEM instead of success
ext4: reject inodes with negative size
ext4: remove another test in ext4_alloc_file_blocks()
Documentation: fix description of ext4's block_validity mount option
ext4: fix checks for data=ordered and journal_async_commit options
...
This patch series contains several performance tuning patches regarding to the
IO submission flow, in addition to supporting new features such as a ZBC-base
drive and multiple devices.
It also includes some major bug fixes such as:
- checkpoint version control
- fdatasync-related roll-forward recovery routine
- memory boundary or null-pointer access in corner cases
- missing error cases
It has various minor clean-up patches as well.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJYTx44AAoJEEAUqH6CSFDSnAQP/jeYJq5Zd0bweEF5g00Ec1Qg
qNKQ57e9EHDRaDLBUmHHEaCEPRL0bw6SOUUWWqzGA07KcsIK+Yb/dGAyIcuV7WMl
PjntVbYm4yARDYBHGupdOCzFSkzr8gDalb+98jJnoGUonsftljhES9jedQ1NjAms
GFPHDNtirZM/r0bjKkYKjpqJ6FCxFxcGPfb/GtohDajIpohWfKZiemaXGTgtYR4d
iBVek16h+Hprz90ycZBY69uz0TdAwu/gb+htMVBrAdExHWvlFzgp35OIywiAB/YX
3QD/x4t2HqOBaNYiiOAY4ukVW/Yyqa/ZAzbm+m5B5CAcFYiWXMy+cMXUY9HJJ/K0
wdvi//Avtvgpp2PVZFn2pASx14vgMFylBzuNgKpP6MPdtWTEL33jT7VYs9Nuz45E
dgZ9IpiDt4DeTRuZ4mPO5iH7bVHPvAVV80bpXzirCCzDeNZ1EFFIQzXh/2UAmCxI
twPXGBIYul0aIl9JkWAyhCZSd3XDSqedpfPudknjhzM9Xb1H5X0QJco7f/UwsWXH
WxV6lHr1Q7UH96wJ7x/GAqj8ArOAASRV18+K51dqU+DWHnFPpBArJe39FVf8NGWs
Fz1ZmlWBQ0ZgzvLkGa80llhjalXIEy/JabMrpy6VrzQGxHdmW4cVxe4dJ3710WxX
VysJUcNMRKxMUTWOKsxp
=Boum
-----END PGP SIGNATURE-----
Merge tag 'for-f2fs-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"This patch series contains several performance tuning patches
regarding to the IO submission flow, in addition to supporting new
features such as a ZBC-base drive and multiple devices.
It also includes some major bug fixes such as:
- checkpoint version control
- fdatasync-related roll-forward recovery routine
- memory boundary or null-pointer access in corner cases
- missing error cases
It has various minor clean-up patches as well"
* tag 'for-f2fs-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (66 commits)
f2fs: fix a missing size change in f2fs_setattr
f2fs: fix to access nullified flush_cmd_control pointer
f2fs: free meta pages if sanity check for ckpt is failed
f2fs: detect wrong layout
f2fs: call sync_fs when f2fs is idle
Revert "f2fs: use percpu_counter for # of dirty pages in inode"
f2fs: return AOP_WRITEPAGE_ACTIVATE for writepage
f2fs: do not activate auto_recovery for fallocated i_size
f2fs: fix to determine start_cp_addr by sbi->cur_cp_pack
f2fs: fix 32-bit build
f2fs: set ->owner for debugfs status file's file_operations
f2fs: fix incorrect free inode count in ->statfs
f2fs: drop duplicate header timer.h
f2fs: fix wrong AUTO_RECOVER condition
f2fs: do not recover i_size if it's valid
f2fs: fix fdatasync
f2fs: fix to account total free nid correctly
f2fs: fix an infinite loop when flush nodes in cp
f2fs: don't wait writeback for datas during checkpoint
f2fs: fix wrong written_valid_blocks counting
...
Rename the FS_CFLG_INPLACE_ENCRYPTION flag to FS_CFLG_OWN_PAGES which,
when set, indicates that the fs uses pages under its own control as
opposed to writeback pages which require locking and a bounce buffer for
encryption.
Signed-off-by: David Gstir <david@sigma-star.at>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
We should use AOP_WRITEPAGE_ACTIVATE when we bypass writing pages.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Normally, while committing checkpoint, we will wait on all pages to be
writebacked no matter the page is data or metadata, so in scenario where
there are lots of data IO being submitted with metadata, we may suffer
long latency for waiting writeback during checkpoint.
Indeed, we only care about persistence for pages with metadata, but not
pages with data, as file system consistent are only related to metadate,
so in order to avoid encountering long latency in above scenario, let's
recognize and reference metadata in submitted IOs, wait writeback only
for metadatas.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>