linux-sg2042

Commit Graph

Author	SHA1	Message	Date
Jaegeuk Kim	24b8491251	f2fs: preallocate blocks for buffered aio writes This patch preallocates data blocks for buffered aio writes. With this patch, we can avoid redundant locking and unlocking of node pages given consecutive aio request. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Jaegeuk Kim	b439b103a6	f2fs: move dio preallocation into f2fs_file_write_iter This patch moves preallocation code for direct IOs into f2fs_file_write_iter. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Yunlei He	d31c7c3f0b	f2fs: fix missing skip pages info fix missing skip pages info in f2fs_writepages trace event. Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Chao Yu	0c3a579758	f2fs: introduce f2fs_submit_merged_bio_cond f2fs use single bio buffer per type data (META/NODE/DATA) for caching writes locating in continuous block address as many as possible, after submitting, these writes may be still cached in bio buffer, so we have to flush cached writes in bio buffer by calling f2fs_submit_merged_bio. Unfortunately, in the scenario of high concurrency, bio buffer could be flushed by someone else before we submit it as below reasons: a) there is no space in bio buffer. b) add a request of different type (SYNC, ASYNC). c) add a discontinuous block address. For this condition, f2fs_submit_merged_bio will be devastating, because it could break the following merging of writes in bio buffer, split one big bio into two smaller one. This patch introduces f2fs_submit_merged_bio_cond which can do a conditional submitting with bio buffer, before submitting it will judge whether: - page in DATA type bio buffer is matching with specified page; - page in DATA type bio buffer is belong to specified inode; - page in NODE type bio buffer is belong to specified inode; If there is no eligible page in bio buffer, we will skip submitting step, result in gaining more chance to merge consecutive block IOs in bio cache. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Chao Yu	da85985c61	f2fs: speed up handling holes in fiemap This patch makes f2fs_map_blocks supporting returning next potential page offset which skips hole region in indirect tree of inode, and use it to speed up fiemap in handling big hole case. Test method: xfs_io -f /mnt/f2fs/file -c "pwrite 1099511627776 4096" time xfs_io -f /mnt/f2fs/file -c "fiemap -v" Before: time xfs_io -f /mnt/f2fs/file -c "fiemap -v" /mnt/f2fs/file: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..2147483647]: hole 2147483648 1: [2147483648..2147483655]: 81920..81927 8 0x1 real 3m3.518s user 0m0.000s sys 3m3.456s After: time xfs_io -f /mnt/f2fs/file -c "fiemap -v" /mnt/f2fs/file: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..2147483647]: hole 2147483648 1: [2147483648..2147483655]: 81920..81927 8 0x1 real 0m0.008s user 0m0.000s sys 0m0.008s Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Chao Yu	81ca7350ce	f2fs: remove unneeded pointer conversion There are redundant pointer conversion in following call stack: - at position a, inode was been converted to f2fs_file_info. - at position b, f2fs_file_info was been converted to inode again. - truncate_blocks(inode,..) - fi = F2FS_I(inode) ---a - ADDRS_PER_PAGE(node_page, fi) - addrs_per_inode(fi) - inode = &fi->vfs_inode ---b - f2fs_has_inline_xattr(inode) - fi = F2FS_I(inode) - is_inode_flag_set(fi,..) In order to avoid unneeded conversion, alter ADDRS_PER_PAGE and addrs_per_inode to acept parameter with type of inode pointer. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Chao Yu	5b8db7fada	f2fs: simplify __allocate_data_blocks This patch uses existing function f2fs_map_block to simplify implementation of __allocate_data_blocks. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Chao Yu	4fe71e88bf	f2fs: simplify f2fs_map_blocks In f2fs_map_blocks, we use duplicated codes to handle first block mapping and the following blocks mapping, it's unnecessary. This patch simplifies f2fs_map_blocks to avoid using copied codes. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Jaegeuk Kim	7c506896cf	f2fs: use wq_has_sleeper for cp_wait wait_queue We need to use wq_has_sleeper including smp_mb to consider cp_wait concurrency. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Jaegeuk Kim	fec1d6576c	f2fs: use wait_for_stable_page to avoid contention In write_begin, if storage supports stable_page, we don't need to wait for writeback to update its contents. This patch introduces to use wait_for_stable_page instead of wait_on_page_writeback. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Jaegeuk Kim	e3ef18762f	f2fs: don't need to call set_page_dirty for io error If end_io gets an error, we don't need to set the page as dirty, since we already set f2fs_stop_checkpoint which will not flush any data. This will resolve the following warning. ====================================================== [ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ] 4.4.0+ #9 Tainted: G O ------------------------------------------------------ xfs_io/26773 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: (&(&sbi->inode_lock[i])->rlock){+.+...}, at: [<ffffffffc025483f>] update_dirty_page+0x6f/0xd0 [f2fs] and this task is already holding: (&(&q->__queue_lock)->rlock){-.-.-.}, at: [<ffffffff81396ea2>] blk_queue_bio+0x422/0x490 which would create a new lock dependency: (&(&q->__queue_lock)->rlock){-.-.-.} -> (&(&sbi->inode_lock[i])->rlock){+.+...} Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Jaegeuk Kim	ae96e7bdd4	f2fs: avoid needless sync_inode_page when reading inline_data In write_begin, if there is an inline_data, f2fs loads it into 0'th data page. Since it's the read path, we don't need to sync its inode page. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Jaegeuk Kim	52f8033712	f2fs: don't need to sync node page at every time In write_end, we don't need to sync inode page at every time. Instead, we can expect f2fs_write_inode will update later. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Jaegeuk Kim	2049d4fcb0	f2fs: avoid multiple node page writes due to inline_data The sceanrio is: 1. create fully node blocks 2. flush node blocks 3. write inline_data for all the node blocks again 4. flush node blocks redundantly So, this patch tries to flush inline_data when flushing node blocks. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Jaegeuk Kim	3c082b7b5b	f2fs: do f2fs_balance_fs when block is allocated We should consider data block allocation to trigger f2fs_balance_fs. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Jaegeuk Kim	25c1355151	f2fs: use writepages->lock for WB_SYNC_ALL If there are many writepages calls by multiple threads in background, we don't need to serialize to merge all the bios, since it's background. In such the case, it'd better to run writepages concurrently. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Jaegeuk Kim	b483fadf7e	f2fs: remove needless condition check This patch removes needless condition variable. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Chao Yu	0fd785eb93	f2fs: relocate is_merged_page Operations in is_merged_page is related to inner bio cache, move it to data.c. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Al Viro	5955102c99	wrappers for ->i_mutex access parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested}, inode_foo(inode) being mutex_foo(&inode->i_mutex). Please, use those for access to ->i_mutex; over the coming cycle ->i_mutex will become rwsem, with ->lookup() done with it held only shared. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-01-22 18:04:28 -05:00
Jaegeuk Kim	d0239e1bf5	f2fs: detect idle time depending on user behavior This patch adds last time that user requested filesystem operations. This information is used to detect whether system is idle or not later. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-01-11 15:56:37 -08:00
Chao Yu	da5af127a1	f2fs: recognize encrypted data in f2fs_fiemap This patch fixes to teach f2fs_fiemap to recognize encrypted data. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-01-08 11:51:58 -08:00
Jaegeuk Kim	2c4db1a6f6	f2fs: clean up f2fs_balance_fs This patch adds one parameter to clean up all the callers of f2fs_balance_fs. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-01-08 11:45:23 -08:00
Jaegeuk Kim	12719ae14e	f2fs: avoid unnecessary f2fs_balance_fs calls Only when node page is newly dirtied, it needs to check whether we need to do f2fs_gc. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-01-08 11:45:22 -08:00
Jaegeuk Kim	7612118ae8	f2fs: check the page status filled from disk After reading a page, we need to check whether there is any error. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-01-08 11:45:21 -08:00
Fan Li	de1475cc53	f2fs: read isize while holding i_mutex in fiemap make sure the isize we read doesn't change during the process. Signed-off-by: Fan li <fanofcode.li@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-01-06 19:15:49 -08:00
Chao Yu	e0afc4d6d0	f2fs: introduce max_file_blocks in sbi Introduce max_file_blocks in sbi to store max block index of file in f2fs, it could be used to avoid unneeded calculation of max block index in runtime. Signed-off-by: Chao Yu <chao2.yu@samsung.com> [Jaegeuk Kim: fix overflow of sbi->max_file_blocks] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-01-03 21:40:04 -08:00
Jaegeuk Kim	8d4ea29b64	f2fs: write pending bios when cp_error is set When testing ioc_shutdown, put_super is able to be hanged by waiting for writebacking pages as follows. INFO: task umount:2723 blocked for more than 120 seconds. Tainted: G O 4.4.0-rc3+ #8 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. umount D ffff88000859f9d8 0 2723 2110 0x00000000 ffff88000859f9d8 0000000000000000 0000000000000000 ffffffff81e11540 ffff880078c225c0 ffff8800085a0000 ffff88007fc17440 7fffffffffffffff ffffffff818239f0 ffff88000859fb48 ffff88000859f9f0 ffffffff8182310c Call Trace: [<ffffffff818239f0>] ? bit_wait+0x50/0x50 [<ffffffff8182310c>] schedule+0x3c/0x90 [<ffffffff81827fb9>] schedule_timeout+0x2d9/0x430 [<ffffffff810e0f8f>] ? mark_held_locks+0x6f/0xa0 [<ffffffff8111614d>] ? ktime_get+0x7d/0x140 [<ffffffff818239f0>] ? bit_wait+0x50/0x50 [<ffffffff8106a655>] ? kvm_clock_get_cycles+0x25/0x30 [<ffffffff8111617c>] ? ktime_get+0xac/0x140 [<ffffffff818239f0>] ? bit_wait+0x50/0x50 [<ffffffff81822564>] io_schedule_timeout+0xa4/0x110 [<ffffffff81823a25>] bit_wait_io+0x35/0x50 [<ffffffff818235bd>] __wait_on_bit+0x5d/0x90 [<ffffffff811b9e8b>] wait_on_page_bit+0xcb/0xf0 [<ffffffff810d5f90>] ? autoremove_wake_function+0x40/0x40 [<ffffffff811cf84c>] truncate_inode_pages_range+0x4bc/0x840 [<ffffffff811cfc3d>] truncate_inode_pages_final+0x4d/0x60 [<ffffffffc023ced5>] f2fs_evict_inode+0x75/0x400 [f2fs] [<ffffffff812639bc>] evict+0xbc/0x190 [<ffffffff81263d19>] iput+0x229/0x2c0 [<ffffffffc0241885>] f2fs_put_super+0x105/0x1a0 [f2fs] [<ffffffff8124756a>] generic_shutdown_super+0x6a/0xf0 [<ffffffff812478f7>] kill_block_super+0x27/0x70 [<ffffffffc0241290>] kill_f2fs_super+0x20/0x30 [f2fs] [<ffffffff81247b03>] deactivate_locked_super+0x43/0x70 [<ffffffff81247f4c>] deactivate_super+0x5c/0x60 [<ffffffff81268d2f>] cleanup_mnt+0x3f/0x90 [<ffffffff81268dc2>] __cleanup_mnt+0x12/0x20 [<ffffffff810ac463>] task_work_run+0x73/0xa0 [<ffffffff810032ac>] exit_to_usermode_loop+0xcc/0xd0 [<ffffffff81003e7c>] syscall_return_slowpath+0xcc/0xe0 [<ffffffff81829ea2>] int_ret_from_sys_call+0x25/0x9f Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-31 13:08:02 -08:00
Jaegeuk Kim	819d9153d4	f2fs: use i_size_read to get i_size We need to use i_size_read() to get inode->i_size. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-30 10:14:19 -08:00
Yunlei He	179448bfe4	f2fs: add a max block check for get_data_block_bmap This patch adds a max block check for get_data_block_bmap. Trinity test program will send a block number as parameter into ioctl_fibmap, which will be used in get_node_path(), when the block number large than f2fs max blocks, it will trigger kernel bug. Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Xue Liu <liuxueliu.liu@huawei.com> [Jaegeuk Kim: fix missing condition, pointed by Chao Yu] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-30 10:14:17 -08:00
Fan Li	9a950d52b7	f2fs: fix bugs and simplify codes of f2fs_fiemap fix bugs: 1. len could be updated incorrectly when start+len is beyond isize. 2. If there is a hole consisting of more than two blocks, it could fail to add FIEMAP_EXTENT_LAST flag for the last extent. 3. If there is an extent beyond isize, when we search extents in a range that ends at isize, it will also return the extent beyond isize, which is outside the range. Signed-off-by: Fan li <fanofcode.li@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-30 10:14:16 -08:00
Chao Yu	6d5a1495ee	f2fs: let user being aware of IO error Sometimes we keep dumb when IO error occur in lower layer device, so user will not receive any error return value for some operation, but actually, the operation did not succeed. This sould be avoided, so this patch reports such kind of error to user. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-30 10:14:15 -08:00
Jaegeuk Kim	b4d07a3e1a	f2fs: avoid f2fs_lock_op in f2fs_write_begin If f2fs_write_begin is to update data, we can bypass calling f2fs_lock_op() in order to avoid the checkpoint latency in the write syscall. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-30 10:14:13 -08:00
Jaegeuk Kim	2aadac085c	f2fs: introduce prepare_write_begin to clean up This patch adds prepare_write_begin to clean f2fs_write_begin. The major role of this function is to convert any inline_data and allocate or find block address. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-30 10:14:11 -08:00
Jaegeuk Kim	2a34076070	f2fs: call f2fs_balance_fs only when node was changed If user tries to update or read data, we don't need to call f2fs_balance_fs which triggers f2fs_gc, which increases unnecessary long latency. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-30 10:14:09 -08:00
Chao Yu	3104af35eb	f2fs: reduce covered region of sbi->cp_rwsem in f2fs_map_blocks Only cover sbi->cp_rwsem on one dnode page's allocation and modification instead of multiple's in f2fs_map_blocks, it can reduce the covered region of cp_rwsem, then we can avoid potential long time delay for concurrent checkpointer. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-30 10:14:08 -08:00
Jaegeuk Kim	93bae099ea	f2fs: record node block allocation in dnode_of_data This patch introduces recording node block allocation in dnode_of_data. This information helps to figure out whether any node block is allocated during specific file operations. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-30 10:14:07 -08:00
Jaegeuk Kim	b9d777b85f	f2fs: check inline_data flag at converting time We can check inode's inline_data flag when calling to convert it. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-30 10:14:05 -08:00
Fan Li	7df3a4318d	f2fs: optimize the flow of f2fs_map_blocks check map->m_len right after it changes to avoid excess call to update dnode_of_data. Signed-off-by: Fan li <fanofcode.li@samsung.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-17 09:54:42 -08:00
Chao Yu	c227f91273	f2fs: record dirty status of regular/symlink inode Maintain regular/symlink inode which has dirty pages in global dirty list and record their total dirty pages count like the way of handling directory inode. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-16 08:58:12 -08:00
Chao Yu	9006f2c93f	f2fs: kill f2fs_drop_largest_extent For direct IO, f2fs only allocate new address for the block which is not exist in the disk before, its mapping info should not exist in extent cache previously, so here we do not need to call f2fs_drop_largest_extent to drop related cache. Due to no more callers for f2fs_drop_largest_extent now, kill it. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-04 12:07:57 -08:00
Chao Yu	eb7e813cc7	f2fs: fix to remove directory inode from dirty list If last dirty dentry page was writebacked in reclaim path, we should remove its directory inode from global dirty list to avoid unnecessary flush for this inode when doing checkpoint. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-04 11:52:34 -08:00
Chao Yu	d323d005ac	f2fs: support file defragment This patch introduces a new ioctl F2FS_IOC_DEFRAGMENT to support file defragment in a specified range of regular file. This ioctl can be used in very limited workload: if user expects high sequential read performance in randomly written file, this interface can be used for defragmentation, after that file can be written as continuous as possible in the device. Meanwhile, it has side-effect, it will make holes in segments where blocks located originally, so it's better to trigger GC to eliminate fragment in segments. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-04 11:52:33 -08:00
Chao Yu	2da3e02746	f2fs: commit atomic written page in LFS mode We should always commit atomic written pages in LFS mode, otherwise data will become corrupted if we encounter suddent power cut after partial pages committed in IPU mode. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-04 11:52:33 -08:00
Jaegeuk Kim	67f8cf3cee	f2fs: support fiemap for inline_data There is a FIEMAP_EXTENT_INLINE_DATA, pointed out by Marc. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-20 11:33:21 -07:00
Jaegeuk Kim	1d373a0ef7	f2fs: flush dirty data for bmap Users expect bmap will give allocated block addresses. Let's play likewise ext4. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-20 11:33:11 -07:00
Chao Yu	08b39fbd59	f2fs crypto: fix racing of accessing encrypted page among different competitors Since we use different page cache (normally inode's page cache for R/W and meta inode's page cache for GC) to cache the same physical block which is belong to an encrypted inode. Writeback of these two page cache should be exclusive, but now we didn't handle writeback state well, so there may be potential racing problem: a) kworker: f2fs_gc: - f2fs_write_data_pages - f2fs_write_data_page - do_write_data_page - write_data_page - f2fs_submit_page_mbio (page#1 in inode's page cache was queued in f2fs bio cache, and be ready to write to new blkaddr) - gc_data_segment - move_encrypted_block - pagecache_get_page (page#2 in meta inode's page cache was cached with the invalid datas of physical block located in new blkaddr) - f2fs_submit_page_mbio (page#1 was submitted, later, page#2 with invalid data will be submitted) b) f2fs_gc: - gc_data_segment - move_encrypted_block - f2fs_submit_page_mbio (page#1 in meta inode's page cache was queued in f2fs bio cache, and be ready to write to new blkaddr) user thread: - f2fs_write_begin - f2fs_submit_page_bio (we submit the request to block layer to update page#2 in inode's page cache with physical block located in new blkaddr, so here we may read gabbage data from new blkaddr since GC hasn't writebacked the page#1 yet) This patch fixes above potential racing problem for encrypted inode. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-13 09:52:34 -07:00
Chao Yu	b8c2940048	f2fs: add a tracepoint for f2fs_read_data_pages This patch adds a tracepoint for f2fs_read_data_pages to trace when pages are readahead by VFS. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-12 14:00:34 -07:00
Jaegeuk Kim	a56c7c6fb3	f2fs: set GFP_NOFS for grab_cache_page For normal inodes, their pages are allocated with __GFP_FS, which can cause filesystem calls when reclaiming memory. This can incur a dead lock condition accordingly. So, this patch addresses this problem by introducing f2fs_grab_cache_page(.., bool for_write), which calls grab_cache_page_write_begin() with AOP_FLAG_NOFS. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-12 13:38:03 -07:00
Jaegeuk Kim	a125702326	Revert "f2fs: do not skip dentry block writes" The periodic checkpoint can resolve the previous issue. So, now we can use this again to improve the reported performance regression: https://lkml.org/lkml/2015/10/8/20 This reverts commit 15bec0ff5a9ba6d203178fa8772259df6207942a.	2015-10-12 13:38:02 -07:00
Jaegeuk Kim	90b803e6fb	f2fs: do not skip dentry block writes Previously, we skip dentry block writes when wbc is SYNC_NONE with no memory pressure and the number of dirty pages is pretty small. But, we didn't skip for normal data writes, which gives us not much big impact on overall performance. Moreover, by skipping some data writes, kworker falls into infinite loop to try to write blocks, when many dir inodes have only one dentry block. So, this patch removes skipping data writes. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-09 16:20:54 -07:00
Chao Yu	46c9e1413f	f2fs: use correct flag in f2fs_map_blocks() We introduce F2FS_GET_BLOCK_READ in commit `e2b4e2bc88` ("f2fs: fix incorrect mapping for bmap"), but forget to use this flag in the right place, fix it. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-09 16:20:52 -07:00
Chao Yu	f9811703fe	f2fs: fix to handle io error in ->direct_IO Here is a oops reported as following message when testing generic/019 of xfstest: ------------[ cut here ]------------ kernel BUG at /home/yuchao/git/f2fs-dev/segment.c:882! invalid opcode: 0000 [#1] SMP Modules linked in: zram lz4_compress lz4_decompress f2fs(O) ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_def CPU: 2 PID: 25441 Comm: fio Tainted: G O 4.3.0-rc1+ #6 Hardware name: Hewlett-Packard HP Z220 CMT Workstation/1790, BIOS K51 v01.61 05/16/2013 task: ffff8803f4e85580 ti: ffff8803fd61c000 task.ti: ffff8803fd61c000 RIP: 0010:[<ffffffffa0784981>] [<ffffffffa0784981>] new_curseg+0x321/0x330 [f2fs] RSP: 0018:ffff8803fd61f918 EFLAGS: 00010246 RAX: 00000000000007ed RBX: 0000000000000224 RCX: 000000000000001f RDX: 0000000000000800 RSI: ffffffffffffffff RDI: ffff8803f56f4300 RBP: ffff8803fd61f978 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000024 R11: ffff8800d23bbd78 R12: ffff8800d0ef0000 R13: 0000000000000224 R14: 0000000000000000 R15: 0000000000000001 FS: 00007f827ff85700(0000) GS:ffff88041ea80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffff600000 CR3: 00000003fef17000 CR4: 00000000001406e0 Stack: 000007ea00000002 0000000100000001 ffff8803f6456248 000007ed0000002b 0000000000000224 ffff880404d1aa20 ffff8803fd61f9c8 ffff8800d0ef0000 ffff8803f6456248 0000000000000001 00000000ffffffff ffffffffa078f358 Call Trace: [<ffffffffa0785b87>] allocate_segment_by_default+0x1a7/0x1f0 [f2fs] [<ffffffffa078322c>] allocate_data_block+0x17c/0x360 [f2fs] [<ffffffffa0779521>] __allocate_data_block+0x131/0x1d0 [f2fs] [<ffffffffa077a995>] f2fs_direct_IO+0x4b5/0x580 [f2fs] [<ffffffff811510ae>] generic_file_direct_write+0xae/0x160 [<ffffffff811518f5>] __generic_file_write_iter+0xd5/0x1f0 [<ffffffff81151e07>] generic_file_write_iter+0xf7/0x200 [<ffffffff81319e38>] ? apparmor_file_permission+0x18/0x20 [<ffffffffa0768480>] ? f2fs_fallocate+0x1190/0x1190 [f2fs] [<ffffffffa07684c6>] f2fs_file_write_iter+0x46/0x90 [f2fs] [<ffffffff8120b4fe>] aio_run_iocb+0x1ee/0x290 [<ffffffff81700f7e>] ? mutex_lock+0x1e/0x50 [<ffffffff8120a1d7>] ? aio_read_events+0x207/0x2b0 [<ffffffff8120b913>] do_io_submit+0x373/0x630 [<ffffffff8120a4f6>] ? SyS_io_getevents+0x56/0xb0 [<ffffffff8120bbe0>] SyS_io_submit+0x10/0x20 [<ffffffff81703857>] entry_SYSCALL_64_fastpath+0x12/0x6a Code: 45 c8 48 8b 78 10 e8 9f 23 bf e0 41 8b 8c 24 cc 03 00 00 89 c7 31 d2 89 c6 89 d8 29 df f7 f1 29 d1 39 cf 0f 83 be fd ff ff eb RIP [<ffffffffa0784981>] new_curseg+0x321/0x330 [f2fs] RSP <ffff8803fd61f918> ---[ end trace 2e577d7f711ddb86 ]--- The reason is that: in the test of generic/019, we will trigger a manmade IO error in block layer through debugfs, after that, prefree segment will no longer be freed, because we always skip doing gc or checkpoint when there occurs an IO error. Meanwhile fio with aio engine generated a large number of direct IOs, which continue allocating spaces in free segment until we run out of them, eventually, results in panic in new_curseg as no more free segment was found. So, this patch changes to return EIO in direct_IO for this condition. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-09 16:20:52 -07:00
Chao Yu	973163fc0c	f2fs: reorganize f2fs_map_blocks In this patch, we try to reorganize f2fs_map_blocks to make block mapping flow more clear by using following structure: /* check status of mapping / if (unmapped) { / blkaddr == NULL_ADDR \|\| blkaddr == NEW_ADDR / if (create) { / write path, handle dio write case here / alloc_and_map; } else { / * handle read cases from all call paths: * 1. generic read; * 2. dio read; * 3. fiemap; * 4. bmap / } } / map buffer_header */ Besides, this patch handles the missing case correctly for dio write: When we fail in __allocate_data_blocks, then in f2fs_map_blocks, we will not allocate blocks correctly for preallocated blocks, but returning with an unmapped buffer head, which will result in failure of dio write. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-09 16:20:51 -07:00
Chao Yu	9edcdabf36	f2fs: fix overflow of size calculation We have potential overflow issue when calculating size of object, when we left shift index with PAGE_CACHE_SHIFT bits, if type of index has only 32-bits space in 32-bit architecture, left shifting will incur overflow, i.e: pgoff_t index = 0xFFFFFFFF; loff_t size = index << PAGE_CACHE_SHIFT; size: 0xFFFFF000 So we should cast index with 64-bits type to avoid this issue. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-09 16:20:50 -07:00
Linus Torvalds	4c12ab7e5e	Merge tag 'for-f2fs-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "The major work includes fixing and enhancing the existing extent_cache feature, which has been well settling down so far and now it becomes a default mount option accordingly. Also, this version newly registers a f2fs memory shrinker to reclaim several objects consumed by a couple of data structures in order to avoid memory pressures. Another new feature is to add ioctl(F2FS_GARBAGE_COLLECT) which triggers a cleaning job explicitly by users. Most of the other patches are to fix bugs occurred in the corner cases across the whole code area" * tag 'for-f2fs-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (85 commits) f2fs: upset segment_info repair f2fs: avoid accessing NULL pointer in f2fs_drop_largest_extent f2fs: update extent tree in batches f2fs: fix to release inode correctly f2fs: handle f2fs_truncate error correctly f2fs: avoid unneeded initializing when converting inline dentry f2fs: atomically set inode->i_flags f2fs: fix wrong pointer access during try_to_free_nids f2fs: use __GFP_NOFAIL to avoid infinite loop f2fs: lookup neighbor extent nodes for merging later f2fs: split __insert_extent_tree_ret for readability f2fs: kill dead code in __insert_extent_tree f2fs: adjust showing of extent cache stat f2fs: add largest/cached stat in extent cache f2fs: fix incorrect mapping for bmap f2fs: add annotation for space utilization of regular/inline dentry f2fs: fix to update cached_en of extent tree properly f2fs: fix typo f2fs: check the node block address of newly allocated nid f2fs: go out for insert_inode_locked failure ...	2015-09-03 13:10:22 -07:00
Linus Torvalds	1081230b74	Merge branch 'for-4.3/core' of git://git.kernel.dk/linux-block Pull core block updates from Jens Axboe: "This first core part of the block IO changes contains: - Cleanup of the bio IO error signaling from Christoph. We used to rely on the uptodate bit and passing around of an error, now we store the error in the bio itself. - Improvement of the above from myself, by shrinking the bio size down again to fit in two cachelines on x86-64. - Revert of the max_hw_sectors cap removal from a revision again, from Jeff Moyer. This caused performance regressions in various tests. Reinstate the limit, bump it to a more reasonable size instead. - Make /sys/block/<dev>/queue/discard_max_bytes writeable, by me. Most devices have huge trim limits, which can cause nasty latencies when deleting files. Enable the admin to configure the size down. We will look into having a more sane default instead of UINT_MAX sectors. - Improvement of the SGP gaps logic from Keith Busch. - Enable the block core to handle arbitrarily sized bios, which enables a nice simplification of bio_add_page() (which is an IO hot path). From Kent. - Improvements to the partition io stats accounting, making it faster. From Ming Lei. - Also from Ming Lei, a basic fixup for overflow of the sysfs pending file in blk-mq, as well as a fix for a blk-mq timeout race condition. - Ming Lin has been carrying Kents above mentioned patches forward for a while, and testing them. Ming also did a few fixes around that. - Sasha Levin found and fixed a use-after-free problem introduced by the bio->bi_error changes from Christoph. - Small blk cgroup cleanup from Viresh Kumar" * 'for-4.3/core' of git://git.kernel.dk/linux-block: (26 commits) blk: Fix bio_io_vec index when checking bvec gaps block: Replace SG_GAPS with new queue limits mask block: bump BLK_DEF_MAX_SECTORS to 2560 Revert "block: remove artifical max_hw_sectors cap" blk-mq: fix race between timeout and freeing request blk-mq: fix buffer overflow when reading sysfs file of 'pending' Documentation: update notes in biovecs about arbitrarily sized bios block: remove bio_get_nr_vecs() fs: use helper bio_add_page() instead of open coding on bi_io_vec block: kill merge_bvec_fn() completely md/raid5: get rid of bio_fits_rdev() md/raid5: split bio for chunk_aligned_read block: remove split code in blkdev_issue_{discard,write_same} btrfs: remove bio splitting and merge_bvec_fn() calls bcache: remove driver private bio splitting code block: simplify bio_add_page() block: make generic_make_request handle arbitrarily sized bios blk-cgroup: Drop unlikely before IS_ERR(_OR_NULL) block: don't access bio->bi_error after bio_put() block: shrink struct bio down to 2 cache lines again ...	2015-09-02 13:10:25 -07:00
Chao Yu	e2b4e2bc88	f2fs: fix incorrect mapping for bmap The test step is like below: 1. touch file 2. truncate -s $((10241024)) file 3. fallocate -o 0 -l $((10241024)) file 4. fibmap.f2fs file Our result of fibmap.f2fs showed below is not correct: file_pos start_blk end_blk blks 0 -937166132 -937166132 1 4096 -937166132 -937166132 1 8192 -937166132 -937166132 1 12288 -937166132 -937166132 1 16384 -937166132 -937166132 1 20480 -937166132 -937166132 1 ... 1040384 -937166132 -937166132 1 1044480 -937166132 -937166132 1 This is because f2fs_map_blocks will return with no error when meeting a hole or preallocated block, the caller __get_data_block will map the uninitialized variable value to bh->b_blocknr. Unfortunately generic_block_bmap will neither check the return value of get_data() nor check mapping info of buffer_head, result in returning the random block address. After fixing the issue, our result shows correctly: file_pos start_blk end_blk blks 0 0 0 256 Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-21 22:45:14 -07:00
Jaegeuk Kim	740432f835	f2fs: handle failed bio allocation As the below comment of bio_alloc_bioset, f2fs can allocate multiple bios at the same time. So, we can't guarantee that bio is allocated all the time. " * When @bs is not NULL, if %__GFP_WAIT is set then bio_alloc will always be * able to allocate a bio. This is due to the mempool guarantees. To make this * work, callers must never allocate more than 1 bio at a time from this pool. * Callers that need to allocate more than 1 bio must always submit the * previously allocated bio for IO before attempting to allocate a new one. * Failure to do so can cause deadlocks under memory pressure. " Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-20 09:00:09 -07:00
Kent Overstreet	b54ffb73ca	block: remove bio_get_nr_vecs() We can always fill up the bio now, no need to estimate the possible size based on queue parameters. Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> [hch: rebased and wrote a changelog] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-13 12:32:04 -06:00
Chao Yu	decd36b6c4	f2fs: remove inmem radix tree Previously, we use radix tree to index all registered page entries for atomic file, but now we only use radix tree to see whether current page is indexed or not, since the other user of radix tree is gone in commit `042b7816aa` ("f2fs: remove unnecessary call to invalidate inmemory pages"). So in this patch, we try to use one more efficient way: Introducing a macro ATOMIC_WRITTEN_PAGE, and setting it as page private value to indicate page indexing status. By using this way, we can save memory and lookup time. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-11 11:31:14 -07:00
Chao Yu	c15e8599ff	f2fs: report EINVAL for unalignment direct IO We run ltp testcase with f2fs and obtain a TFAIL in diotest4, the result in detail is as fallow: dio04 <<<test_start>>> tag=dio04 stime=1432278894 cmdline="diotest4" contacts="" analysis=exit <<<test_output>>> diotest4 1 TPASS : Negative Offset diotest4 2 TPASS : removed diotest4 3 TFAIL : diotest4.c:129: write allows odd count.returns 1: Success diotest4 4 TFAIL : diotest4.c:183: Odd count of read and write diotest4 5 TPASS : Read beyond the file size ...... the result of ext4 with same environment: dio04 <<<test_start>>> tag=dio04 stime=1432259643 cmdline="diotest4" contacts="" analysis=exit <<<test_output>>> diotest4 1 TPASS : Negative Offset diotest4 2 TPASS : removed diotest4 3 TPASS : Odd count of read and write diotest4 4 TPASS : Read beyond the file size ...... The reason is that when triggering DIO in f2fs, we will return zero value in ->direct_IO if writer's buffer offset, file offset and transfer size is not alignment to block size of filesystem, resulting in falling back into buffered write instead of returning -EINVAL. This patch fixes that problem by returning correct error number for above case, and removing the judgement condition in check_direct_IO to make sure the verification will be enabled for direct reader too. Besides, Jaegeuk Kim pointed out that there is expectional cases we should always make direct-io falling back into buffered write, such as dio in encrypted file. Signed-off-by: Yunlei He <heyunlei@huawei.com> [Chao Yu make small change and add detail description in commit message] Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-11 11:30:24 -07:00
Fan Li	759af1c9c1	f2fs: use extent cache to optimize f2fs_reserve_block In some cases, we only need the block address when we call f2fs_reserve_block, other fields of struct dnode_of_data aren't necessary. We can try extent cache first for such cases in order to speed up the process. Signed-off-by: Fan li <fanofcode.li@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-05 22:15:42 -07:00
Chao Yu	470f00e968	f2fs: fix to release inode page correctly In following call path, we will pass a locked and referenced ipage pointer to get_new_data_page: - init_inode_metadata - make_empty_dir - get_new_data_page There are two exit paths in get_new_data_page when error occurs: 1) grab_cache_page fails, ipage will not be released; 2) f2fs_reserve_block fails, ipage will be released in callee. So, it's not consistent for error handling in get_new_data_page. For f2fs_reserve_block, it's not very easy to change the rule of error handling, since it's already complicated. Here we deside to choose an easy way to fix this issue: If any error occur in get_new_data_page, we will ensure releasing ipage in this function. The same issue is in f2fs_convert_inline_dir, fix that too. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-05 08:08:23 -07:00
Fan Li	5768dcdd7f	f2fs: change the timing of f2fs_wait_on_page_writeback some backing devices need pages to be stable during writeback. It doesn't matter if the page is completely overwritten or already uptodate, it needs to wait before write. Signed-off-by: Fan li <fanofcode.li@samsung.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-05 08:08:16 -07:00
Chao Yu	6a2905443c	f2fs: skip writing in ->writepages when no dirty pages exist When flushing comes from background, if there is no dirty page in the mapping of inode, we'd better to skip seeking dirty page from mapping for writebacking. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-05 08:08:11 -07:00
Tiezhu Yang	737f18992e	f2fs: optimize f2fs_write_cache_pages The if statement "goto continue_unlock" is exactly the same when each if condition is true that is depended on the value of both "step" and "is_cold_data(page)" are 0 or 1. That means when the value of "step" equals to "is_cold_data(page)", the if condition is true and the if statement "goto continue_unlock" appears only once, so it can be optimized to reduce the duplicated code. Signed-off-by: Tiezhu Yang <kernelpatch@126.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-05 08:08:10 -07:00
Jaegeuk Kim	86531d6b84	f2fs: callers take care of the page from bio error This patch changes for a caller to handle the page after its bio gets an error. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-05 08:08:07 -07:00
Chao Yu	8f46dcaea8	f2fs: expose f2fs_write_cache_pages If there are gced dirty pages and normal dirty pages in the mapping of one inode, we might writeback them alternately with discontinuous block address, resulting in low performance. This patch introduces f2fs_write_cache_pages with codes copied from write_cache_pages in mm/page-writeback.c. In this function, we refactor flow with two steps: 1) writeback all cold type pages. 2) writeback all non-cold type pages. By using this method, f2fs will writeback dirty pages with the same temperature in bunch mode, it makes writeouted block being with more continuous address, so they can be merged as much as possible in f2fs bio cache, and also it will reduce the chance of submiting small IO from block layer. Test environment: 8g nokia sd card (very old sd card, but it shows better effect when testing with this patch, and with a 32g kingston sd card, I didn't see much more improvement). Test step: 1. touch testfile; 2. truncate -s 512K testfile; 3. write all pages with odd index; 4. trigger gc by ioctl; 5. write all pages with even index; 6. time fsync testfile. before: real 0m0.402s user 0m0.000s sys 0m0.000s after: real 0m0.143s user 0m0.004s sys 0m0.004s Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-04 14:09:59 -07:00
Chao Yu	a28ef1f5ae	f2fs: maintain extent cache in separated file This patch moves extent cache related code from data.c into extent_cache.c since extent cache is independent feature, and its codes are not relate to others in data.c, it's better for us to maintain them in separated place. There is no functionality change, but several small coding style fixes including: * rename __drop_largest_extent to f2fs_drop_largest_extent for exporting; * rename misspelled word 'untill' to 'until'; * remove unneeded 'return' in the end of f2fs_destroy_extent_tree(). Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-04 14:09:58 -07:00
Fan Li	3c7df87dad	f2fs: don't try to split extents shorter than F2FS_MIN_EXTENT_LEN Since only parts of extents longer than F2FS_MIN_EXTENT_LEN will be kept in extent cache after split, extents already shorter than F2FS_MIN_EXTENT_LEN don't need to try split at all. Signed-off-by: Fan Li <fanofcode.li@samsung.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-04 14:09:58 -07:00
Chao Yu	90d4388ac2	f2fs: fix to update page flag This patch fixes to update page flag (e.g. Uptodate/cold flag) in ->write_begin. Otherwise, page will be non-uptodate when we try to write entire page, and cold data flag in page will not be clean when gced page is being rewritten. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-04 14:09:57 -07:00
Jaegeuk Kim	7023a1ad17	f2fs: shrink unreferenced extent_caches first If an extent_tree entry has a zero reference count, we can drop it from the cache in higher priority rather than currently referencing entries. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-04 14:09:57 -07:00
Chao Yu	bb96a8d51e	f2fs: enhance multithread performance In ->writepages, we use writepages mutex lock to serialize all block address allocation and page submitting pairs from different inodes. This method makes our delayed dirty pages of one inode being written continously as many as possible. But there is one problem that we did not submit current cached bio in protection region of writepages mutex lock, so there is a small chance that we submit the one of other thread's as below, resulting in splitting more bios. thread 1 thread 2 ->writepages lock(writepages) ->write_cache_pages unlock(writepages) lock(writepages) ->write_cache_pages ->f2fs_submit_merged_bio ->writepage unlock(writepages) fs_mark-6535 [002] .... 2242.270230: f2fs_submit_write_bio: dev = (1,0), WRITE_SYNC, DATA, sector = 5766152, size = 524288 fs_mark-6536 [000] .... 2242.270361: f2fs_submit_write_bio: dev = (1,0), WRITE_SYNC, DATA, sector = 5767176, size = 4096 fs_mark-6536 [000] .... 2242.270370: f2fs_submit_write_bio: dev = (1,0), WRITE_SYNC, NODE, sector = 8138112, size = 4096 fs_mark-6535 [002] .... 2242.270776: f2fs_submit_write_bio: dev = (1,0), WRITE_SYNC, DATA, sector = 5767184, size = 516096 This may really increase time of block layer works, and may cause larger IO lantency. This patch moves the submitting operation into region of writepages mutex lock to avoid bio splits when concurrently writebacking is intensive. my test environment: virtual machine, intel cpu i5 2500, 8GB size memory, 4GB size ramdisk time fs_mark -t 16 -L 1 -s 524288 -S 1 -d /mnt/f2fs/ before: real 0m4.244s user 0m0.088s sys 0m12.336s after: real 0m3.822s user 0m0.072s sys 0m10.760s Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-04 14:09:57 -07:00
Jaegeuk Kim	84bc926c07	f2fs: check the largest extent at look-up time Because of the extent shrinker or other -ENOMEM scenarios, it cannot guarantee that the largest extent would be cached in the tree all the time. Instead of relying on extent_tree, we can simply check the cached one in extent tree accordingly. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-04 14:09:56 -07:00
Jaegeuk Kim	3e72f72139	f2fs: use extent_cache by default We don't need to handle the duplicate extent information. The integrated rule is: - update on-disk extent with largest one tracked by in-memory extent_cache - destroy extent_tree for the truncation case - drop per-inode extent_cache by shrinker Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-04 14:09:56 -07:00
Jaegeuk Kim	554df79e52	f2fs: shrink extent_cache entries This patch registers shrinking extent_caches. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-04 14:09:55 -07:00
Jaegeuk Kim	244f4fc1c5	f2fs: set cached_en after checking finally This patch relocates cached_en not only to be covered by spin_lock, but also to set once after checking out completely. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-04 14:09:54 -07:00
Jaegeuk Kim	cbe91923a9	f2fs: update on-disk extents even under extent_cache Previously, f2fs_update_extent_cache() updates in-memory extent_cache all the time, and then finally preserves its up-to-date extent into on-disk one during f2fs_evict_inode. But, in the following scenario: 1. mount 2. open & write an extent X 3. f2fs_evict_inode; on-disk extent is X 4. open & update the extent X with Y 5. sync; trigger checkpoint 6. power-cut after power-on, f2fs should serve extent Y, but we have an on-disk extent X. This causes a failure on xfstests/311. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-04 14:09:54 -07:00
Jaegeuk Kim	7a2cb67867	f2fs: fix wrong block address calculation for a split extent This patch fixes wrong calculation on block address field when an extent is split. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-04 14:09:54 -07:00
Christoph Hellwig	4246a0b63b	block: add a bi_error field to struct bio Currently we have two different ways to signal an I/O error on a BIO: (1) by clearing the BIO_UPTODATE flag (2) by returning a Linux errno value to the bi_end_io callback The first one has the drawback of only communicating a single possible error (-EIO), and the second one has the drawback of not beeing persistent when bios are queued up, and are not passed along from child to parent bio in the ever more popular chaining scenario. Having both mechanisms available has the additional drawback of utterly confusing driver authors and introducing bugs where various I/O submitters only deal with one of them, and the others have to add boilerplate code to deal with both kinds of error returns. So add a new bi_error field to store an errno value directly in struct bio and remove the existing mechanisms to clean all this up. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-07-29 08:55:15 -06:00
Jaegeuk Kim	6282adbf93	f2fs: call set_page_dirty to attach i_wb for cgroup The cgroup attaches inode->i_wb via mark_inode_dirty and when set_page_writeback is called, __inc_wb_stat() updates i_wb's stat. So, we need to explicitly call set_page_dirty->__mark_inode_dirty in prior to any writebacking pages. This patch should resolve the following kernel panic reported by Andreas Reis. https://bugzilla.kernel.org/show_bug.cgi?id=101801 --- Comment #2 from Andreas Reis <andreas.reis@gmail.com> --- BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 IP: [<ffffffff8149deea>] __percpu_counter_add+0x1a/0x90 PGD 2951ff067 PUD 2df43f067 PMD 0 Oops: 0000 [#1] PREEMPT SMP Modules linked in: CPU: 7 PID: 10356 Comm: gcc Tainted: G W 4.2.0-1-cu #1 Hardware name: Gigabyte Technology Co., Ltd. G1.Sniper M5/G1.Sniper M5, BIOS T01 02/03/2015 task: ffff880295044f80 ti: ffff880295140000 task.ti: ffff880295140000 RIP: 0010:[<ffffffff8149deea>] [<ffffffff8149deea>] __percpu_counter_add+0x1a/0x90 RSP: 0018:ffff880295143ac8 EFLAGS: 00010082 RAX: 0000000000000003 RBX: ffffea000a526d40 RCX: 0000000000000001 RDX: 0000000000000020 RSI: 0000000000000001 RDI: 0000000000000088 RBP: ffff880295143ae8 R08: 0000000000000000 R09: ffff88008f69bb30 R10: 00000000fffffffa R11: 0000000000000000 R12: 0000000000000088 R13: 0000000000000001 R14: ffff88041d099000 R15: ffff880084a205d0 FS: 00007f8549374700(0000) GS:ffff88042f3c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000a8 CR3: 000000033e1d5000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Stack: 0000000000000000 ffffea000a526d40 ffff880084a20738 ffff880084a20750 ffff880295143b48 ffffffff811cc91e ffff880000000000 0000000000000296 0000000000000000 ffff880417090198 0000000000000000 ffffea000a526d40 Call Trace: [<ffffffff811cc91e>] __test_set_page_writeback+0xde/0x1d0 [<ffffffff813fee87>] do_write_data_page+0xe7/0x3a0 [<ffffffff813faeea>] gc_data_segment+0x5aa/0x640 [<ffffffff813fb0b8>] do_garbage_collect+0x138/0x150 [<ffffffff813fb3fe>] f2fs_gc+0x1be/0x3e0 [<ffffffff81405541>] f2fs_balance_fs+0x81/0x90 [<ffffffff813ee357>] f2fs_unlink+0x47/0x1d0 [<ffffffff81239329>] vfs_unlink+0x109/0x1b0 [<ffffffff8123e3d7>] do_unlinkat+0x287/0x2c0 [<ffffffff8123ebc6>] SyS_unlink+0x16/0x20 [<ffffffff81942e2e>] entry_SYSCALL_64_fastpath+0x12/0x71 Code: 41 5e 5d c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 55 49 89 f5 41 54 49 89 fc 53 48 83 ec 08 65 ff 05 e6 d9 b6 7e <48> 8b 47 20 48 63 ca 65 8b 18 48 63 db 48 01 f3 48 39 cb 7d 0a RIP [<ffffffff8149deea>] __percpu_counter_add+0x1a/0x90 RSP <ffff880295143ac8> CR2: 00000000000000a8 ---[ end trace 5132449a58ed93a3 ]--- note: gcc[10356] exited with preempt_count 2 Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-07-25 08:54:26 -07:00
Chao Yu	1237702471	f2fs: avoid duplicated code by reusing f2fs_read_end_io This patch tries to clean up code because part code of f2fs_read_end_io and mpage_end_io are the same, so it's better to merge and reuse them. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-06-01 16:21:04 -07:00
Jaegeuk Kim	4375a33664	f2fs crypto: add encryption support in read/write paths This patch adds encryption support in read and write paths. Note that, in f2fs, we need to consider cleaning operation. In cleaning procedure, we must avoid encrypting and decrypting written blocks. So, this patch implements move_encrypted_block(). Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:52 -07:00
Jaegeuk Kim	fcc85a4d86	f2fs crypto: activate encryption support for fs APIs This patch activates the following APIs for encryption support. The rules quoted by ext4 are: - An unencrypted directory may contain encrypted or unencrypted files or directories. - All files or directories in a directory must be protected using the same key as their containing directory. - Encrypted inode for regular file should not have inline_data. - Encrypted symlink and directory may have inline_data and inline_dentry. This patch activates the following APIs. 1. f2fs_link : validate context 2. f2fs_lookup : '' 3. f2fs_rename : '' 4. f2fs_create/f2fs_mkdir : inherit its dir's context 5. f2fs_direct_IO : do buffered io for regular files 6. f2fs_open : check encryption info 7. f2fs_file_mmap : '' 8. f2fs_setattr : '' 9. f2fs_file_write_iter : '' (Called by sys_io_submit) 10. f2fs_fallocate : do not support fcollapse 11. f2fs_evict_inode : free_encryption_info Signed-off-by: Michael Halcrow <mhalcrow@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:51 -07:00
Jaegeuk Kim	7f63eb77af	f2fs: report unwritten area in f2fs_fiemap This patch slightly changes f2fs_fiemap function to report unwritten area. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:45 -07:00
Jaegeuk Kim	43f3eae1d3	f2fs: split find_data_page according to specific purposes This patch splits find_data_page as follows. 1. f2fs_gc - use get_read_data_page() with read only 2. find_in_level - use find_data_page without locked page 3. truncate_partial_page - In the case cache_only mode, just drop cached page. - Ohterwise, use get_lock_data_page() and guarantee to truncate Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:37 -07:00
Jaegeuk Kim	01f28610a1	f2fs: fix race on allocating and deallocating a dentry block There are two threads: f2fs_delete_entry() get_new_data_page() f2fs_reserve_block() dn.blkaddr = XXX lock_page(dentry_block) truncate_hole() dn.blkaddr = NULL unlock_page(dentry_block) lock_page(dentry_block) fill the block from XXX address add new dentries unlock_page(dentry_block) Later, f2fs_write_data_page() will truncate the dentry_block, since its block address is NULL. The reason for this was due to the wrong lock order. In this case, we should do f2fs_reserve_block() after locking its dentry block. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:35 -07:00
Jaegeuk Kim	05ca3632e5	f2fs: add sbi and page pointer in f2fs_io_info This patch adds f2fs_sb_info and page pointers in f2fs_io_info structure. With this change, we can reduce a lot of parameters for IO functions. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:32 -07:00
Jaegeuk Kim	f1e8866016	f2fs: expose f2fs_mpage_readpages This patch implements f2fs_mpage_readpages for further optimization on encryption support. The basic code was taken from fs/mpage.c, and changed to be simple by adjusting that block_size is equal to page_size in f2fs. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:30 -07:00
Jaegeuk Kim	003a3e1d60	f2fs: add f2fs_map_blocks This patch introduces f2fs_map_blocks structure likewise ext4_map_blocks. Now, f2fs uses f2fs_map_blocks when handling get_block. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:29 -07:00
Jaegeuk Kim	5463e7c18e	Revert "f2fs: enhance multi-threads performance" This reports performance regression by Yuanhan Liu. The basic idea was to reduce one-point mutex, but it turns out this causes another contention like context swithes. https://lkml.org/lkml/2015/4/21/11 Until finishing the analysis on this issue, I'd like to revert this for a while. This reverts commit `78373b7319`.	2015-05-04 14:15:15 -07:00
Linus Torvalds	06a60deca8	Merge tag 'for-f2fs-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "New features: - in-memory extent_cache - fs_shutdown to test power-off-recovery - use inline_data to store symlink path - show f2fs as a non-misc filesystem Major fixes: - avoid CPU stalls on sync_dirty_dir_inodes - fix some power-off-recovery procedure - fix handling of broken symlink correctly - fix missing dot and dotdot made by sudden power cuts - handle wrong data index during roll-forward recovery - preallocate data blocks for direct_io ... and a bunch of minor bug fixes and cleanups" * tag 'for-f2fs-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (71 commits) f2fs: pass checkpoint reason on roll-forward recovery f2fs: avoid abnormal behavior on broken symlink f2fs: flush symlink path to avoid broken symlink after POR f2fs: change 0 to false for bool type f2fs: do not recover wrong data index f2fs: do not increase link count during recovery f2fs: assign parent's i_mode for empty dir f2fs: add F2FS_INLINE_DOTS to recover missing dot dentries f2fs: fix mismatching lock and unlock pages for roll-forward recovery f2fs: fix sparse warnings f2fs: limit b_size of mapped bh in f2fs_map_bh f2fs: persist system.advise into on-disk inode f2fs: avoid NULL pointer dereference in f2fs_xattr_advise_get f2fs: preallocate fallocated blocks for direct IO f2fs: enable inline data by default f2fs: preserve extent info for extent cache f2fs: initialize extent tree with on-disk extent info of inode f2fs: introduce __{find,grab}_extent_tree f2fs: split set_data_blkaddr from f2fs_update_extent_cache f2fs: enable fast symlink by utilizing inline data ...	2015-04-18 11:17:20 -04:00
Omar Sandoval	22c6186ece	direct_IO: remove rw from a_ops->direct_IO() Now that no one is using rw, remove it completely. Signed-off-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-04-11 22:29:45 -04:00
Omar Sandoval	6f67376318	direct_IO: use iov_iter_rw() instead of rw everywhere The rw parameter to direct_IO is redundant with iov_iter->type, and treated slightly differently just about everywhere it's used: some users do rw & WRITE, and others do rw == WRITE where they should be doing a bitwise check. Simplify this with the new iov_iter_rw() helper, which always returns either READ or WRITE. Signed-off-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-04-11 22:29:45 -04:00
Omar Sandoval	17f8c842d2	Remove rw from {,__,do_}blockdev_direct_IO() Most filesystems call through to these at some point, so we'll start here. Signed-off-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-04-11 22:29:44 -04:00
Chao Yu	1b3e27a92a	f2fs: limit b_size of mapped bh in f2fs_map_bh Map bh over max size which caller defined is not needed, limit it in f2fs_map_bh. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-04-10 15:08:55 -07:00
Chao Yu	df6136ef55	f2fs: preallocate fallocated blocks for direct IO Normally, due to DIO_SKIP_HOLES flag is set by default, blockdev_direct_IO in f2fs_direct_IO tries to skip DIO in holes when writing inside i_size, this makes us falling back to buffered IO which shows lower performance. So in commit `59b802e5a4` ("f2fs: allocate data blocks in advance for f2fs_direct_IO"), we improve perfromance by allocating data blocks in advance if we meet holes no matter in i_size or not, since with it we can avoid falling back to buffered IO. But we forget to consider for unwritten fallocated block in this commit. This patch tries to fix it for fallocate case, this helps to improve performance. Test result: Storage info: sandisk ultra 64G micro sd card. touch /mnt/f2fs/file truncate -s 67108864 /mnt/f2fs/file fallocate -o 0 -l 67108864 /mnt/f2fs/file time dd if=/dev/zero of=/mnt/f2fs/file bs=1M count=64 conv=notrunc oflag=direct Time before applying the patch: 67108864 bytes (67 MB) copied, 36.16 s, 1.9 MB/s real 0m36.162s user 0m0.000s sys 0m0.180s Time after applying the patch: 67108864 bytes (67 MB) copied, 27.7776 s, 2.4 MB/s real 0m27.780s user 0m0.000s sys 0m0.036s Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-04-10 15:08:52 -07:00
Chao Yu	0bdee48250	f2fs: preserve extent info for extent cache This patch tries to preserve last extent info in extent tree cache into on-disk inode, so this can help us to reuse the last extent info next time for performance. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-04-10 15:08:51 -07:00
Chao Yu	028a41e893	f2fs: initialize extent tree with on-disk extent info of inode With normal extent info cache, we records largest extent mapping between logical block and physical block into extent info, and we persist extent info in on-disk inode. When we enable extent tree cache, if extent info of on-disk inode is exist, and the extent is not a small fragmented mapping extent. We'd better to load the extent info into extent tree cache when inode is loaded. By this way we can have more chance to hit extent tree cache rather than taking more time to read dnode page for block address. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-04-10 15:08:50 -07:00
Chao Yu	93dfc52656	f2fs: introduce __{find,grab}_extent_tree This patch introduces __{find,grab}_extent_tree for reusing by following patches. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-04-10 15:08:49 -07:00

1 2 3 4 5 ...

319 Commits