Commit Graph

987 Commits

Author SHA1 Message Date
Jaegeuk Kim 8fbc418f99 f2fs: avoid wrong error during recovery
During the roll-forward recovery, -ENOENT for f2fs_iget can be skipped.
So, this error value should not be propagated.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-03-03 09:58:48 -08:00
Jaegeuk Kim 1614091dc1 f2fs: remove obsolete code
This patch removes obsolete code in which summary variable is not needed.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-03-03 09:58:48 -08:00
Chao Yu cb3bc9ee06 f2fs: use extent cache for dir
We update extent cache for all user inode of f2fs including dir inode, so this
patch gives another chance to try to get physical address of page from extent
cache for dir inode.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-03-03 09:58:48 -08:00
Chao Yu 91c5d9bce7 f2fs: switch to check FI_NO_EXTENT in f2fs_{lookup,update}_extent_cache
This patch switch to check FI_NO_EXTENT in f2fs_{lookup,update}_extent_cache
instead of f2fs_{lookup,update}_extent_tree or {lookup,update}_extent_info.

No functionality modification in this patch.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-03-03 09:58:48 -08:00
Chao Yu 62c8af651b f2fs: support fast lookup in extent cache
This patch adds a fast lookup path for rb-tree extent cache.

In this patch we add a recently accessed extent node pointer 'cached_en' in
extent tree. In lookup path of extent cache, we will firstly lookup the last
accessed extent node which cached_en points, if we do not hit in this node,
we will try to lookup extent node in rb-tree.

By this way we can avoid unnecessary slow lookup in rb-tree sometimes.

Note that, side-effect of this patch is that we will increase memory cost,
because we will store a pointer variable in each struct extent tree
additionally.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-03-03 09:58:47 -08:00
Chao Yu 1ec4610c52 f2fs: add trace for rb-tree extent cache ops
This patch adds trace for lookup/update/shrink/destroy ops in rb-tree extent cache.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-03-03 09:58:47 -08:00
Chao Yu 4bf6fd9fed f2fs: show extent tree, node stat info in debugfs
This patch add and show stat info of total memory footprint for extent tree,node
in debugfs.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-03-03 09:58:47 -08:00
Chao Yu 1dcc336b02 f2fs: enable rb-tree extent cache
This patch enables rb-tree based extent cache in f2fs.

When we mount with "-o extent_cache", f2fs will try to add recently accessed
page-block mappings into rb-tree based extent cache as much as possible, instead
of original one extent info cache.

By this way, f2fs can support more effective cache between dnode page cache and
disk. It will supply high hit ratio in the cache with fewer memory when dnode
page cache are reclaimed in environment of low memory.

Storage: Sandisk sd card 64g
1.append write file (offset: 0, size: 128M);
2.override write file (offset: 2M, size: 1M);
3.override write file (offset: 4M, size: 1M);
...
4.override write file (offset: 48M, size: 1M);
...
5.override write file (offset: 112M, size: 1M);
6.sync
7.echo 3 > /proc/sys/vm/drop_caches
8.read file (size:128M, unit: 4k, count: 32768)
(time dd if=/mnt/f2fs/128m bs=4k count=32768)

Extent Hit Ratio:
		before		patched
Hit Ratio	121 / 1071	1071 / 1071

Performance:
		before		patched
real    	0m37.051s	0m35.556s
user    	0m0.040s	0m0.026s
sys     	0m2.990s	0m2.251s

Memory Cost:
		before		patched
Tree Count:	0		1 (size: 24 bytes)
Node Count:	0		45 (size: 1440 bytes)

v3:
 o retest and given more details of test result.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-03-03 09:58:47 -08:00
Chao Yu 8967215954 f2fs: add a mount option for rb-tree extent cache
This patch adds a mount option 'extent_cache' in f2fs.

It is try to use a rb-tree based extent cache to cache more mapping information
with less memory if this option is set, otherwise we will use the original one
extent info cache.

Suggested-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-03-03 09:58:46 -08:00
Chao Yu 429511cdf8 f2fs: add core functions for rb-tree extent cache
This patch adds core functions including slab cache init function and
init/lookup/update/shrink/destroy function for rb-tree based extent cache.

Thank Jaegeuk Kim and Changman Lee as they gave much suggestion about detail
design and implementation of extent cache.

Todo:
 * register rb-based extent cache shrink with mm shrink interface.

v2:
 o move set_extent_info and __is_{extent,back,front}_mergeable into f2fs.h.
 o introduce __{attach,detach}_extent_node for code readability.
 o add cond_resched() when fail to invoke kmem_cache_alloc/radix_tree_insert.
 o fix some coding style and typo issues.

v3:
 o fix oops due to using an unassigned pointer.
 o use list_del to remove extent node in shrink list.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
[Jaegeuk Kim: add static for some funcitons and declare in f2fs.h]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-03-03 09:58:46 -08:00
Chao Yu 13054c548a f2fs: introduce infra macro and data structure of rb-tree extent cache
Introduce infra macro and data structure for rb-tree based extent cache:

Macros:
 * EXT_TREE_VEC_SIZE: indicate vector size for gang lookup in extent tree.
 * F2FS_MIN_EXTENT_LEN: indicate minimum length of extent managed in cache.
 * EXTENT_CACHE_SHRINK_NUMBER: indicate number of extent in cache will be shrunk.

Basic data structures for extent cache:
 * struct extent_tree: extent tree entry per inode.
 * struct extent_node: extent info node linked in extent tree.

Besides, adding new extent cache related fields in f2fs_sb_info.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-03-03 09:58:46 -08:00
Chao Yu 7e4dde79df f2fs: introduce universal lookup/update interface for extent cache
In this patch, we do these jobs:
1. rename {check,update}_extent_cache to {lookup,update}_extent_info;
2. introduce universal lookup/update interface of extent cache:
f2fs_{lookup,update}_extent_cache including above two real functions, then
export them to function callers.

So after above cleanup, we can add new rb-tree based extent cache into exported
interfaces.

v2:
 o remove "f2fs_" for inner function {lookup,update}_extent_info suggested by
   Jaegeuk Kim.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-03-03 09:58:46 -08:00
Chao Yu a2e7d1bfeb f2fs: introduce f2fs_map_bh to clean codes of check_extent_cache
This patch introduces f2fs_map_bh to clean codes of check_extent_cache.

v2:
 o cleanup f2fs_map_bh pointed out by Jaegeuk Kim.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-03-03 09:58:45 -08:00
Chao Yu 4d0b0bd438 f2fs: simplfy a field name in struct f2fs_extent,extent_info
Rename a filed name from 'blk_addr' to 'blk' in struct {f2fs_extent,extent_info}
as annotation of this field descripts its meaning well to us.

By this way, we can avoid long statement in code of following patches.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-03-03 09:58:45 -08:00
Chao Yu 0c872e2ded f2fs: move ext_lock out of struct extent_info
Move ext_lock out of struct extent_info, then in the following patches we can
use variables with struct extent_info type as a parameter to pass pure data.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-03-03 09:58:45 -08:00
Chao Yu 3c0d84d6f1 f2fs: fix incorrectly stat number of inline data inode
We should stat inline data information for temp file in f2fs_tmpfile if we
enable inline_data feature.

Otherwise, inline data stat number will be wrong after this temp file is
evicted.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-03-03 09:58:45 -08:00
Chao Yu 97dc3fd2cb f2fs: use ->writepage in sync_meta_pages
This patch uses ->writepage of meta mapping in sync_meta_pages instead of
f2fs_write_meta_page, by this way, in its caller we can ignore any changes
(e.g. changing name) of this registered function.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-03-03 09:58:44 -08:00
Chao Yu 3b4d732a56 f2fs: introduce f2fs_update_dentry to clean up duplicated codes
This patch introduces f2fs_update_dentry to remove redundant code in
f2fs_add_inline_entry and __f2fs_add_link.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-03-03 09:58:44 -08:00
Chao Yu 1753396a0a f2fs: remove unused inline_dentry_addr
inline_dentry_addr is introduced with inline dentry feature without being used,
now we do not need to keep it for any reason, so remove it.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-03-03 09:58:44 -08:00
Linus Torvalds c7d7b98671 Merge tag 'for-f2fs-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
 "Major changes are to:
   - add f2fs_io_tracer and F2FS_IOC_GETVERSION
   - fix wrong acl assignment from parent
   - fix accessing wrong data blocks
   - fix wrong condition check for f2fs_sync_fs
   - align start block address for direct_io
   - add and refactor the readahead flows of FS metadata
   - refactor atomic and volatile write policies

  But most of patches are for clean-ups and minor bug fixes.  Some of
  them refactor old code too"

* tag 'for-f2fs-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (64 commits)
  f2fs: use spinlock for segmap_lock instead of rwlock
  f2fs: fix accessing wrong indexed data blocks
  f2fs: avoid variable length array
  f2fs: fix sparse warnings
  f2fs: allocate data blocks in advance for f2fs_direct_IO
  f2fs: introduce macros to convert bytes and blocks in f2fs
  f2fs: call set_buffer_new for get_block
  f2fs: check node page contents all the time
  f2fs: avoid data offset overflow when lseeking huge file
  f2fs: fix to use highmem for pages of newly created directory
  f2fs: introduce a batched trim
  f2fs: merge {invalidate,release}page for meta/node/data pages
  f2fs: show the number of writeback pages in stat
  f2fs: keep PagePrivate during releasepage
  f2fs: should fail mount when trying to recover data on read-only dev
  f2fs: split UMOUNT and FASTBOOT flags
  f2fs: avoid write_checkpoint if f2fs is mounted readonly
  f2fs: support norecovery mount option
  f2fs: fix not to drop mount options when retrying fill_super
  f2fs: merge flags in struct f2fs_sb_info
  ...
2015-02-12 19:28:50 -08:00
Chao Yu 1a118ccfd6 f2fs: use spinlock for segmap_lock instead of rwlock
rwlock can provide better concurrency when there are much more readers than
writers because readers can hold the rwlock simultaneously.

But now, for segmap_lock rwlock in struct free_segmap_info, there is only one
reader 'mount' from below call path:
->f2fs_fill_super
  ->build_segment_manager
    ->build_dirty_segmap
      ->init_dirty_segmap
        ->find_next_inuse
          read_lock
          ...
          read_unlock

Now that our concurrency can not be improved since there is no other reader for
this lock, we do not need to use rwlock_t type for segmap_lock, let's replace it
with spinlock_t type.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:51 -08:00
Jaegeuk Kim f1a3b98e73 f2fs: fix accessing wrong indexed data blocks
This patch fixes the following test.

This causes:
 attempt to access beyond end of device
 sdb2: rw=16384, want=14413962000, limit=16777216

The reason is:
 - f2fs_write_begin
  - f2fs_convert_inline_inode returns -ENOSPC
  - f2fs_write_failed
   - truncate_blocks
    - truncate_partial_data_page
     - find_data_page
      - get_dnode_of_data returns wrong data index retrieved from inline_data
      - f2fs_submit_page_bio(wrong data index)
       - submit_bio(wrong data index)

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:51 -08:00
Jaegeuk Kim 60a3b782b1 f2fs: avoid variable length array
Instead of using variable length array, this patch let preallocate memory for
them.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:50 -08:00
Jaegeuk Kim 29e7043f40 f2fs: fix sparse warnings
This patch resolves the following warnings.

include/trace/events/f2fs.h:150:1: warning: expression using sizeof bool
include/trace/events/f2fs.h:180:1: warning: expression using sizeof bool
include/trace/events/f2fs.h:990:1: warning: expression using sizeof bool
include/trace/events/f2fs.h:990:1: warning: expression using sizeof bool
include/trace/events/f2fs.h:150:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1)
include/trace/events/f2fs.h:180:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1)
include/trace/events/f2fs.h:990:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1)
include/trace/events/f2fs.h:990:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1)

fs/f2fs/checkpoint.c:27:19: warning: symbol 'inode_entry_slab' was not declared. Should it be static?
fs/f2fs/checkpoint.c:577:15: warning: cast to restricted __le32
fs/f2fs/checkpoint.c:592:15: warning: cast to restricted __le32

fs/f2fs/trace.c:19:1: warning: symbol 'pids' was not declared. Should it be static?
fs/f2fs/trace.c:21:21: warning: symbol 'last_io' was not declared. Should it be static?

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:49 -08:00
Jaegeuk Kim 59b802e5a4 f2fs: allocate data blocks in advance for f2fs_direct_IO
This patch adds preallocation for data blocks to prepare f2fs_direct_IO.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:49 -08:00
Jaegeuk Kim f7ef9b83b5 f2fs: introduce macros to convert bytes and blocks in f2fs
This patch adds two macros for transition between byte and block offsets.
Currently, f2fs only supports 4KB blocks, so use the default size for now.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:48 -08:00
Jaegeuk Kim da17eece03 f2fs: call set_buffer_new for get_block
This patch fixes wrong handling of buffer_new flag in get_block.
If f2fs allocates new blocks and mapped buffer_head, it needs to set buffer_new
for the bh_result.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:47 -08:00
Jaegeuk Kim aaf9607516 f2fs: check node page contents all the time
In get_node_page, if the page is up-to-date, we assumed that the page was not
reclaimed at all.
But, sometimes it was reported that its contents was missing.
So, just for sure, let's check its mapping and contents.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:46 -08:00
Chao Yu 2e023174a8 f2fs: avoid data offset overflow when lseeking huge file
xfstest generic/285 complains our issue in lseeking huge file.

Here is the detail output of generic/285:
"./check -f2fs tests/generic/285
Ran: generic/285
Failures: generic/285
Failed 1 of 1 tests

10. Test a huge file for offset overflow
10.01 SEEK_HOLE expected 65536 or 8589934592, got 65536.          succ
10.02 SEEK_HOLE expected 65536 or 8589934592, got 65536.          succ
10.03 SEEK_DATA expected 0 or 0, got 0.                           succ
10.04 SEEK_DATA expected 1 or 1, got 1.                           succ
10.05 SEEK_HOLE expected 8589934592 or 8589934592, got 0.         FAIL
10.06 SEEK_DATA expected 8589869056 or 8589869056, got 8589869056. succ
10.07 SEEK_DATA expected 8589869057 or 8589869057, got 8589869057. succ
10.08 SEEK_DATA expected 8589869056 or 8589869056, got 4294901760. FAIL"

The reason of this issue is:
We will calculate current offset through left shifting page-offset with
PAGE_CACHE_SHIFT bits, but our page-offset is a type of unsigned long, its size
is 4 bytes in 32-bits machine.

So if our page-offset is bigger than (1 << 32 / pagesize - 1), result of left
shifting will overflow.

Let's fix this issue by casting type of page-offset to type of current offset:
loff_t.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:46 -08:00
Chao Yu 560d4672e2 f2fs: fix to use highmem for pages of newly created directory
In commit a78186ebe5 ("f2fs: use highmem for directory pages"), we have set
__GFP_HIGHMEM into dir mapping's gfp flag in f2fs_iget, so high address memory
could be used for these existing dir's page.

But we forgot to set flag for newly created dir, due to this reason, our newly
created dir pages could not be allocated from high address memory. Fix it.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:45 -08:00
Jaegeuk Kim bba681cbb2 f2fs: introduce a batched trim
This patch introduces a batched trimming feature, which submits split discard
commands.

This is to avoid long latency due to huge trim commands.
If fstrim was triggered ranging from 0 to the end of device, we should lock
all the checkpoint-related mutexes, resulting in very long latency.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:44 -08:00
Chao Yu 487261f39b f2fs: merge {invalidate,release}page for meta/node/data pages
This patch merges ->{invalidate,release}page function for meta/node/data pages.

After this, duplication of codes could be removed.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:44 -08:00
Jaegeuk Kim d24bdcbfc6 f2fs: show the number of writeback pages in stat
This patch adds the # of writeback pages in stat info.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:43 -08:00
Jaegeuk Kim f68daeebba f2fs: keep PagePrivate during releasepage
If PagePrivate is removed by releasepage, f2fs loses counting dirty pages.

e.g., try_to_release_page will not release page when the page is dirty,
but our releasepage removes PagePrivate.

    [<ffffffff81188d75>] try_to_release_page+0x35/0x50
    [<ffffffff811996f9>] invalidate_inode_pages2_range+0x2f9/0x3b0
    [<ffffffffa02a7f54>] ? truncate_blocks+0x384/0x4d0 [f2fs]
    [<ffffffffa02b7583>] ? f2fs_direct_IO+0x283/0x290 [f2fs]
    [<ffffffffa02b7fb0>] ? get_data_block_fiemap+0x20/0x20 [f2fs]
    [<ffffffff8118aa53>] generic_file_direct_write+0x163/0x170
    [<ffffffff8118ad06>] __generic_file_write_iter+0x2a6/0x350
    [<ffffffff8118adef>] generic_file_write_iter+0x3f/0xb0
    [<ffffffff81203081>] new_sync_write+0x81/0xb0
    [<ffffffff81203837>] vfs_write+0xb7/0x1f0
    [<ffffffff81204459>] SyS_write+0x49/0xb0
    [<ffffffff817c286d>] system_call_fastpath+0x16/0x1b

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:42 -08:00
Jaegeuk Kim 081d78c2fc f2fs: should fail mount when trying to recover data on read-only dev
If device is read-only, we should not proceed data recovery.
But, if the previous checkpoint was done by normal clean shutdown, it's safe to
proceed the recovery, since there will be no data to be recovered.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:42 -08:00
Jaegeuk Kim 119ee91445 f2fs: split UMOUNT and FASTBOOT flags
This patch adds FASTBOOT flag into checkpoint as follows.

 - CP_UMOUNT_FLAG is set when system is umounted.
 - CP_FASTBOOT_FLAG is set when intermediate checkpoint having node summaries
   was done.

So, if you get CP_UMOUNT_FLAG from checkpoint, the system was umounted cleanly.
Instead, if there was sudden-power-off, you can get CP_FASTBOOT_FLAG or nothing.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:41 -08:00
Jaegeuk Kim 11504a8e7e f2fs: avoid write_checkpoint if f2fs is mounted readonly
Do not change any partition when f2fs is changed to readonly mode.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:40 -08:00
Jaegeuk Kim 2d834bf9ac f2fs: support norecovery mount option
This patch adds a mount option, norecovery, which is mostly same as
disable_roll_forward. The only difference is that norecovery should be activated
with read-only mount option.

This can be used when user wants to check whether f2fs is mountable or not
without any recovery process. (e.g., xfstests/200)

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:40 -08:00
Jaegeuk Kim dabc4a5c60 f2fs: fix not to drop mount options when retrying fill_super
If wrong mount option was requested, f2fs tries to fill_super again.
But, during the next trial, f2fs has no valid mount options, since
parse_options deleted all the separators in the original string.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:39 -08:00
Chao Yu caf0047e7e f2fs: merge flags in struct f2fs_sb_info
Currently, there are several variables with Boolean type as below:

struct f2fs_sb_info {
...
	int s_dirty;
	bool need_fsck;
	bool s_closing;
...
	bool por_doing;
...
}

For this there are some issues:
1. there are some space of f2fs_sb_info is wasted due to aligning after Boolean
   type variables by compiler.
2. if we continuously add new flag into f2fs_sb_info, structure will be messed
   up.

So in this patch, we try to:
1. switch s_dirty to Boolean type variable since it has two status 0/1.
2. merge s_dirty/need_fsck/s_closing/por_doing variables into s_flag.
3. introduce an enum type which can indicate different states of sbi.
4. use new introduced universal interfaces is_sbi_flag_set/{set,clear}_sbi_flag
   to operate flags for sbi.

After that, above issues will be fixed.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:38 -08:00
Chao Yu 88dd893419 f2fs: clean up {in,de}create_sleep_time
Use pointer parameter @wait to pass result in {in,de}create_sleep_time for
cleanup.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:37 -08:00
Chao Yu feeb0debfb f2fs: make truncate_inline_date static
1. make truncate_inline_date static;
2. remove parameter @from of truncate_inline_date as callers only pass zero.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:37 -08:00
Kinglong Mee 3b6709b771 f2fs: fix a bug of inheriting default ACL from parent
Introduced by a6dda0e63e
"f2fs: use generic posix ACL infrastructure".

When testing default acl, gets in recent kernel (3.19.0-rc5),
user::rwx
group::r-x
other::r-x
default:user::rwx
default:group::r-x
default:group:root:rwx
default😷:rwx
default:other::r-x

]# getfacl testdir/
user::rwx
group::rwx
                // missing an acl "group:root:rwx" inherited from parent
other::r-x
default:user::rwx
default:group::r-x
default:group:root:rwx
default😷:rwx
default:other::r-x

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:36 -08:00
Chao Yu f28e503429 f2fs: use f2fs_radix_tree_insert to clean codes
No modification in functionality, just clean codes with f2fs_radix_tree_insert.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:35 -08:00
Chao Yu d49f3e8902 f2fs: add F2FS_IOC_GETVERSION support
In this patch we add the FS_IOC_GETVERSION ioctl for getting i_generation from
inode, after that, users can list file's generation number by using "lsattr -v".

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:35 -08:00
Jaegeuk Kim bc4a1f873b f2fs: leave comment for code readability
During the recovery, any xattr blocks should not be found, since they are
written into cold log, not the warm node chain.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:34 -08:00
Chao Yu 1601839e9e f2fs: fix to release count of meta page in ->invalidatepage
We will encounter deadloop in below scenario:

1. increase page count for F2FS_DIRTY_META type in following path:
->recover_fsync_data
  ->recover_data
    ->do_recover_data
      ->recover_data_page
        ->change_curseg
          ->write_sum_page
            ->set_page_dirty
2. fail in recover_data()
3. invalidate meta pages in truncate_inode_pages_final without decreasing page
   count.
4. deadloop when sync_meta_pages as page count will always be non-zero.

message:
NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s!

 [<c1129a37>] pagevec_lookup_tag+0x27/0x30
 [<f0e774c7>] sync_meta_pages+0x87/0x160 [f2fs]
 [<f0e86dd9>] recover_fsync_data+0xeb9/0xf10 [f2fs]
 [<f0e75398>] f2fs_fill_super+0x888/0x980 [f2fs]
 [<c11733ca>] mount_bdev+0x16a/0x1a0
 [<f0e7180f>] f2fs_mount+0x1f/0x30 [f2fs]
 [<c1173da6>] mount_fs+0x36/0x170
 [<c118b6f5>] vfs_kern_mount+0x55/0xe0
 [<c118d63f>] do_mount+0x1df/0x9f0
 [<c118e110>] SyS_mount+0x70/0xb0
 [<c15a0c48>] sysenter_do_call+0x12/0x12

To avoid page count leak, let's add ->invalidatepage and ->releasepage in
f2fs_meta_aops as f2fs_node_aops to release meta page count correctly.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:33 -08:00
Jaegeuk Kim 85dc2f2c6c f2fs: do checkpoint when umount flag is not set
If the previous checkpoint was done without CP_UMOUNT flag, it needs to do
checkpoint with CP_UMOUNT for the next fast boot.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:33 -08:00
Jaegeuk Kim 30a5537f9a f2fs: trigger correct checkpoint during umount
This patch fixes to trigger checkpoint with umount flag when kill_sb was called.
In kill_sb, f2fs_sync_fs was finally called, but at this time, f2fs can't do
checkpoint with CP_UMOUNT.
After then, f2fs_put_super is not doing checkpoint, since it is not dirty.

So, this patch adds a flag to indicate f2fs_sync_fs is called during umount.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:32 -08:00
Jaegeuk Kim 6f0aacbc3c f2fs: update memory footprint information
This patch adds missing memory usages, and splits them in detail.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:31 -08:00
Chao Yu 9066c6a7eb f2fs: fix wrong memory footprint statistics in debugfs
Our value of memory footprint statistics showed in debugfs is not calculated
correctly. Fix it in this patch.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:31 -08:00
Jaegeuk Kim 871f599f4a f2fs: avoid infinite loop on cp_error
If cp_error is set, we should avoid all the infinite loop.
In f2fs_sync_file, there is a hole, and this patch fixes that.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:30 -08:00
Kirill A. Shutemov d83a08db5b mm: drop vm_ops->remap_pages and generic_file_remap_pages() stub
Nobody uses it anymore.

[akpm@linux-foundation.org: fix filemap_xip.c]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-10 14:30:30 -08:00
kbuild test robot 08e4126e1e f2fs: pids_lock can be static
fs/f2fs/trace.c:19:12: sparse: symbol 'pids_lock' was not declared. Should it be static?

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:29 -08:00
Jaegeuk Kim 351f4fba84 f2fs: add f2fs_destroy_trace_ios to free radix tree
This patch removes radix tree after finishing tracing IOs.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:28 -08:00
Jaegeuk Kim c05086506f f2fs: add spin_lock to cover radix operations in IO tracer
This patch adds spin_lock to cover radix tree operations in IO tracer.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:28 -08:00
Jaegeuk Kim dd4e4b59b1 f2fs: add nat/sit entries into status
This patch adds NAT/SIT entry informations.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:28 -08:00
Jaegeuk Kim 7aed0d4537 f2fs: free radix_tree_nodes used by nat_set entries
In the normal case, the radix_tree_nodes are freed successfully.
But, when cp_error was detected, we should destroy them forcefully.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:28 -08:00
Jaegeuk Kim df1991391f f2fs: fix wrong unlock_page call
This patch removes wrongly called unlock_page.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:27 -08:00
Chao Yu 9e5ba77fdb f2fs: get rid of kzalloc in __recover_inline_status
We use kzalloc to allocate memory in __recover_inline_status, and use this
all-zero memory to check the inline date content of inode page by comparing
them. This is low effective and not needed, let's check inline date content
directly.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: make the code more neat]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:27 -08:00
Jaegeuk Kim 38aa0889b2 f2fs: align direct_io'ed data to section
This patch aligns the start block address of a file for direct io to the f2fs's
section size.

Some flash devices manage an over 4KB-sized page as a write unit, and if the
direct_io'ed data are written but not aligned to that unit, the performance can
be degraded due to the partial page copies.

Thus, since f2fs has a section that is well aligned to FTL units, we can align
the block address to the section size so that f2fs avoids this misalignment.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:27 -08:00
Jaegeuk Kim 41ef94b35c f2fs: remove uncovered code path
This patch removes unnecessary function calls.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:27 -08:00
Jaegeuk Kim 3547ea961d f2fs: avoid potential unnecessary codes
This patch relocates some operations to avoid unnecessary execution.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:26 -08:00
Jaegeuk Kim e1509cf294 f2fs: clean up to remove parameter
This patch uses dn->data_blkaddr as a parameter for the destination block
address.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:26 -08:00
Chao Yu 062920734c f2fs: reuse inode_entry_slab in gc procedure for using slab more effectively
There are two slab cache inode_entry_slab and winode_slab using the same
structure as below:

struct dir_inode_entry {
	struct list_head list;	/* list head */
	struct inode *inode;	/* vfs inode pointer */
};

struct inode_entry {
	struct list_head list;
	struct inode *inode;
};

It's a little waste that the two cache can not share their memory space for each
other.
So in this patch we remove one redundant winode_slab slab cache, then use more
universal name struct inode_entry as remaining data structure name of slab,
finally we reuse the inode_entry_slab to store dirty dir item and gc item for
more effective.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:26 -08:00
Chao Yu 2ace38e00e f2fs: cleanup parameters for trace_f2fs_submit_{read_,write_,page_,page_m}bio with fio
Cleanup parameters for trace_f2fs_submit_{read_,write_,page_,page_m}bio with fio
as one parameter.

Suggested-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:26 -08:00
Chao Yu 3e1c8f125e f2fs: cleanup trace event of f2fs_submit_page_{m,}bio with DECLARE_EVENT_CLASS
This patch adds missing parameter _type_ for trace_f2fs_submit_page_bio, then
use DECLARE_EVENT_CLASS/DEFINE_EVENT_CONDITION pair to cleanup some trace event
code related to f2fs_submit_page_{m,}bio.

Additionally, after we remove redundant code, size of code can be reduced:
   text    data     bss     dec     hex filename
 176787    8712      56  185555   2d4d3 f2fs.ko.org
 174408    8648      56  183112   2cb48 f2fs.ko

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:25 -08:00
Jaegeuk Kim 09eb483e89 f2fs: fix missing cold bit during recovery
In do_recover_data, we find and update previous node pages after updating
its new block addresses.
After then, we call fill_node_footer without reset field, we erase its
cold bit so that this new cold node block is written to wrong log area.
This patch fixes not to miss its old flag.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:25 -08:00
Changman Lee b9a2c25207 f2fs: add block count by in-place-update in stat info
This patch adds block count by in-place-update in stat.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:25 -08:00
Jaegeuk Kim dd802406e3 f2fs: avoid double lock for cp_rwsem
The __f2fs_add_link is covered by cp_rwsem all the time.
This calls init_inode_metadata, which conducts some acl operations including
memory allocation with GFP_KERNEL previously.
But, under memory pressure, f2fs_write_data_page can be called, which also
grabs cp_rwsem too.

In this case, this incurs a deadlock pointed by Chao.
Thread #1        Thread #2
 down_read
                 down_write
  down_read
 -> here down_read should wait forever.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:25 -08:00
Jaegeuk Kim db9f7c1a95 f2fs: activate f2fs_trace_ios
This patch activates f2fs_trace_ios.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:24 -08:00
Jaegeuk Kim 9e4ded3f30 f2fs: activate f2fs_trace_pid
This patch activates f2fs_trace_pid.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:24 -08:00
Jaegeuk Kim 0e689d0369 f2fs: add key functions for f2fs_io_tracer
This patch adds two key functions to trace process ids and IOs.
The basic idea is to
1. remain process ids, pids, in page->private.
2. show pids in IO traces.

So, later we can retrieve process information according to IO traces.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:24 -08:00
Jaegeuk Kim 63f92ddc8a f2fs: add f2fs_io_tracer support
This patch adds:
 o initial trace.c and trace.h with skeleton functions
 o Kconfig and Makefile to activate this feature

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:24 -08:00
Jaegeuk Kim cf04e8eb55 f2fs: use f2fs_io_info to clean up messy parameters during IO path
This patch cleans up parameters on IO paths.
The key idea is to use f2fs_io_info adding a parameter, block address, and then
use this structure as parameters.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:23 -08:00
Chao Yu 9ecf4b80bd f2fs: use ra_meta_pages to simplify readahead code in restore_node_summary
Use more common function ra_meta_pages() with META_POR to readahead node blocks
in restore_node_summary() instead of ra_sum_pages(), hence we can simplify the
readahead code there, and also we can remove unused function ra_sum_pages().

changes from v2:
 o use invalidate_mapping_pages as before suggested by Changman Lee.
changes from v1:
 o fix one bug when using truncate_inode_pages_range which is pointed out by
   Jaegeuk Kim.

Reviewed-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:23 -08:00
Chao Yu 5c27f4ee44 f2fs: merge two uchar variable in struct node_info to reduce memory cost
This patch moves one member of struct nat_entry: _flag_ to struct node_info,
so _version_ in struct node_info and _flag_ which are unsigned char type will
merge to one 32-bit space in register/memory. So the size of nat_entry will be
reduced from 28 bytes to 24 bytes (for 64-bit machine, reduce its size from 40
bytes to 32 bytes) and then slab memory using by f2fs will be reduced.

changes from v2:
 o update description of memory usage gain for 64-bit machine suggested by
   Changman Lee.
changes from v1:
 o introduce inline copy_node_info() to copy valid data from node info suggested
   by Jaegeuk Kim, it can avoid bug.

Reviewed-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:23 -08:00
Chao Yu 3fa06d7bc9 f2fs: readahead contiguous current summary blocks in checkpoint
Let's add readahead code for reading contiguous compact/normal summary blocks
in checkpoint, then we will gain better performance in mount procedure.

Changes from v1
  o remove inappropriate 'unlikely' in npages_for_summary_flush.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:23 -08:00
Jaegeuk Kim 5df1f1da7a f2fs: use missing the use of f2fs_kunmap_page
This patch calls f2fs_kunmap_page which I missed before.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:22 -08:00
Jaegeuk Kim 042b7816aa f2fs: remove unnecessary call to invalidate inmemory pages
Now we use inmemory pages for atomic write only and provide abort procedure,
we don't need to truncate them explicitly.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:22 -08:00
Jaegeuk Kim d7bc2484b8 f2fs: fix small discards not to issue redundantly
The ckpt_valid_map and cur_valid_map are synced by seg_info_to_raw_sit.

In the case of small discards, the candidates are selected before sync,
while fitrim selects candidates after sync.

So, for small discards, we need to add candidates only just being obsoleted.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:22 -08:00
Jaegeuk Kim 1e84371ffe f2fs: change atomic and volatile write policies
This patch adds two new ioctls to release inmemory pages grabbed by atomic
writes.
 o f2fs_ioc_abort_volatile_write
  - If transaction was failed, all the grabbed pages and data should be written.
 o f2fs_ioc_release_volatile_write
  - This is to enhance the performance of PERSIST mode in sqlite.

In order to avoid huge memory consumption which causes OOM, this patch changes
volatile writes to use normal dirty pages, instead blocked flushing to the disk
as long as system does not suffer from memory pressure.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:22 -08:00
Jaegeuk Kim 70c640b1d6 f2fs: don't need to call lock_op and lock_page for abort
We don't need to call lock_op and lock_page at the aborting path.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:21 -08:00
Jaegeuk Kim 88a70a69c0 f2fs: fix wrong condition check to trigger f2fs_sync_fs
If there is not enough available memory, we need to trigger f2fs_sync_fs.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:21 -08:00
Jaegeuk Kim cd52b6368f f2fs: remove checking dirty_exceed
We don't need to force to write dirty_exceeded for f2fs_balance_fs_bg.
This flag was only meaningful to write bypassing conditions.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:21 -08:00
Chao Yu 635aee1fef f2fs: avoid to ra unneeded blocks in recover flow
To improve recovery speed, f2fs try to readahead many contiguous blocks in warm
node segment, but for most time, abnormal power-off do not occur frequently, so
when mount a normal power-off f2fs image, by contrary ra so many blocks and then
invalid them will hurt the performance of mount.
It's better to just ra the first next-block for normal condition.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-12-08 14:19:09 -08:00
Chao Yu 66b00c1867 f2fs: introduce is_valid_blkaddr to cleanup codes in ra_meta_pages
This patch does cleanup work, it introduces is_valid_blkaddr() to include
verification code for blkaddr with upper and down boundary value which were in
ra_meta_pages previous.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-12-08 14:19:08 -08:00
Chao Yu 13da549460 f2fs: fix to enable readahead for SSA/CP blocks
1.We use zero as upper boundary value for ra SSA/CP blocks, we will skip
readahead as verification failure with max number, it causes low performance.
2.Low boundary value is not accurate for SSA/CP/POR region verification, so
these values need to be redefined.

This patch fixes above issues.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-12-08 14:19:07 -08:00
Chao Yu 03e14d522e f2fs: use atomic for counting inode with inline_{dir,inode} flag
As inline_{dir,inode} stat is increased/decreased concurrently by multi threads,
so the value is not so accurate, let's use atomic type for counting accurately.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-12-08 10:54:59 -08:00
Changman Lee 51455b1938 f2fs: cleanup path to need cp at fsync
Added some commentaries for code readability and cleaned up if-statement
clearly.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-12-08 10:40:22 -08:00
Changman Lee 9c7bb70212 f2fs: check if inode state is dirty at fsync
If inode state is dirty, go straight to write.

Suggested-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-12-08 10:37:13 -08:00
Jaegeuk Kim 8dcf2ff721 f2fs: count the number of inmemory pages
This patch adds counting # of inmemory pages in the page cache.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-12-08 10:35:15 -08:00
Jaegeuk Kim 126622343a f2fs: release inmemory pages when the file was closed
If file is closed, let's drop inmemory pages.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-12-08 10:35:15 -08:00
Jaegeuk Kim 0722b1011a f2fs: set page private for inmemory pages for truncation
The inmemory pages should be handled by invalidate_page since it needs to be
released int the truncation path.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-12-08 10:35:14 -08:00
Jaegeuk Kim 9d1015dd4c f2fs: count inline_xx in do_read_inode
In do_read_inode, if we failed __recover_inline_status, the inode has inline
flag without increasing its count.
Later, f2fs_evict_inode will decrease the count, which causes -1.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-12-08 10:35:13 -08:00
Jaegeuk Kim 9be32d72be f2fs: do retry operations with cond_resched
This patch revists retrial paths in f2fs.
The basic idea is to use cond_resched instead of retrying from the very early
stage.

Suggested-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-12-08 10:35:05 -08:00
Jaegeuk Kim 769ec6e5b7 f2fs: call radix_tree_preload before radix_tree_insert
This patch tries to fix:

 BUG: using smp_processor_id() in preemptible [00000000] code: f2fs_gc-254:0/384
  (radix_tree_node_alloc+0x14/0x74) from [<c033d8a0>] (radix_tree_insert+0x110/0x200)
  (radix_tree_insert+0x110/0x200) from [<c02e8264>] (gc_data_segment+0x340/0x52c)
  (gc_data_segment+0x340/0x52c) from [<c02e8658>] (f2fs_gc+0x208/0x400)
  (f2fs_gc+0x208/0x400) from [<c02e8a98>] (gc_thread_func+0x248/0x28c)
  (gc_thread_func+0x248/0x28c) from [<c0139944>] (kthread+0xa0/0xac)
  (kthread+0xa0/0xac) from [<c0105ef8>] (ret_from_fork+0x14/0x3c)

The reason is that f2fs calls radix_tree_insert under enabled preemption.
So, before calling it, we need to call radix_tree_preload.

Otherwise, we should use _GFP_WAIT for the radix tree, and use mutex or
semaphore to cover the radix tree operations.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-12-05 09:51:04 -08:00
Jaegeuk Kim 8b26ef98da f2fs: use rw_semaphore for nat entry lock
Previoulsy, we used rwlock for nat_entry lock.
But, now we have a lot of complex operations in set_node_addr.
(e.g., allocating kernel memories, handling radix_trees, and so on)

So, this patches tries to change spinlock to rw_semaphore to give CPUs to other
threads.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-12-03 21:23:29 -08:00
Jaegeuk Kim 4634d71ed1 f2fs: fix missing kmem_cache_free
This patch fixes missing kmem_cache_free when handling errors.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-12-03 16:40:28 -08:00
Changman Lee 7dda2af83b f2fs: more fast lookup for gc_inode list
If there are many inodes that have data blocks in victim segment,
it takes long time to find a inode in gc_inode list.
Let's use radix_tree to reduce lookup time.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-12-02 11:02:50 -08:00
Changman Lee 9c01503f4d f2fs: cleanup redundant macro
We've already made fi and sbi for inode. Let's avoid duplicated work.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-12-01 14:16:50 -08:00
Chao Yu cd34e2969b f2fs: fix to return correct error number in f2fs_write_begin
Fix the wrong error number in error path of f2fs_write_begin.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-12-01 13:56:02 -08:00
Changman Lee 31a3268839 f2fs: cleanup if-statement of phase in gc_data_segment
Little cleanup to distinguish each phase easily

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
[Jaegeuk Kim: modify indentation for code readability]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-27 20:30:17 -08:00
Jaegeuk Kim 95f5b0fc5e f2fs: fix to recover converted inline_data
If an inode has converted inline_data which was written to the disk, we should
set its inode flag for further fsync so that this inline_data can be recovered
from sudden power off.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-25 18:08:00 -08:00
Jaegeuk Kim 158c194c37 f2fs: make clean the page before writing
If a page is set to be written to the disk, we can make clean the page.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-25 17:33:31 -08:00
Changman Lee 80ec2e914d f2fs: no more dirty_nat_entires when flushing
After flushing dirty nat entries, it has to be no more dirty nat
entries.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-25 17:26:36 -08:00
Changman Lee 20d047c876 f2fs: check dirty_nat_cnt before flushing nat entries in journal
It's meaningless to check dirty_nat_cnt after re-dirtying nat entries in
journal. And although there are rooms for dirty nat entires if dirty_nat_cnt
is zero, it's also meaningless to check __has_cursum_space.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-25 17:26:34 -08:00
Jaegeuk Kim 5f72739583 f2fs: fix deadlock during inline_data conversion
A deadlock can be occurred:
Thread 1]                             Thread 2]
 - f2fs_write_data_pages              - f2fs_write_begin
   - lock_page(page #0)
                                        - grab_cache_page(page #X)
                                        - get_node_page(inode_page)
                                        - grab_cache_page(page #0)
                                          : to convert inline_data
   - f2fs_write_data_page
     - f2fs_write_inline_data
       - get_node_page(inode_page)

In this case, trying to lock inode_page and page #0 causes deadlock.
In order to avoid this, this patch adds a rule for this locking policy,
which is that page #0 should be locked followed by inode_page lock.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-25 12:08:30 -08:00
Markus Elfring ce3e6d25f3 f2fs: fix typos for the word "destroy" in jump labels
Two jump labels were adjusted in the implementation of the
create_node_manager_caches() function because these identifiers
contained typos.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-25 12:08:22 -08:00
Jaegeuk Kim 0341845efc f2fs: fix livelock calling f2fs_iget during f2fs_evict_inode
In f2fs_evict_inode,
 commit_inmemory_pages
   f2fs_gc
     f2fs_iget
       iget_locked
         -> wait for inode free

Here, if the inode is same as the one to be evicted, f2fs should wait forever.
Actually, we should not call f2fs_balance_fs during f2fs_evict_inode to avoid
this.

But, the commit_inmem_pages calls f2fs_balance_fs by default, even if
f2fs_evict_inode wants to free inmemory pages only.

Hence, this patch adds to trigger f2fs_balance_fs only when there is something
to write.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-23 21:51:57 -08:00
Jaegeuk Kim 9486ba442b f2fs: introduce f2fs_dentry_kunmap to clean up
This patch introduces f2fs_dentry_kunmap to clean up dirty codes.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-23 21:51:53 -08:00
Changman Lee c9ee00857c f2fs: fix wrong data structure when create slab
It used nat_entry_set when create slab for sit_entry_set.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-23 21:48:49 -08:00
Jaegeuk Kim 09b8b3c839 f2fs: call flush_dcache_page when the page was updated
Whenever f2fs updates mapped pages, it needs to call flush_dcache_page.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-23 21:48:31 -08:00
Jaegeuk Kim 857dc4e059 f2fs: write SSA pages under memory pressure
Under memory pressure, we don't need to skip SSA page writes.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-19 22:49:33 -08:00
Jaegeuk Kim 27c6bd60ac f2fs: submit bio for node blocks in the reclaim path
If a node page is request to be written during the reclaiming path, we should
submit the bio to avoid pending to recliam it.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-19 22:49:32 -08:00
Chao Yu 67298804f3 f2fs: introduce struct inode_management to wrap inner fields
Now in f2fs, we have three inode cache: ORPHAN_INO, APPEND_INO, UPDATE_INO,
and we manage fields related to inode cache separately in struct f2fs_sb_info
for each inode cache type.
This makes codes a bit messy, so that this patch intorduce a new struct
inode_management to wrap inner fields as following which make codes more neat.

/* for inner inode cache management */
struct inode_management {
	struct radix_tree_root ino_root;	/* ino entry array */
	spinlock_t ino_lock;			/* for ino entry lock */
	struct list_head ino_list;		/* inode list head */
	unsigned long ino_num;			/* number of entries */
};

struct f2fs_sb_info {
	...
	struct inode_management im[MAX_INO_ENTRY];      /* manage inode cache */
	...
}

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-19 22:49:32 -08:00
Chao Yu aba291b3d8 f2fs: remove unneeded check code with option in f2fs_remount
Because we have checked the contrary condition in case of "if" judgment, we do
not need to check the condition again in case of "else" judgment. Let's remove
it.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-19 22:49:31 -08:00
Chao Yu 6c02993203 f2fs: avoid unable to restart gc thread in remount
In f2fs_remount, we will stop gc thread and set need_restart_gc as true when new
option is set without BG_GC, then if any error occurred in the following
procedure, we can restore to start the gc thread.
But after that, We will fail to restore gc thread in start_gc_thread as BG_GC is
not set in new option, so we'd better move this condition judgment out of
start_gc_thread to fix this issue.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-19 22:49:30 -08:00
Jaegeuk Kim 8cdcb71322 f2fs: put the inode page when error was occurred
We should put the inode page when error was occurred.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-18 17:04:33 -08:00
Jaegeuk Kim 6d20aff83c f2fs: fix to call put_page at the error handling routine
The locked page should be released before returning the function.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-18 17:02:47 -08:00
Jaegeuk Kim 92dffd0179 f2fs: convert inline_data when i_size becomes large
If i_size becomes large outside of MAX_INLINE_DATA, we shoud convert the inode.
Otherwise, we can make some dirty pages during the truncation, and those pages
will be written through f2fs_write_data_page.
At that moment, the inode has still inline_data, so that it tries to write non-
zero pages into inline_data area.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-11 14:16:12 -08:00
Jaegeuk Kim 764d2c8040 f2fs: fix deadlock to grab 0'th data page
The scenario is like this.

One trhead triggers:
  f2fs_write_data_pages
    lock_page
    f2fs_write_data_page
      f2fs_lock_op  <- wait

The other thread triggers:
  f2fs_truncate
    truncate_blocks
      f2fs_lock_op
        truncate_partial_data_page
          lock_page  <- wait for locking the page

This patch resolves this bug by relocating truncate_partial_data_page.
This function is just to truncate user data page and not related to FS
consistency as well.
And, we don't need to call truncate_inline_data. Rather than that,
f2fs_write_data_page will finally update inline_data later.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-11 14:15:48 -08:00
Jaegeuk Kim 57e2a2c0a6 f2fs: reduce the number of inline_data inode before clearing it
The # of inline_data inode is decreased only when it has inline_data.
After clearing the flag, we can't decreased the number.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-10 16:29:14 -08:00
Jaegeuk Kim b7e1d80003 f2fs: implement -o dirsync
If a mount option has dirsync, we should call checkpoint for all the directory
operations.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-10 06:51:39 -08:00
Jaegeuk Kim 510184c89f f2fs: do not skip any writes under memory pressure
Under memory pressure, let's avoid skipping data writes.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-10 06:51:38 -08:00
Jaegeuk Kim 2f97c326bf f2fs: write node pages if checkpoint is not doing
It needs to write node pages if checkpoint is not doing in order to avoid
memory pressure.

Reviewed-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-10 06:51:28 -08:00
Jaegeuk Kim e5e7ea3c86 f2fs: control the memory footprint used by ino entries
This patch adds to control the memory footprint used by ino entries.
This will conduct best effort, not strictly.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-06 15:24:46 -08:00
Jaegeuk Kim 8c402946f0 f2fs: introduce the number of inode entries
This patch adds to monitor the number of ino entries.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-06 15:17:43 -08:00
Jaegeuk Kim a344b9fda0 f2fs: disable roll-forward when active_logs = 2
The roll-forward mechanism should be activated when the number of active
logs is not 2.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-05 20:05:53 -08:00
Jaegeuk Kim d5053a34a9 f2fs: introduce -o fastboot for reducing booting time only
If a system wants to reduce the booting time as a top priority, now we can
use a mount option, -o fastboot.
With this option, f2fs conducts a little bit slow write_checkpoint, but
it can avoid the node page reads during the next mount time.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-04 17:34:15 -08:00
Jaegeuk Kim 6a8f8ca582 f2fs: avoid race condition in handling wait_io
__submit_merged_bio    f2fs_write_end_io        f2fs_write_end_io
                       wait_io = X              wait_io = x
                       complete(X)              complete(X)
                       wait_io = NULL
wait_for_completion()
free(X)
                                                 spin_lock(X)
                                                 kernel panic

In order to avoid this, this patch removes the wait_io facility.
Instead, we can use wait_on_all_pages_writeback(sbi) to wait for end_ios.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-04 17:34:14 -08:00
Jaegeuk Kim adf4983bde f2fs: send discard commands in larger extent
If there is a chance to make a huge sized discard command, we don't need
to split it out, since each blkdev_issue_discard should wait one at a
time.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-04 17:34:13 -08:00
Jaegeuk Kim b3d208f96d f2fs: revisit inline_data to avoid data races and potential bugs
This patch simplifies the inline_data usage with the following rule.
1. inline_data is set during the file creation.
2. If new data is requested to be written ranges out of inline_data,
 f2fs converts that inode permanently.
3. There is no cases which converts non-inline_data inode to inline_data.
4. The inline_data flag should be changed under inode page lock.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-04 17:34:11 -08:00
Jan Kara 1f7732fe6c f2fs: remove pointless bit testing in f2fs_delete_entry()
There's no point in using test_and_clear_bit_le() when we don't use the
return value of the function. Just use clear_bit_le() instead.

Coverity-id: 1016434
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:38 -08:00
Jaegeuk Kim e3fb1b794b f2fs: do not discard data protected by the previous checkpoint
We should not discard any data protected by the previous checkpoint all
the time.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:38 -08:00
Jaegeuk Kim 427a45c8e2 f2fs: flush_dcache_page for inline data
When reading inline data, we should call flush_dcache_page.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:37 -08:00
Jaegeuk Kim ca4b02eeed f2fs: call write_checkpoint under disabled gc
During the write_checkpoint, we should avoid f2fs_gc trigger to avoid any
filesystem consistency.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:37 -08:00
Jan Kara 9234f3190b f2fs: fix possible data corruption in f2fs_write_begin()
f2fs_write_begin() doesn't initialize the 'dn' variable if the inode has
inline data. However it uses its contents to decide whether it should
just zero out the page or load data to it. Thus if we are unlucky we can
zero out page contents instead of loading inline data into a page.

CC: stable@vger.kernel.org
CC: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:37 -08:00
Gu Zheng 2cc2218611 f2fs: use current_sit_addr to replace the open code
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:37 -08:00
Gu Zheng 52aca07425 f2fs: rename f2fs_set/clear_bit to f2fs_test_and_set/clear_bit
Rename f2fs_set/clear_bit to f2fs_test_and_set/clear_bit, which mean
set/clear bit and return the old value, for better readability.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:36 -08:00
Gu Zheng 1730663cb7 f2fs: set raw_super default to NULL to avoid compile warning
Set raw_super default to NULL to avoid the possibly used
uninitialized warning, though we may never hit it in fact.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:36 -08:00
Gu Zheng c6ac4c0ec4 f2fs: introduce f2fs_change_bit to simplify the change bit logic
Introduce f2fs_change_bit to simplify the change bit logic in
function set_to_next_nat{sit}.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:36 -08:00
Gu Zheng fa528722d0 f2fs: remove the redundant function cond_clear_inode_flag
Use clear_inode_flag to replace the redundant cond_clear_inode_flag.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:36 -08:00
Gu Zheng 8a2d0ace3a f2fs: remove the seems unneeded argument 'type' from __get_victim
Remove the unneeded argument 'type' from __get_victim, use
NO_CHECK_TYPE directly when calling v_ops->get_victim().

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:35 -08:00
Jan Kara 9bd27ae4aa f2fs: avoid returning uninitialized value to userspace from f2fs_trim_fs()
If user specifies too low end sector for trimming, f2fs_trim_fs() will
use uninitialized value as a number of trimmed blocks and returns it to
userspace. Initialize number of trimmed blocks early to avoid the
problem.

Coverity-id: 1248809
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:35 -08:00
Jaegeuk Kim d64948a4df f2fs: declare f2fs_convert_inline_dir as a static function
This patch declares f2fs_convert_inline_dir as a static function, which was
reported by kbuild test robot.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:35 -08:00
Jaegeuk Kim f1e33a041e f2fs: use kmap_atomic instead of kmap
For better performance, we need to use kmap_atomic instead of kmap.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:35 -08:00
Jaegeuk Kim 062a3e7ba7 f2fs: reuse make_empty_dir code for inline_dentry
This patch introduces do_make_empty_dir to mitigate code redundancy
for inline_dentry.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:34 -08:00
Jaegeuk Kim 7b3cd7d6f0 f2fs: introduce f2fs_dentry_ptr structure for code clean-up
This patch introduces f2fs_dentry_ptr structure for the use of a function
parameter in inline_dentry operations.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:34 -08:00
Jaegeuk Kim 5ab18570b8 f2fs: should not truncate any inline_dentry
If the inode has inline_dentry, we should not truncate any block indices.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:34 -08:00
Jaegeuk Kim 38594de767 f2fs: reuse core function in f2fs_readdir for inline_dentry
This patch introduces a core function, f2fs_fill_dentries, to remove
redundant code in f2fs_readdir and f2fs_read_inline_dir.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:34 -08:00
Jaegeuk Kim e7a2bf2283 f2fs: fix counting inline_data inode numbers
This patch fixes wrongly counting inline_data inode numbers.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:33 -08:00
Jaegeuk Kim 3289c061c5 f2fs: add stat info for inline_dentry inodes
This patch adds status information for inline_dentry inodes.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:33 -08:00
Jaegeuk Kim bce8d11207 f2fs: avoid deadlock on init_inode_metadata
Previously, init_inode_metadata does not hold any parent directory's inode
page. So, f2fs_init_acl can grab its parent inode page without any problem.
But, when we use inline_dentry, that page is grabbed during f2fs_add_link,
so that we can fall into deadlock condition like below.

INFO: task mknod:11006 blocked for more than 120 seconds.
      Tainted: G           OE  3.17.0-rc1+ #13
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mknod           D ffff88003fc94580     0 11006  11004 0x00000000
 ffff880007717b10 0000000000000002 ffff88003c323220 ffff880007717fd8
 0000000000014580 0000000000014580 ffff88003daecb30 ffff88003c323220
 ffff88003fc94e80 ffff88003ffbb4e8 ffff880007717ba0 0000000000000002
Call Trace:
 [<ffffffff8173dc40>] ? bit_wait+0x50/0x50
 [<ffffffff8173d4cd>] io_schedule+0x9d/0x130
 [<ffffffff8173dc6c>] bit_wait_io+0x2c/0x50
 [<ffffffff8173da3b>] __wait_on_bit_lock+0x4b/0xb0
 [<ffffffff811640a7>] __lock_page+0x67/0x70
 [<ffffffff810acf50>] ? autoremove_wake_function+0x40/0x40
 [<ffffffff811652cc>] pagecache_get_page+0x14c/0x1e0
 [<ffffffffa029afa9>] get_node_page+0x59/0x130 [f2fs]
 [<ffffffffa02a63ad>] read_all_xattrs+0x24d/0x430 [f2fs]
 [<ffffffffa02a6ca2>] f2fs_getxattr+0x52/0xe0 [f2fs]
 [<ffffffffa02a7481>] f2fs_get_acl+0x41/0x2d0 [f2fs]
 [<ffffffff8122d847>] get_acl+0x47/0x70
 [<ffffffff8122db5a>] posix_acl_create+0x5a/0x150
 [<ffffffffa02a7759>] f2fs_init_acl+0x29/0xcb [f2fs]
 [<ffffffffa0286a8d>] init_inode_metadata+0x5d/0x340 [f2fs]
 [<ffffffffa029253a>] f2fs_add_inline_entry+0x12a/0x2e0 [f2fs]
 [<ffffffffa0286ea5>] __f2fs_add_link+0x45/0x4a0 [f2fs]
 [<ffffffffa028b5b6>] ? f2fs_new_inode+0x146/0x220 [f2fs]
 [<ffffffffa028b816>] f2fs_mknod+0x86/0xf0 [f2fs]
 [<ffffffff811e3ec1>] vfs_mknod+0xe1/0x160
 [<ffffffff811e4b26>] SyS_mknod+0x1f6/0x200
 [<ffffffff81741d7f>] tracesys+0xe1/0xe6

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:33 -08:00
Jaegeuk Kim 59a0615540 f2fs: fix to wait correct block type
The inode page needs to wait NODE block io.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:33 -08:00
Jaegeuk Kim 4e6ebf6d49 f2fs: reuse find_in_block code for find_in_inline_dir
This patch removes redundant copied code in find_in_inline_dir.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:32 -08:00
Jaegeuk Kim a82afa2019 f2fs: reuse room_for_filename for inline dentry operation
This patch introduces to reuse the existing room_for_filename for inline dentry
operation.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:32 -08:00
Chao Yu 622f28ae9b f2fs: enable inline dir handling
Add inline dir functions into normal dir ops' function to handle inline ops.
Besides, we enable inline dir mode when a new dir inode is created if
inline_data option is on.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:32 -08:00
Chao Yu 201a05be96 f2fs: add key function to handle inline dir
Adds Functions to implement inline dir init/lookup/insert/delete/convert ops.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: remove needless reserved area copy, pointed by Dan Carpenter]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:31 -08:00
Chao Yu dbeacf02eb f2fs: export dir operations for inline dir
This patch exports some dir operations for inline dir, additionally introduces
f2fs_drop_nlink from f2fs_delete_entry for reusing by inline dir function.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:31 -08:00
Chao Yu 5efd3c6f1b f2fs: add a new mount option for inline dir
Adds a new mount option 'inline_dentry' for inline dir.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:31 -08:00
Chao Yu 34d67debe0 f2fs: add infra struct and helper for inline dir
This patch defines macro/inline dentry structure, and adds some helpers for
inline dir infrastructure.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:31 -08:00
Jaegeuk Kim af41d3ee00 f2fs: avoid infinite loop at cp_error
This patch avoids an infinite loop in sync_dirty_inode_page when -EIO was
detected.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:31 -08:00
Jaegeuk Kim 4a257ed677 f2fs: avoid build warning
This patch removes build warning.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:30 -08:00
Jaegeuk Kim 13fd8f89f6 f2fs: fix to call f2fs_unlock_op
This patch fixes to call f2fs_unlock_op, which was missing before.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:30 -08:00
Jaegeuk Kim 9ba69cf987 f2fs: avoid to allocate when inline_data was written
The sceanrio is like this.
inline_data   i_size     page                 write_begin/vm_page_mkwrite
  X             30       dirty_page
  X             30                            write to #4096 position
  X             30       get_dnode_of_data    wait for get_dnode_of_data
  O             30       write inline_data
  O             30                            get_dnode_of_data
  O             30                            reserve data block
..

In this case, we have #0 = NEW_ADDR and inline_data as well.
We should not allow this condition for further access.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:30 -08:00
Jaegeuk Kim a78186ebe5 f2fs: use highmem for directory pages
This patch fixes to use highmem for directory pages.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:30 -08:00
Jaegeuk Kim 1ce86bf6f8 f2fs: fix race conditon on truncation with inline_data
Let's consider the following scenario.

blkaddr[0] inline_data i_size  i_blocks writepage           truncate
  NEW        X        4096        2    dirty page #0
  NEW        X         0                                    change i_size
  NEW        X         0          2    f2fs_write_inline_data
  NEW        X         0          2    get_dnode_of_data
  NEW        X         0          2    truncate_data_blocks_range
  NULL       O         0          1    memcpy(inline_data)
  NULL       O         0          1    f2fs_put_dnode
  NULL       O         0          1                         f2fs_truncate
  NULL       O         0          1                         get_dnode_of_data
  NULL       O         0          1                       *invalid block addr*

This patch adds checking inline_data flag during f2fs_truncate not to refer
corrupted block indices.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:29 -08:00
Jaegeuk Kim c08a690b46 f2fs: should truncate any allocated block for inline_data write
When trying to write inline_data, we should truncate any data block allocated
and pointed by the inode block.
We should consider the data index is not 0.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:29 -08:00
Jaegeuk Kim cbcb2872e3 f2fs: invalidate inmemory page
If user truncates file's data, we should truncate inmemory pages too.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:29 -08:00
Jaegeuk Kim 34ba94bac9 f2fs: do not make dirty any inmemory pages
This patch let inmemory pages be clean all the time.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:29 -08:00
Jaegeuk Kim 02a1335f25 f2fs: support volatile operations for transient data
This patch adds support for volatile writes which keep data pages in memory
until f2fs_evict_inode is called by iput.

For instance, we can use this feature for the sqlite database as follows.
While supporting atomic writes for main database file, we can keep its journal
data temporarily in the page cache by the following sequence.

1. open
 -> ioctl(F2FS_IOC_START_VOLATILE_WRITE);
2. writes
 : keep all the data in the page cache.
3. flush to the database file with atomic writes
  a. ioctl(F2FS_IOC_START_ATOMIC_WRITE);
  b. writes
  c. ioctl(F2FS_IOC_COMMIT_ATOMIC_WRITE);
4. close
 -> drop the cached data

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-10-07 11:54:41 -07:00
Jaegeuk Kim 88b88a6679 f2fs: support atomic writes
This patch introduces a very limited functionality for atomic write support.
In order to support atomic write, this patch adds two ioctls:
 o F2FS_IOC_START_ATOMIC_WRITE
 o F2FS_IOC_COMMIT_ATOMIC_WRITE

The database engine should be aware of the following sequence.
1. open
 -> ioctl(F2FS_IOC_START_ATOMIC_WRITE);
2. writes
  : all the written data will be treated as atomic pages.
3. commit
 -> ioctl(F2FS_IOC_COMMIT_ATOMIC_WRITE);
  : this flushes all the data blocks to the disk, which will be shown all or
  nothing by f2fs recovery procedure.
4. repeat to #2.

The IO pattens should be:

  ,- START_ATOMIC_WRITE                  ,- COMMIT_ATOMIC_WRITE
 CP | D D D D D D | FSYNC | D D D D | FSYNC ...
                      `- COMMIT_ATOMIC_WRITE

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-10-06 17:39:50 -07:00
Jaegeuk Kim 120c2cba1d f2fs: remove unused return value
Don't return any value without any usage.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-10-05 21:05:15 -07:00
Jaegeuk Kim 52656e6cf7 f2fs: clean up f2fs_ioctl functions
This patch cleans up f2fs_ioctl functions for better readability.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-30 15:34:56 -07:00
Dan Carpenter 8a21984d5d f2fs: potential shift wrapping buf in f2fs_trim_fs()
My static checker complains that segment is a u64 but only the lower 31
bits can be used before we hit a shift wrapping bug.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-30 15:34:56 -07:00
Jaegeuk Kim 44c1615651 f2fs: call f2fs_unlock_op after error was handled
This patch relocates f2fs_unlock_op in every directory operations to be called
after any error was processed.
Otherwise, the checkpoint can be entered with valid node ids without its
dentry when -ENOSPC is occurred.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-30 15:34:55 -07:00
Jaegeuk Kim 7cd8558baa f2fs: check the use of macros on block counts and addresses
This patch cleans up the existing and new macros for readability.

Rule is like this.

         ,-----------------------------------------> MAX_BLKADDR -,
         |  ,------------- TOTAL_BLKS ----------------------------,
         |  |                                                     |
         |  ,- seg0_blkaddr   ,----- sit/nat/ssa/main blkaddress  |
block    |  | (SEG0_BLKADDR)  | | | |   (e.g., MAIN_BLKADDR)      |
address  0..x................ a b c d .............................
            |                                                     |
global seg# 0...................... m .............................
            |                       |                             |
            |                       `------- MAIN_SEGS -----------'
            `-------------- TOTAL_SEGS ---------------------------'
                                    |                             |
 seg#                               0..........xx..................

= Note =
 o GET_SEGNO_FROM_SEG0 : blk address -> global segno
 o GET_SEGNO           : blk address -> segno
 o START_BLOCK         : segno -> starting block address

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-30 15:34:47 -07:00
Jaegeuk Kim 309cc2b6e7 f2fs: refactor flush_nat_entries to remove costly reorganizing ops
Previously, f2fs tries to reorganize the dirty nat entries into multiple sets
according to its nid ranges. This can improve the flushing nat pages, however,
if there are a lot of cached nat entries, it becomes a bottleneck.

This patch introduces a new set management flow by removing dirty nat list and
adding a series of set operations when the nat entry becomes dirty.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-30 15:30:41 -07:00
Jaegeuk Kim 4b2fecc846 f2fs: introduce FITRIM in f2fs_ioctl
This patch introduces FITRIM in f2fs_ioctl.
In this case, f2fs will issue small discards and prefree discards as many as
possible for the given area.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-30 15:06:09 -07:00
Jaegeuk Kim 75ab4cb830 f2fs: introduce cp_control structure
This patch add a new data structure to control checkpoint parameters.
Currently, it presents the reason of checkpoint such as is_umount and normal
sync.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-30 15:01:28 -07:00
Jaegeuk Kim 95dd897301 f2fs: use more free segments until SSR is activated
Previously, f2fs activates SSR if the # of free segments reaches to the # of
overprovisioned segments.
In this case, SSR starts to use dirty segments only, so that the overprovisoned
space cannot be selected for new data.
This means that we have no chance to utilizae the overprovisioned space at all.

This patch fixes that by allowing LFS allocations until the # of free segments
reaches to the last threshold, reserved space.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-23 11:10:24 -07:00
Jaegeuk Kim 9b5f136fd4 f2fs: change the ipu_policy option to enable combinations
This patch changes the ipu_policy setting to use any combination of orthogonal policies.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-23 11:10:24 -07:00
Chao Yu 210f41bc04 f2fs: fix to search whole dirty segmap when get_victim
In ->get_victim we get max_search value from dirty_i->nr_dirty without
protection of seglist_lock, after that, nr_dirty can be increased/decreased
before we hold seglist_lock lock.
Then in main loop we attempt to traverse all dirty section one time to find
victim section, but it's not accurate to use max_search as the total loop count,
because we might lose checking several sections or check sections redundantly
for the case of nr_dirty are increased or decreased previously.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-23 11:10:23 -07:00
Chao Yu 26666c8a43 f2fs: fix to clean previous mount option when remount_fs
In manual of mount, we descript remount as below:

"mount -o remount,rw /dev/foo /dir
After  this call all old mount options are replaced and arbitrary stuff from
fstab is ignored, except the loop= option which is internally generated and
maintained by the mount command."

Previously f2fs do not clear up old mount options when remount_fs, so we have no
chance of disabling previous option (e.g. flush_merge). Fix it.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-23 11:10:22 -07:00
Chao Yu 14cecc5cd6 f2fs: skip punching hole in special condition
Now punching hole in directory is not supported in f2fs, so let's limit file
type in punch_hole().

In addition, in punch_hole if offset is exceed file size, we should skip
punching hole.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-23 11:10:21 -07:00
Chao Yu 55cf9cb63f f2fs: support large sector size
Block size in f2fs is 4096 bytes, so theoretically, f2fs can support 4096 bytes
sector device at maximum. But now f2fs only support 512 bytes size sector, so
block device such as zRAM which uses page cache as its block storage space will
not be mounted successfully as mismatch between sector size of zRAM and sector
size of f2fs supported.

In this patch we support large sector size in f2fs, so block device with sector
size of 512/1024/2048/4096 bytes can be supported in f2fs.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-23 11:10:20 -07:00
Chao Yu 09db6a2ef8 f2fs: fix to truncate blocks past EOF in ->setattr
By using FALLOC_FL_KEEP_SIZE in ->fallocate of f2fs, we can fallocate block past
EOF without changing i_size of inode. These blocks past EOF will not be
truncated in ->setattr as we truncate them only when change the file size.

We should give a chance to truncate blocks out of filesize in setattr().

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-23 11:10:20 -07:00
Jaegeuk Kim 976e4c50ae f2fs: update i_size when __allocate_data_block
The f2fs_direct_IO uses __allocate_data_block, but inside the allocation path,
we should update i_size at the changed time to update its inode page.
Otherwise, we can get wrong i_size after roll-forward recovery.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-23 11:10:19 -07:00
Jaegeuk Kim 90a893c749 f2fs: use MAX_BIO_BLOCKS(sbi)
This patch cleans up a simple macro.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-23 11:10:18 -07:00
Jaegeuk Kim c52e1b10b1 f2fs: remove redundant operation during roll-forward recovery
If same data is updated multiple times, we don't need to redo whole the
operations.
Let's just update the lastest one.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-23 11:10:17 -07:00
Jaegeuk Kim 19c9c466e5 f2fs: do not skip latest inode information
In f2fs_sync_file, if there is no written appended writes, it skips
to write its node blocks.
But, if there is up-to-date inode page, we should write it to update
its metadata during the roll-forward recovery.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-23 11:10:16 -07:00
Jaegeuk Kim 441ac5cb32 f2fs: fix roll-forward missing scenarios
We can summarize the roll forward recovery scenarios as follows.

[Term] F: fsync_mark, D: dentry_mark

1. inode(x) | CP | inode(x) | dnode(F)
-> Update the latest inode(x).

2. inode(x) | CP | inode(F) | dnode(F)
-> No problem.

3. inode(x) | CP | dnode(F) | inode(x)
-> Recover to the latest dnode(F), and drop the last inode(x)

4. inode(x) | CP | dnode(F) | inode(F)
-> No problem.

5. CP | inode(x) | dnode(F)
-> The inode(DF) was missing. Should drop this dnode(F).

6. CP | inode(DF) | dnode(F)
-> No problem.

7. CP | dnode(F) | inode(DF)
-> If f2fs_iget fails, then goto next to find inode(DF).

8. CP | dnode(F) | inode(x)
-> If f2fs_iget fails, then goto next to find inode(DF).
   But it will fail due to no inode(DF).

So, this patch adds some missing points such as #1, #5, #7, and #8.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-23 11:10:16 -07:00
Jaegeuk Kim 88bd02c947 f2fs: fix conditions to remain recovery information in f2fs_sync_file
This patch revisited whole the recovery information during the f2fs_sync_file.

In this patch, there are three information to make a decision.

a) IS_CHECKPOINTED,	/* is it checkpointed before? */
b) HAS_FSYNCED_INODE,	/* is the inode fsynced before? */
c) HAS_LAST_FSYNC,	/* has the latest node fsync mark? */

And, the scenarios for our rule are based on:

[Term] F: fsync_mark, D: dentry_mark

1. inode(x) | CP | inode(x) | dnode(F)
2. inode(x) | CP | inode(F) | dnode(F)
3. inode(x) | CP | dnode(F) | inode(x) | inode(F)
4. inode(x) | CP | dnode(F) | inode(F)
5. CP | inode(x) | dnode(F) | inode(DF)
6. CP | inode(DF) | dnode(F)
7. CP | dnode(F) | inode(DF)
8. CP | dnode(F) | inode(x) | inode(DF)

For example, #3, the three conditions should be changed as follows.

   inode(x) | CP | dnode(F) | inode(x) | inode(F)
a)    x       o      o          o          o
b)    x       x      x          x          o
c)    x       o      o          x          o

If f2fs_sync_file stops   ------^,
 it should write inode(F)    --------------^

So, the need_inode_block_update should return true, since
 c) get_nat_flag(e, HAS_LAST_FSYNC), is false.

For example, #8,
      CP | alloc | dnode(F) | inode(x) | inode(DF)
a)    o      x        x          x          x
b)    x               x          x          o
c)    o               o          x          o

If f2fs_sync_file stops   -------^,
 it should write inode(DF)    --------------^

Note that, the roll-forward policy should follow this rule, which means,
if there are any missing blocks, we doesn't need to recover that inode.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-23 11:10:15 -07:00
Jaegeuk Kim 7ef35e3b9e f2fs: introduce a flag to represent each nat entry information
This patch introduces a flag in the nat entry structure to merge various
information such as checkpointed and fsync_done marks.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-23 11:10:14 -07:00
Jaegeuk Kim 4c521f493b f2fs: use meta_inode cache to improve roll-forward speed
Previously, all the dnode pages should be read during the roll-forward recovery.
Even worsely, whole the chain was traversed twice.
This patch removes that redundant and costly read operations by using page cache
of meta_inode and readahead function as well.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-23 11:10:12 -07:00
Jaegeuk Kim 60979115a6 f2fs: fix double lock for inode page during roll-foward recovery
If the inode is same and its data index are needed to truncate, we can fall into
double lock for its inode page via get_dnode_of_data.

Error case is like this.

1. write data 1, 2, 3, 4, 5 in inode #4.
2. write data 100, 102, 103, 104, 105 in dnode #6 of inode #4.
3. sync
4. update data 100->106 in dnode #6.
5. fsync inode #4.
6. power-cut

-> Then,
1. go back to #3's checkpoint
2. in do_recover_data, get_dnode_of_data() gets inode #4.
3. detect 100->106 in dnode #6.
4. check_index_in_prev_nodes tries to truncate 100 in dnode #6.
5. to trigger truncate_hole, get_dnode_of_data should grab inode #4.
6. detect *kernel hang*

This patch should resolve that bug.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-16 04:10:47 -07:00
Huang Ying c6e489305e f2fs: fix a race condition in next_free_nid
The nm_i->fcnt checking is executed before spin_lock, so if another
thread delete the last free_nid from the list, the wrong nid may be
gotten.  So fix the race condition by moving the nm_i->fnct checking
into spin_lock.

Signed-off-by: Huang, Ying <ying.huang@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-16 04:10:46 -07:00
Huang Ying 7704182387 f2fs: use nm_i->next_scan_nid as default for next_free_nid
Now, if there is no free nid in nm_i->free_nid_list, 0 may be saved
into next_free_nid of checkpoint, this may cause useless scanning for
next mount.  nm_i->next_scan_nid should be a better default value than
0.

Signed-off-by: Huang, Ying <ying.huang@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-16 04:10:45 -07:00
Jaegeuk Kim c1ce1b02bb f2fs: give an option to enable in-place-updates during fsync to users
If user wrote F2FS_IPU_FSYNC:4 in /sys/fs/f2fs/ipu_policy, f2fs_sync_file
only starts to try in-place-updates.
And, if the number of dirty pages is over /sys/fs/f2fs/min_fsync_blocks, it
keeps out-of-order manner. Otherwise, it triggers in-place-updates.

This may be used by storage showing very high random write performance.

For example, it can be used when,

Seq. writes (Data) + wait + Seq. writes (Node)

is pretty much slower than,

Rand. writes (Data)

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-16 04:10:44 -07:00