Commit Graph

163 Commits

Author SHA1 Message Date
Chao Yu 675f10bde6 f2fs: fix to convert inline directory correctly
With below serials, we will lose parts of dirents:

1) mount f2fs with inline_dentry option
2) echo 1 > /sys/fs/f2fs/sdX/dir_level
3) mkdir dir
4) touch 180 files named [1-180] in dir
5) touch 181 in dir
6) echo 3 > /proc/sys/vm/drop_caches
7) ll dir

ls: cannot access 2: No such file or directory
ls: cannot access 4: No such file or directory
ls: cannot access 5: No such file or directory
ls: cannot access 6: No such file or directory
ls: cannot access 8: No such file or directory
ls: cannot access 9: No such file or directory
...
total 360
drwxr-xr-x 2 root root 4096 Feb 19 15:12 ./
drwxr-xr-x 3 root root 4096 Feb 19 15:11 ../
-rw-r--r-- 1 root root    0 Feb 19 15:12 1
-rw-r--r-- 1 root root    0 Feb 19 15:12 10
-rw-r--r-- 1 root root    0 Feb 19 15:12 100
-????????? ? ?    ?       ?            ? 101
-????????? ? ?    ?       ?            ? 102
-????????? ? ?    ?       ?            ? 103
...

The reason is: when doing the inline dir conversion, we didn't consider
that directory has hierarchical hash structure which can be configured
through sysfs interface 'dir_level'.

By default, dir_level of directory inode is 0, it means we have one bucket
in hash table located in first level, all dirents will be hashed in this
bucket, so it has no problem for us to do the duplication simply between
inline dentry page and converted normal dentry page.

However, if we configured dir_level with the value N (greater than 0), it
will expand the bucket number of first level hash table by 2^N - 1, it
hashs dirents into different buckets according their hash value, if we
still move all dirents to first bucket, it makes incorrent locating for
inline dirents, the result is, although we can iterate all dirents through
->readdir, we can't stat some of them in ->lookup which based on hash
table searching.

This patch fixes this issue by rehashing dirents into correct position
when converting inline directory.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-15 08:49:47 -07:00
Kirill A. Shutemov 09cbfeaf1a mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized.  And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE.  And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -> get_page();

 - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below.  For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-04 10:41:08 -07:00
Chao Yu 291bf80bec f2fs: clean up opened code with f2fs_update_dentry
Just clean up opened code with existing function, no logic change.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-03-17 21:19:45 -07:00
Jaegeuk Kim 8074bb5150 f2fs crypto: sync ext4_lookup and ext4_file_open
This patch tries to catch up with lookup and open policies in ext4.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-03-17 21:19:42 -07:00
Jaegeuk Kim 0b81d07790 fs crypto: move per-file encryption from f2fs tree to fs/crypto
This patch adds the renamed functions moved from the f2fs crypto files.

1. definitions for per-file encryption used by ext4 and f2fs.

2. crypto.c for encrypt/decrypt functions
 a. IO preparation:
  - fscrypt_get_ctx / fscrypt_release_ctx
 b. before IOs:
  - fscrypt_encrypt_page
  - fscrypt_decrypt_page
  - fscrypt_zeroout_range
 c. after IOs:
  - fscrypt_decrypt_bio_pages
  - fscrypt_pullback_bio_page
  - fscrypt_restore_control_page

3. policy.c supporting context management.
 a. For ioctls:
  - fscrypt_process_policy
  - fscrypt_get_policy
 b. For context permission
  - fscrypt_has_permitted_context
  - fscrypt_inherit_context

4. keyinfo.c to handle permissions
  - fscrypt_get_encryption_info
  - fscrypt_free_encryption_info

5. fname.c to support filename encryption
 a. general wrapper functions
  - fscrypt_fname_disk_to_usr
  - fscrypt_fname_usr_to_disk
  - fscrypt_setup_filename
  - fscrypt_free_filename

 b. specific filename handling functions
  - fscrypt_fname_alloc_buffer
  - fscrypt_fname_free_buffer

6. Makefile and Kconfig

Cc: Al Viro <viro@ftp.linux.org.uk>
Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Ildar Muslukhov <ildarm@google.com>
Signed-off-by: Uday Savagaonkar <savagaon@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-03-17 21:19:33 -07:00
Chao Yu ed3360abbc f2fs crypto: make sure the encryption info is initialized on opendir(2)
This patch syncs f2fs with commit 6bc445e0ff ("ext4 crypto: make
sure the encryption info is initialized on opendir(2)") from ext4.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Jaegeuk Kim 7d9dfa1dd7 f2fs: avoid garbage lenghs in dentries
This patch fixes to eliminate garbage name lengths in dentries in order
to provide correct answers of readdir.

For example, if a valid dentry consists of:
 bitmap : 1   1 1 1
 len    : 32  0 x 0,

readdir can start with second bit_pos having len = 0.
Or, it can start with third bit_pos having garbage.

In both of cases, we should avoid to try filling dentries.
So, this patch not only removes any garbage length, but also avoid entering
zero length case in readdir.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Jaegeuk Kim fec1d6576c f2fs: use wait_for_stable_page to avoid contention
In write_begin, if storage supports stable_page, we don't need to wait for
writeback to update its contents.
This patch introduces to use wait_for_stable_page instead of
wait_on_page_writeback.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Jaegeuk Kim d0239e1bf5 f2fs: detect idle time depending on user behavior
This patch adds last time that user requested filesystem operations.
This information is used to detect whether system is idle or not later.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-11 15:56:37 -08:00
Jaegeuk Kim 1f6fa26199 f2fs: remove f2fs_bug_on in terms of max_depth
There is no report on this bug_on case, but if malicious attacker changed this
field intentionally, we can just reset it as a MAX value.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-31 11:42:46 -08:00
Chao Yu c227f91273 f2fs: record dirty status of regular/symlink inode
Maintain regular/symlink inode which has dirty pages in global dirty list
and record their total dirty pages count like the way of handling directory
inode.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-16 08:58:12 -08:00
Chao Yu e9837bc2a4 f2fs: clean up error path in f2fs_readdir
No logic changes, just clean up the error path.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-04 12:07:55 -08:00
Chao Yu 57b62d29ad f2fs: fix to report error in f2fs_readdir
get_lock_data_page in f2fs_readdir can fail due to a lot of reasons (i.e.
no memory or IO error...), it's better to report this kind of error to
user rather than ignoring it.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-04 11:52:35 -08:00
Jaegeuk Kim a56c7c6fb3 f2fs: set GFP_NOFS for grab_cache_page
For normal inodes, their pages are allocated with __GFP_FS, which can cause
filesystem calls when reclaiming memory.
This can incur a dead lock condition accordingly.

So, this patch addresses this problem by introducing
f2fs_grab_cache_page(.., bool for_write), which calls
grab_cache_page_write_begin() with AOP_FLAG_NOFS.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-12 13:38:03 -07:00
Jaegeuk Kim 569cf1876a f2fs crypto: allocate buffer for decrypting filename
We got dentry pages from high_mem, and its address space directly goes into the
decryption path via f2fs_fname_disk_to_usr.
But, sg_init_one assumes the address is not from high_mem, so we can get this
panic since it doesn't call kmap_high but kunmap_high is triggered at the end.

kernel BUG at ../../../../../../kernel/mm/highmem.c:290!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
...
 (kunmap_high+0xb0/0xb8) from [<c0114534>] (__kunmap_atomic+0xa0/0xa4)
 (__kunmap_atomic+0xa0/0xa4) from [<c035f028>] (blkcipher_walk_done+0x128/0x1ec)
 (blkcipher_walk_done+0x128/0x1ec) from [<c0366c24>] (crypto_cbc_decrypt+0xc0/0x170)
 (crypto_cbc_decrypt+0xc0/0x170) from [<c0367148>] (crypto_cts_decrypt+0xc0/0x114)
 (crypto_cts_decrypt+0xc0/0x114) from [<c035ea98>] (async_decrypt+0x40/0x48)
 (async_decrypt+0x40/0x48) from [<c032ca34>] (f2fs_fname_disk_to_usr+0x124/0x304)
 (f2fs_fname_disk_to_usr+0x124/0x304) from [<c03056fc>] (f2fs_fill_dentries+0xac/0x188)
 (f2fs_fill_dentries+0xac/0x188) from [<c03059c8>] (f2fs_readdir+0x1f0/0x300)
 (f2fs_readdir+0x1f0/0x300) from [<c0218054>] (vfs_readdir+0x90/0xb4)
 (vfs_readdir+0x90/0xb4) from [<c0218418>] (SyS_getdents64+0x64/0xcc)
 (SyS_getdents64+0x64/0xcc) from [<c0105ba0>] (ret_fast_syscall+0x0/0x30)

Cc: <stable@vger.kernel.org>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:51 -07:00
Chao Yu 206e61be29 f2fs: avoid clear valid page
In f2fs_delete_entry, if last dirent is remove from the dentry page,
we will try to punch that page since it has no valid date in it.

But truncate_hole which is used for punching could fail because of
no memory or IO error, if that happened, we'd better skip clearing
this valid dentry page.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-20 09:00:06 -07:00
Jaegeuk Kim 26bf3dc7e2 f2fs crypto: use per-inode tfm structure
This patch applies the following ext4 patch:

  ext4 crypto: use per-inode tfm structure

As suggested by Herbert Xu, we shouldn't allocate a new tfm each time
we read or write a page.  Instead we can use a single tfm hanging off
the inode's crypt_info structure for all of our encryption needs for
that inode, since the tfm can be used by multiple crypto requests in
parallel.

Also use cmpxchg() to avoid races that could result in crypt_info
structure getting doubly allocated or doubly freed.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-01 16:21:04 -07:00
hujianyang 08b95126c7 f2fs: add compat_ioctl to provide backward compatability
introduce compat_ioctl to regular files, but doesn't add this
functionality to f2fs_dir_operations.

While running a 32-bit busybox, I met an error like this:
(A is a directory)

chattr: reading flags on A: Inappropriate ioctl for device

This patch copies compat_ioctl from f2fs_file_operations and
fix this problem.

Signed-off-by: hujianyang <hujianyang@huawei.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-01 16:20:51 -07:00
Jaegeuk Kim e7d5545285 f2fs crypto: add filename encryption for roll-forward recovery
This patch adds a bit flag to indicate whether or not i_name in the inode
is encrypted.

If this name is encrypted, we can't do recover_dentry during roll-forward.
So, f2fs_sync_file() needs to do checkpoint, if this will be needed in future.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:55 -07:00
Jaegeuk Kim 6e22c691ba f2fs crypto: add filename encryption for f2fs_lookup
This patch implements filename encryption support for f2fs_lookup.

Note that, f2fs_find_entry should be outside of f2fs_(un)lock_op().

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:54 -07:00
Jaegeuk Kim d8c6822a05 f2fs crypto: add filename encryption for f2fs_readdir
This patch implements filename encryption support for f2fs_readdir.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:53 -07:00
Jaegeuk Kim 9ea97163c6 f2fs crypto: add filename encryption for f2fs_add_link
This patch adds filename encryption support for f2fs_add_link.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:52 -07:00
Jaegeuk Kim fcc85a4d86 f2fs crypto: activate encryption support for fs APIs
This patch activates the following APIs for encryption support.

The rules quoted by ext4 are:
 - An unencrypted directory may contain encrypted or unencrypted files
   or directories.
 - All files or directories in a directory must be protected using the
   same key as their containing directory.
 - Encrypted inode for regular file should not have inline_data.
 - Encrypted symlink and directory may have inline_data and inline_dentry.

This patch activates the following APIs.
1. f2fs_link              : validate context
2. f2fs_lookup            :      ''
3. f2fs_rename            :      ''
4. f2fs_create/f2fs_mkdir : inherit its dir's context
5. f2fs_direct_IO         : do buffered io for regular files
6. f2fs_open              : check encryption info
7. f2fs_file_mmap         :      ''
8. f2fs_setattr           :      ''
9. f2fs_file_write_iter   :      ''           (Called by sys_io_submit)
10. f2fs_fallocate        : do not support fcollapse
11. f2fs_evict_inode      : free_encryption_info

Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:51 -07:00
Jaegeuk Kim 43f3eae1d3 f2fs: split find_data_page according to specific purposes
This patch splits find_data_page as follows.

1. f2fs_gc
 - use get_read_data_page() with read only

2. find_in_level
 - use find_data_page without locked page

3. truncate_partial_page
 - In the case cache_only mode, just drop cached page.
 - Ohterwise, use get_lock_data_page() and guarantee to truncate

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:37 -07:00
Jaegeuk Kim cb58463bc9 f2fs: assign parent's i_mode for empty dir
When assigning i_mode for dotdot, it needs to assign parent's i_mode.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:58 -07:00
Jaegeuk Kim 510022a858 f2fs: add F2FS_INLINE_DOTS to recover missing dot dentries
If f2fs was corrupted with missing dot dentries, it needs to recover them after
fsck.f2fs detection.

The underlying precedure is:

1. The fsck.f2fs remains F2FS_INLINE_DOTS flag in directory inode, if it detects
missing dot dentries.

2. When f2fs looks up the corrupted directory, it triggers f2fs_add_link with
proper inode numbers and their dot and dotdot names.

3. Once f2fs recovers the directory without errors, it removes F2FS_INLINE_DOTS
finally.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:57 -07:00
Chao Yu bda190760b f2fs: fix to calculate max length of contiguous free slots correctly
When lookuping for creating, we will try to record the level of current dentry
hash table if current dentry has enough contiguous slots for storing name of new
file which will be created later, this can save our lookup time when add a link
into parent dir.

But currently in find_target_dentry, our current length of contiguous free slots
is not calculated correctly. This make us leaving some holes in dentry block
occasionally, it wastes our space of dentry block.

Let's refactor the lookup flow for max slots as following to fix this issue:
a) increase max_len if current slot is free;
b) update max_slots with max_len if max_len is larger than max_slots;
c) reset max_len to zero if current slot is not free.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:40 -07:00
Yuan Zhong d9f46bb1a8 f2fs: remove unnecessary condition judgment
Remove the unnecessary condition judgment, because
'max_slots' has been initialized to '0' at the beginging
of the function, as following:
if (max_slots)
       *max_slots = 0;

Signed-off-by: Yuan Zhong <yuan.mark.zhong@samsung.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:38 -07:00
Yuan Zhong b1f73b79d2 f2fs: set the correct place of initializing *res_page
The function 'find_in_inline_dir()' contain 'res_page'
as an argument. So, we should initiaize 'res_page' before
this function.

Signed-off-by: Yuan Zhong <yuan.mark.zhong@samsung.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:37 -07:00
Jaegeuk Kim 2bca1e2388 f2fs: clear page's up-to-date if block was deallocated
If page's on-disk block was deallocated, let's remove up-to-date flag to avoid
further access with wrong contents.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:26 -07:00
Chao Yu 3b4d732a56 f2fs: introduce f2fs_update_dentry to clean up duplicated codes
This patch introduces f2fs_update_dentry to remove redundant code in
f2fs_add_inline_entry and __f2fs_add_link.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-03-03 09:58:44 -08:00
Jaegeuk Kim 5df1f1da7a f2fs: use missing the use of f2fs_kunmap_page
This patch calls f2fs_kunmap_page which I missed before.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:22 -08:00
Jaegeuk Kim 9486ba442b f2fs: introduce f2fs_dentry_kunmap to clean up
This patch introduces f2fs_dentry_kunmap to clean up dirty codes.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-23 21:51:53 -08:00
Jan Kara 1f7732fe6c f2fs: remove pointless bit testing in f2fs_delete_entry()
There's no point in using test_and_clear_bit_le() when we don't use the
return value of the function. Just use clear_bit_le() instead.

Coverity-id: 1016434
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:38 -08:00
Jaegeuk Kim 062a3e7ba7 f2fs: reuse make_empty_dir code for inline_dentry
This patch introduces do_make_empty_dir to mitigate code redundancy
for inline_dentry.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:34 -08:00
Jaegeuk Kim 7b3cd7d6f0 f2fs: introduce f2fs_dentry_ptr structure for code clean-up
This patch introduces f2fs_dentry_ptr structure for the use of a function
parameter in inline_dentry operations.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:34 -08:00
Jaegeuk Kim 38594de767 f2fs: reuse core function in f2fs_readdir for inline_dentry
This patch introduces a core function, f2fs_fill_dentries, to remove
redundant code in f2fs_readdir and f2fs_read_inline_dir.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:34 -08:00
Jaegeuk Kim bce8d11207 f2fs: avoid deadlock on init_inode_metadata
Previously, init_inode_metadata does not hold any parent directory's inode
page. So, f2fs_init_acl can grab its parent inode page without any problem.
But, when we use inline_dentry, that page is grabbed during f2fs_add_link,
so that we can fall into deadlock condition like below.

INFO: task mknod:11006 blocked for more than 120 seconds.
      Tainted: G           OE  3.17.0-rc1+ #13
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mknod           D ffff88003fc94580     0 11006  11004 0x00000000
 ffff880007717b10 0000000000000002 ffff88003c323220 ffff880007717fd8
 0000000000014580 0000000000014580 ffff88003daecb30 ffff88003c323220
 ffff88003fc94e80 ffff88003ffbb4e8 ffff880007717ba0 0000000000000002
Call Trace:
 [<ffffffff8173dc40>] ? bit_wait+0x50/0x50
 [<ffffffff8173d4cd>] io_schedule+0x9d/0x130
 [<ffffffff8173dc6c>] bit_wait_io+0x2c/0x50
 [<ffffffff8173da3b>] __wait_on_bit_lock+0x4b/0xb0
 [<ffffffff811640a7>] __lock_page+0x67/0x70
 [<ffffffff810acf50>] ? autoremove_wake_function+0x40/0x40
 [<ffffffff811652cc>] pagecache_get_page+0x14c/0x1e0
 [<ffffffffa029afa9>] get_node_page+0x59/0x130 [f2fs]
 [<ffffffffa02a63ad>] read_all_xattrs+0x24d/0x430 [f2fs]
 [<ffffffffa02a6ca2>] f2fs_getxattr+0x52/0xe0 [f2fs]
 [<ffffffffa02a7481>] f2fs_get_acl+0x41/0x2d0 [f2fs]
 [<ffffffff8122d847>] get_acl+0x47/0x70
 [<ffffffff8122db5a>] posix_acl_create+0x5a/0x150
 [<ffffffffa02a7759>] f2fs_init_acl+0x29/0xcb [f2fs]
 [<ffffffffa0286a8d>] init_inode_metadata+0x5d/0x340 [f2fs]
 [<ffffffffa029253a>] f2fs_add_inline_entry+0x12a/0x2e0 [f2fs]
 [<ffffffffa0286ea5>] __f2fs_add_link+0x45/0x4a0 [f2fs]
 [<ffffffffa028b5b6>] ? f2fs_new_inode+0x146/0x220 [f2fs]
 [<ffffffffa028b816>] f2fs_mknod+0x86/0xf0 [f2fs]
 [<ffffffff811e3ec1>] vfs_mknod+0xe1/0x160
 [<ffffffff811e4b26>] SyS_mknod+0x1f6/0x200
 [<ffffffff81741d7f>] tracesys+0xe1/0xe6

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:33 -08:00
Jaegeuk Kim 59a0615540 f2fs: fix to wait correct block type
The inode page needs to wait NODE block io.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:33 -08:00
Jaegeuk Kim 4e6ebf6d49 f2fs: reuse find_in_block code for find_in_inline_dir
This patch removes redundant copied code in find_in_inline_dir.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:32 -08:00
Jaegeuk Kim a82afa2019 f2fs: reuse room_for_filename for inline dentry operation
This patch introduces to reuse the existing room_for_filename for inline dentry
operation.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:32 -08:00
Chao Yu 622f28ae9b f2fs: enable inline dir handling
Add inline dir functions into normal dir ops' function to handle inline ops.
Besides, we enable inline dir mode when a new dir inode is created if
inline_data option is on.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:32 -08:00
Chao Yu dbeacf02eb f2fs: export dir operations for inline dir
This patch exports some dir operations for inline dir, additionally introduces
f2fs_drop_nlink from f2fs_delete_entry for reusing by inline dir function.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:31 -08:00
Jaegeuk Kim a7ffdbe22c f2fs: expand counting dirty pages in the inode page cache
Previously f2fs only counts dirty dentry pages, but there is no reason not to
expand the scope.

This patch changes the names on the management of dirty pages and to count
dirty pages in each inode info as well.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-16 04:10:39 -07:00
Jaegeuk Kim 9850cf4a89 f2fs: need fsck.f2fs when f2fs_bug_on is triggered
If any f2fs_bug_on is triggered, fsck.f2fs is needed.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-09 13:15:02 -07:00
Jaegeuk Kim 4081363fbe f2fs: introduce F2FS_I_SB, F2FS_M_SB, and F2FS_P_SB
This patch adds three inline functions to clean up dirty casting codes.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-03 17:37:13 -07:00
Jaegeuk Kim 764aa3e978 f2fs: avoid double lock in truncate_blocks
The init_inode_metadata calls truncate_blocks when error is occurred.
The callers holds f2fs_lock_op, so we should not call it again in
truncate_blocks.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-21 13:57:01 -07:00
arter97 e1c4204520 f2fs: fix typo
Fix typo and some grammatical errors.

The words "filesystem" and "readahead" are being used without the space treewide.

Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-19 10:01:33 -07:00
Chao Yu 81e366f87f f2fs: check name_len of dir entry to prevent from deadloop
We assume that modification of some special application could result in zeroed
name_len, or it is consciously made by somebody. We will deadloop in
find_in_block when name_len of dir entry is zero.

This patch is added for preventing deadloop in above scenario.

change log from v1:
 o use f2fs_bug_on rather than break out from searching dir entry suggested by
Jaegeuk Kim.

Jaegeuk describe:
"Well, IMO, it would be good to add f2fs_bug_on() here with a specific comment.
In the current phase of f2fs, it is more important to investigate the file
system bugs, rather than workarounds for any corrupted images.
And, definitely it needs to stop the kernel if any corrupted image was mounted,
so that we can figure out where the bugs are occurred."

Suggested-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-10 17:00:02 -07:00
Gu Zheng eee6160f2e f2fs: arguments cleanup of finding file flow functions
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09 14:04:26 -07:00
Gu Zheng 1c3bb97899 f2fs: remove the needless point-cast
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09 14:04:26 -07:00
Jaegeuk Kim a014e037be f2fs: clean up an unused parameter and assignment
This patch cleans up simple unnecessary codes.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09 14:04:25 -07:00
Jaegeuk Kim b97a9b5da8 f2fs: introduce f2fs_do_tmpfile for code consistency
This patch adds f2fs_do_tmpfile to eliminate the redundant init_inode_metadata
flow.
Throught this, we can provide the consistent lock usage, e.g., fi->i_sem,  and
this will enable better debugging stuffs.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09 14:04:24 -07:00
Chao Yu 50732df02e f2fs: support ->tmpfile()
Add function f2fs_tmpfile() to support O_TMPFILE file creation, and modify logic
of init_inode_metadata to enable linkat temp file.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09 14:04:24 -07:00
Chao Yu 90d72459cc f2fs: fix error path in init_inode_metadata
If we fail in this path:
->init_inode_metadata
  ->make_empty_dir
    ->get_new_data_page
      ->grab_cache_page return -ENOMEM

We will bug on in error path of init_inode_metadata when call remove_inode_page
because i_block = 2 (one inode block will be released later & one dentry block).

We should release the dentry block in init_inode_metadata to avoid this BUG_ON,
and avoid leak of dentry block resource, because we never have second chance to
release that block in ->evict_inode as in upper error path we make this inode
'bad'.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09 05:58:50 -07:00
Chao Yu bfec07d0f8 f2fs: avoid overflow when large directory feathure is enabled
When large directory feathure is enable, We have one case which could cause
overflow in dir_buckets() as following:
special case: level + dir_level >= 32 and level < MAX_DIR_HASH_DEPTH / 2.

Here we define MAX_DIR_BUCKETS to limit the return value when the condition
could trigger potential overflow.

Changes from V1
 o modify description of calculation in f2fs.txt suggested by Changman Lee.

Suggested-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-06-04 13:34:30 +09:00
Jaegeuk Kim 54b591dfda f2fs: split grab_cache_page and wait_on_page_writeback for node pages
This patch splits grab_cache_page_write_begin into grab_cache_page and
wait_on_page_writeback for node pages.

This patch intends to enhance the latency to get node pages by alleviating
unnecessary wait_on_page_writeback.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:58 +09:00
Chao Yu 817202d937 f2fs: readahead multi pages of directory for performance
We have no so such readahead mechanism in ->iterate() path as the one in
->read() path, it cause low performance when we read large directory.
This patch add readahead in f2fs_readdir() for better performance.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:57 +09:00
Jaegeuk Kim d928bfbfe7 f2fs: introduce fi->i_sem to protect fi's info
This patch introduces fi->i_sem to protect fi's info that includes xattr_ver,
pino, i_nlink.
This enables to remove i_mutex during f2fs_sync_file, resulting in performance
improvement when a number of fsync calls are triggered from many concurrent
threads.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-20 22:10:11 +09:00
Jaegeuk Kim 3cb5ad152b f2fs: call f2fs_wait_on_page_writeback instead of native function
If a page is on writeback, f2fs can face with deadlock due to under writepages.
This is caused by merging IOs inside f2fs, so if it comes to detect, let's throw
merged IOs, which is implemented by f2fs_wait_on_page_writeback.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-20 22:10:04 +09:00
Jaegeuk Kim 20f70751c6 f2fs: fix wrong kernel coding style
This patch includes a simple fix to adjust coding style.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-05 10:48:53 +09:00
Jaegeuk Kim 3843154598 f2fs: introduce large directory support
This patch introduces an i_dir_level field to support large directory.

Previously, f2fs maintains multi-level hash tables to find a dentry quickly
from a bunch of chiild dentries in a directory, and the hash tables consist of
the following tree structure as below.

In Documentation/filesystems/f2fs.txt,

----------------------
A : bucket
B : block
N : MAX_DIR_HASH_DEPTH
----------------------

level #0   | A(2B)
           |
level #1   | A(2B) - A(2B)
           |
level #2   | A(2B) - A(2B) - A(2B) - A(2B)
     .     |   .       .       .       .
level #N/2 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
     .     |   .       .       .       .
level #N   | A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B)

But, if we can guess that a directory will handle a number of child files,
we don't need to traverse the tree from level #0 to #N all the time.
Since the lower level tables contain relatively small number of dentries,
the miss ratio of the target dentry is likely to be high.

In order to avoid that, we can configure the hash tables sparsely from level #0
like this.

level #0   | A(2B) - A(2B) - A(2B) - A(2B)

level #1   | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
     .     |   .       .       .       .
level #N/2 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
     .     |   .       .       .       .
level #N   | A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B)

With this structure, we can skip the ineffective tree searches in lower level
hash tables.

This patch adds just a facility for this by introducing i_dir_level in
f2fs_inode.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-27 19:56:09 +09:00
Jaegeuk Kim 5d0c667121 f2fs: remove costly bit operations for f2fs_find_entry
It turns out that a bit operation like find_next_bit is not always fast enough
for f2fs_find_entry.
Instead, it is pretty much simple and fast to traverse each dentries.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-27 16:25:20 +09:00
Jaegeuk Kim 1fe54f9dd3 f2fs: clean up redundant function call
This patch integrates inode_[inc|dec]_dirty_dents with inc_page_count to remove
redundant calls.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-17 14:58:53 +09:00
Jaegeuk Kim bd859c6598 f2fs: fix to truncate dentry pages in the error case
When a new directory is allocated, if an error is occurred, we should truncate
preallocated dentry pages too.

This bug was reported by Andrey Tsyvarev after a while as follows.

mkdir()->
 f2fs_add_link()->
  init_inode_metadata()->
    f2fs_init_acl()->
      f2fs_get_acl()->
        f2fs_getxattr()->
          read_all_xattrs() fails.

Also there was a BUG_ON triggered after the fault in
mkdir()->
 f2fs_add_link()->
   init_inode_metadata()->
    remove_inode_page() ->
      f2fs_bug_on(inode->i_blocks != 0 && inode->i_blocks != 1);

But, previous patch wasn't perfect to resolve that bug, so the following bug
report was also submitted.

kernel BUG at fs/f2fs/inode.c:274!
Call Trace:
 [<ffffffff811fde03>] evict+0xa3/0x1a0
 [<ffffffff811fe615>] iput+0xf5/0x180
 [<ffffffffa01c7f63>] f2fs_mkdir+0xf3/0x150 [f2fs]
 [<ffffffff811f2a77>] vfs_mkdir+0xb7/0x160
 [<ffffffff811f36bf>] SyS_mkdir+0x5f/0xc0
 [<ffffffff81680769>] system_call_fastpath+0x16/0x1b

Finally, this patch resolves all the issues like below.

If an error is occurred after make_empty_dir(),
 1. truncate_inode_pages()
   The make_bad_inode() prior to iput() will change i_mode to S_IFREG, which
   means that f2fs will not decrement fi->dirty_dents during f2fs_evict_inode.
   But, by calling it here, we can do that.

 2. truncate_blocks()
   Preallocated dentry pages are trucated here to sync i_blocks.

 3. remove_dirty_dir_inode()
   Remove this directory inode from the list.

Reported-and-Tested-by: Andrey Tsyvarev <tsyvarev@ispras.ru>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-17 14:58:52 +09:00
Jaegeuk Kim 924a2ddbd0 f2fs: fix the potential mismatch between dir's i_size and i_blocks
This is the erroneous scenario.

                             i_size    on-disk i_size    i_blocks
__f2fs_add_link()             4096           4096           2
 get_new_data_page            8192           4096           3
 -ENOSPC = init_inode_metadata
 checkpoint                     -            4096           3
 POR and reboot

__f2fs_add_link()             4096           4096           3
 page = get_new_data_page (page->index = 1 by NEW_ADDR)
 add a dentry to the page successfully

f2fs_rmdir()
 f2fs_empty_dir()             4096           4096           3
 f2fs_unlink() goes, since there is no valid dentry due to i_size = 4096.
 But, still there is one dentry in page->index = 1.

So this patch moves the code to write dir->i_size into on-disk i_size in order
to sync dir's i_size, on-disk i_size, and its i_blocks.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-17 14:58:52 +09:00
Jaegeuk Kim e8dae60458 f2fs: move a branch for code redability
This patch moves a function in f2fs_delete_entry for code readability.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-22 18:41:07 +09:00
Jaegeuk Kim a18ff06340 f2fs: call mark_inode_dirty to flush dirty pages
If a dentry page is updated, we should call mark_inode_dirty to add the inode
into the dirty list, so that its dentry pages are flushed to the disk.
Otherwise, the inode can be evicted without flush.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-22 18:40:34 +09:00
Chris Fries 6c311ec6c2 f2fs: clean checkpatch warnings
Fixed a variety of trivial checkpatch warnings.  The only delta should
be some minor formatting on log strings that were split / too long.

Signed-off-by: Chris Fries <cfries@motorola.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-20 10:27:12 +09:00
Jaegeuk Kim a8865372a8 f2fs: handle errors correctly during f2fs_reserve_block
The get_dnode_of_data nullifies inode and node page when error is occurred.

There are two cases that passes inode page into get_dnode_of_data().

1. make_empty_dir()
    -> get_new_data_page()
      -> f2fs_reserve_block(ipage)
	-> get_dnode_of_data()

2. f2fs_convert_inline_data()
    -> __f2fs_convert_inline_data()
      -> f2fs_reserve_block(ipage)
	-> get_dnode_of_data()

This patch adds correct error handling codes when get_dnode_of_data() returns
an error.

At first, f2fs_reserve_block() calls f2fs_put_dnode() whenever reserve_new_block
returns an error.
So, the rule of f2fs_reserve_block() is to nullify inode page when there is any
error internally.

Finally, two callers of f2fs_reserve_block() should call f2fs_put_dnode()
appropriately if they got an error since successful f2fs_reserve_block().

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-06 16:42:21 +09:00
Jaegeuk Kim 58bfaf44df f2fs: introduce F2FS_INODE macro to get f2fs_inode
This patch introduces F2FS_INODE that returns struct f2fs_inode * from the inode
page.
By using this macro, we can remove unnecessary casting codes like below.

   struct f2fs_inode *ri = &F2FS_NODE(inode_page)->i;
-> struct f2fs_inode *ri = F2FS_INODE(inode_page);

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-26 20:32:48 +09:00
Chao Yu d96b143151 f2fs: check filename length in recover_dentry
In current flow, we will get Null return value of f2fs_find_entry in
recover_dentry when name.len is bigger than F2FS_NAME_LEN, and then we
still add this inode into its dir entry.
To avoid this situation, we must check filename length before we use it.

Another point is that we could remove the code of checking filename length
In f2fs_find_entry, because f2fs_lookup will be called previously to ensure of
validity of filename length.

V2:
 o add WARN_ON() as Jaegeuk Kim suggested.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-26 12:50:09 +09:00
Chao Yu deead09009 f2fs: avoid to set wrong pino of inode when rename dir
When we rename a dir to new name which is not exist previous,
we will set pino of parent inode with ino of child inode in f2fs_set_link.
It destroy consistency of pino, it should be fixed.

Thanks for previous work of Shu Tan.

Signed-off-by: Shu Tan <shu.tan@samsung.com>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:42:51 +09:00
Chao Yu 4f4124d0b9 f2fs: update several comments
Update several comments:
1. use f2fs_{un}lock_op install of mutex_{un}lock_op.
2. update comment of get_data_block().
3. update description of node offset.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:26:03 +09:00
Chao Yu cfb271d485 f2fs: add unlikely() macro for compiler optimization
As we know, some of our branch condition will rarely be true. So we could add
'unlikely' to let compiler optimize these code, by this way we could drop
unneeded 'jump' assemble code to improve performance.

change log:
 o add *unlikely* as many as possible across the whole source files at once
   suggested by Jaegeuk Kim.

Suggested-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:06 +09:00
Jaegeuk Kim 5d56b6718a f2fs: add an option to avoid unnecessary BUG_ONs
If you want to remove unnecessary BUG_ONs, you can just turn off F2FS_CHECK_FS
in your kernel config.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-29 15:44:38 +09:00
Jaegeuk Kim 2ed2d5b33c f2fs: fix a deadlock during init_acl procedure
The deadlock is found through the following scenario.

sys_mkdir()
 -> f2fs_add_link()
  -> __f2fs_add_link()
   -> init_inode_metadata()
     : lock_page(inode);
    -> f2fs_init_acl()
     -> f2fs_set_acl()
      -> f2fs_setxattr(..., NULL)
       : This NULL page incurs a deadlock at update_inode_page().

So, likewise f2fs_init_security(), this patch adds a parameter to transfer the
locked inode page to f2fs_setxattr().

Found by Linux File System Verification project (linuxtesting.org).

Reported-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-28 13:39:09 +09:00
Jaegeuk Kim cbd56e7d20 f2fs: fix handling orphan inodes
This patch fixes mishandling of the sbi->n_orphans variable.

If users request lots of f2fs_unlink(), check_orphan_space() could be contended.
In such the case, sbi->n_orphans can be read incorrectly so that f2fs_unlink()
would fall into the wrong state which results in the failure of
add_orphan_inode().

So, let's increment sbi->n_orphans virtually prior to the actual orphan inode
stuffs. After that, let's release sbi->n_orphans by calling release_orphan_inode
or remove_orphan_inode.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-07-30 15:17:03 +09:00
Jaegeuk Kim 1cd14cafc6 f2fs: update file name in the inode block during f2fs_rename
The error is reproducible by:
0. mkfs.f2fs /dev/sdb1 & mount
1. touch test1
2. touch test2
3. mv test1 test2
4. umount
5. dumpt.f2fs -i 4 /dev/sdb1

After this, when we retrieve the inode->i_name of test2 by dump.f2fs, we get
test1 instead of test2.
This is because f2fs didn't update the file name during the f2fs_rename.

So, this patch fixes that.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-07-30 15:17:03 +09:00
Gu Zheng 4559071063 f2fs: introduce help function F2FS_NODE()
Introduce help function F2FS_NODE() to simplify the conversion of node_page to
f2fs_node.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-07-30 15:17:02 +09:00
Jaegeuk Kim 99b072bb38 f2fs: fix readdir incorrectness
In the previous Al Viro's readdir patch set, there occurs a bug when
running
xfstest: 006 as follows.

[Error output]
alpha size = 4, name length = 6, total files = 4096, nproc=1
1023 files created
rm: cannot remove `/mnt/f2fs/permname.15150/a': Directory not empty

[Correct output]
alpha size = 4, name length = 6, total files = 4096, nproc=1
4097 files created

This bug is due to the misupdate of directory position in ctx.
So, this patch fixes this.

[AV: fixed a braino]

CC: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-07-08 13:35:48 +04:00
Linus Torvalds 3f490f7f99 This patch-set includes the following major enhancement patches.
o remount_fs callback function
  o restore parent inode number to enhance the fsync performance
  o xattr security labels
  o reduce the number of redundant lock/unlock data pages
  o avoid frequent write_inode calls
 
 The other minor bug fixes are as follows.
  o endian conversion bugs
  o various bugs in the roll-forward recovery routine
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJR0h8YAAoJEEAUqH6CSFDSJNMP/1V+e/PaLa8YsHuw5eWPT3Fe
 RYFXJv7SXadHqgQInjwD7JOF8BC9DlUyknYUiXFUZgvKsgMHz+OTCNl+1hLTbUXt
 e6Rhn7vU/fMu3TEtj+v6g8JbpXdXH8TSbtvh9LpoczRS98GYpZuckP0BtQsVkTL3
 jIq6WD8JRkb2DvpDl7viTT0Eq0T61CSmtOwOIvfhhiVxggvDWcnR1mTM9Tymdi4J
 NKtFkjCsKP0Z/7IZZUJAczGEHjsT+O6JDwp8+KVWuZT4BSuchoX4MYAY5wX527Ne
 rZvkolbbfnAFCC3BtETr0DPOkpxnHmJ6dEveIxjZ9cux12CAFA/Ww1QAL4ygiDkd
 Avn8BBEJnfnuzeOUkE0by+9hdF9LOU3CVSxiDhWJegYB16z+c9pSBD9xvlKhKk5B
 QNsjptB6m0CftAq7vIDsryL60uJ2cSHFxPqfwAHEpNngiU/NohTFSZE0sUMbLUNh
 FI6NrHoT7yld1HcB6cvL1lnUqIENFbNhDSSDcTdlN49IJJap4oqtgCnmMMIwbUCO
 vR2/26k5W7NwG+K6XN2IX13AsayzQahxTv8in/+LV0bfjHo6/1VzzGcqAmXJbDQw
 dLrNAeWaaIJi8J/zJiENMbFKXTj8Wax9jxKsW+4towQuyEt4ADvyt1c5gX3VR42T
 x8+YEargsdBf6FAhtN+H
 =qFcy
 -----END PGP SIGNATURE-----

Merge tag 'for-f2fs-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "This patch-set includes the following major enhancement patches:
   - remount_fs callback function
   - restore parent inode number to enhance the fsync performance
   - xattr security labels
   - reduce the number of redundant lock/unlock data pages
   - avoid frequent write_inode calls

  The other minor bug fixes are as follows.
   - endian conversion bugs
   - various bugs in the roll-forward recovery routine"

* tag 'for-f2fs-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (56 commits)
  f2fs: fix to recover i_size from roll-forward
  f2fs: remove the unused argument "sbi" of func destroy_fsync_dnodes()
  f2fs: remove reusing any prefree segments
  f2fs: code cleanup and simplify in func {find/add}_gc_inode
  f2fs: optimize the init_dirty_segmap function
  f2fs: fix an endian conversion bug detected by sparse
  f2fs: fix crc endian conversion
  f2fs: add remount_fs callback support
  f2fs: recover wrong pino after checkpoint during fsync
  f2fs: optimize do_write_data_page()
  f2fs: make locate_dirty_segment() as static
  f2fs: remove unnecessary parameter "offset" from __add_sum_entry()
  f2fs: avoid freqeunt write_inode calls
  f2fs: optimise the truncate_data_blocks_range() range
  f2fs: use the F2FS specific flags in f2fs_ioctl()
  f2fs: sync dir->i_size with its block allocation
  f2fs: fix i_blocks translation on various types of files
  f2fs: set sb->s_fs_info before calling parse_options()
  f2fs: support xattr security labels
  f2fs: fix iget/iput of dir during recovery
  ...
2013-07-02 09:42:38 -07:00
Al Viro 6f7f231e7b [readdir] convert f2fs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:46 +04:00
Jaegeuk Kim 354a3399dc f2fs: recover wrong pino after checkpoint during fsync
If a file is linked, f2fs loose its parent inode number so that fsync calls
for the linked file should do checkpoint all the time.
But, if we can recover its parent inode number after the checkpoint, we can
adjust roll-forward mechanism for the further fsync calls, which is able to
improve the fsync performance significatly.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-06-14 09:04:45 +09:00
Jaegeuk Kim 699489bbbe f2fs: sync dir->i_size with its block allocation
If new dentry block is allocated and its i_size is updated, we should update
its inode block together in order to sync i_size and its block allocation.
Otherwise, we can loose additional dentry block due to the unconsistent i_size.

Errorneous Scenario
-------------------

In the recovery routine,
 - recovery_dentry
 | - __f2fs_add_link
 | | - get_new_data_page
 | | | - i_size_write(new_i_size)
 | | | - mark_inode_dirty_sync(dir)
 | | - update_parent_metadata
 | | | - mark_inode_dirty(dir)
 |
 - write_checkpoint
   - sync_dirty_dir_inodes
     - filemap_flush(dentry_blocks)
       - f2fs_write_data_page
         - skip to write the last dentry block due to index < i_size

In the above flow, new_i_size is not updated to its inode block so that the
last dentry block will be lost accordingly.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-06-12 08:18:35 +09:00
Jaegeuk Kim 8ae8f1627f f2fs: support xattr security labels
This patch adds the support of security labels for f2fs, which will be used
by Linus Security Models (LSMs).

Quote from http://en.wikipedia.org/wiki/Linux_Security_Modules:
"Linux Security Modules (LSM) is a framework that allows the Linux kernel to
support a variety of computer security models while avoiding favoritism toward
any single security implementation. The framework is licensed under the terms of
the GNU General Public License and is standard part of the Linux kernel since
Linux 2.6. AppArmor, SELinux, Smack and TOMOYO Linux are the currently accepted
modules in the official kernel.".

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-06-11 16:01:03 +09:00
Jaegeuk Kim 83d5d6f66b f2fs: cover cp_file information with ilock
If a file is linked with other files, it should be checkpointed at every fsync
calls.
For this, we use set_cp_file() with FADVISE_CP_BIT, but previously we didn't
cover the flag by the global lock.
This patch fixes that the inode page stores this correctly.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:06 +09:00
Namjae Jeon 4777f86b7c f2fs: remove unneeded initializations in f2fs_parent_dir
There is no need to initialize few pointers in f2fs_parent_dir
as the values are not checked and instead directly initialized
values are used.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:05 +09:00
Jaegeuk Kim 44a83ff6a8 f2fs: update inode page after creation
I found a bug when testing power-off-recovery as follows.

[Bug Scenario]
1. create a file
2. fsync the file
3. reboot w/o any sync
4. try to recover the file
 - found its fsync mark
 - found its dentry mark
   : try to recover its dentry
    - get its file name
    - get its parent inode number
     : here we got zero value

The reason why we get the wrong parent inode number is that we didn't
synchronize the inode page with its newly created inode information perfectly.

Especially, previous f2fs stores fi->i_pino and writes it to the cached
node page in a wrong order, which incurs the zero-valued i_pino during the
recovery.

So, this patch modifies the creation flow to fix the synchronization order of
inode page with its inode.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:02 +09:00
Jaegeuk Kim 64aa7ed98d f2fs: change get_new_data_page to pass a locked node page
This patch is for passing a locked node page to get_dnode_of_data.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:01 +09:00
Linus Torvalds 942d33da99 f2fs updates for v3.10
This patch-set includes the following major enhancement patches.
 o introduce a new gloabl lock scheme
 o add tracepoints on several major functions
 o fix the overall cleaning process focused on victim selection
 o apply the block plugging to merge IOs as much as possible
 o enhance management of free nids and its list
 o enhance the readahead mode for node pages
 o address several cretical deadlock conditions
 o reduce lock_page calls
 
 The other minor bug fixes and enhancements are as follows.
 o calculation mistakes: overflow
 o bio types: READ, READA, and READ_SYNC
 o fix the recovery flow, data races, and null pointer errors
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJRijCLAAoJEEAUqH6CSFDSg9kQAIqxmQzCUvCN3HcyVe8bGhKz
 8xhKrAY6ySRCKMuBbFRQsNrXUhckE3A44DgzYm5/gQikr/c8zhbqPVrtZ968eCKb
 wm3J+Re/uwZr5eOXlJEaHIiSkMDtERN7Cu2oYJWZi2B9wCSZcgvoWQ3c3LUVk6yF
 GFdi1Y00ll5tFKbEGbXSsfdul9P8jp0MmuMnWBBQZF3TrjETXMdThA5FXN0yTf9s
 XkcGE9vTCCPk8p7P3YmGGw6CwlaL8oallm0//iL4nMNpJzveq2C09IlY2BNrxU3L
 iTNXeIBdbhwXpnh2zq26Cy+cIEDIp0oXYui5BYdr/LWyWU3T/INa+hjUUszsESxF
 51LIUA1rA9nX/BSmj2QomswZ3lt4u5jl6rSBFKv3NG1KsFrAdb8S4tHukRSTSxAJ
 gzpY6kLT1+bgciA16F5W4yhzMYPN5hPa8s6hx4LHlpoqQICQsurjtS9KW7vncLFt
 ttmCMn8ehHcTzKRNNqYaBerCtSB3Z3G/uAy1y+DB7Zx2h2mqhCBXRalyRvs7RKvK
 d5OyYCpHntxuzDwVuivnr9Ddp30LUP1WqexxK+ykn99Ji3leMmffHP8Oari8w96b
 RxSbjoo8hOgoS5xZ4v3AaqtLDlBpxC6oWJzDaq/fJeKxOx22Z5BDFUM9mBGxrouJ
 AATl8b+cW/aTZ4l7WOPU
 =Hqii
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "This patch-set includes the following major enhancement patches.
   - introduce a new gloabl lock scheme
   - add tracepoints on several major functions
   - fix the overall cleaning process focused on victim selection
   - apply the block plugging to merge IOs as much as possible
   - enhance management of free nids and its list
   - enhance the readahead mode for node pages
   - address several cretical deadlock conditions
   - reduce lock_page calls

  The other minor bug fixes and enhancements are as follows.
   - calculation mistakes: overflow
   - bio types: READ, READA, and READ_SYNC
   - fix the recovery flow, data races, and null pointer errors"

* tag 'f2fs-for-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (68 commits)
  f2fs: cover free_nid management with spin_lock
  f2fs: optimize scan_nat_page()
  f2fs: code cleanup for scan_nat_page() and build_free_nids()
  f2fs: bugfix for alloc_nid_failed()
  f2fs: recover when journal contains deleted files
  f2fs: continue to mount after failing recovery
  f2fs: avoid deadlock during evict after f2fs_gc
  f2fs: modify the number of issued pages to merge IOs
  f2fs: remove useless #include <linux/proc_fs.h> as we're now using sysfs as debug entry.
  f2fs: fix inconsistent using of NM_WOUT_THRESHOLD
  f2fs: check truncation of mapping after lock_page
  f2fs: enhance alloc_nid and build_free_nids flows
  f2fs: add a tracepoint on f2fs_new_inode
  f2fs: check nid == 0 in add_free_nid
  f2fs: add REQ_META about metadata requests for submit
  f2fs: give a chance to merge IOs by IO scheduler
  f2fs: avoid frequent background GC
  f2fs: add tracepoints to debug checkpoint request
  f2fs: add tracepoints for write page operations
  f2fs: add tracepoints to debug the block allocation
  ...
2013-05-08 15:11:48 -07:00
Jaegeuk Kim c718379b6b f2fs: give a chance to merge IOs by IO scheduler
Previously, background GC submits many 4KB read requests to load victim blocks
and/or its (i)node blocks.

...
f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb61, blkaddr = 0x3b964ed
f2fs_gc : block_rq_complete: 8,16 R () 499854968 + 8 [0]
f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb6f, blkaddr = 0x3b964ee
f2fs_gc : block_rq_complete: 8,16 R () 499854976 + 8 [0]
f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb79, blkaddr = 0x3b964ef
f2fs_gc : block_rq_complete: 8,16 R () 499854984 + 8 [0]
...

However, by the fact that many IOs are sequential, we can give a chance to merge
the IOs by IO scheduler.
In order to do that, let's use blk_plug.

...
f2fs_gc : f2fs_iget: ino = 143
f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c6, blkaddr = 0x2e6ee
f2fs_gc : f2fs_iget: ino = 143
f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c7, blkaddr = 0x2e6ef
<idle> : block_rq_complete: 8,16 R () 1519616 + 8 [0]
<idle> : block_rq_complete: 8,16 R () 1519848 + 8 [0]
<idle> : block_rq_complete: 8,16 R () 1520432 + 96 [0]
<idle> : block_rq_complete: 8,16 R () 1520536 + 104 [0]
<idle> : block_rq_complete: 8,16 R () 1521008 + 112 [0]
<idle> : block_rq_complete: 8,16 R () 1521440 + 152 [0]
<idle> : block_rq_complete: 8,16 R () 1521688 + 144 [0]
<idle> : block_rq_complete: 8,16 R () 1522128 + 192 [0]
<idle> : block_rq_complete: 8,16 R () 1523256 + 328 [0]
...

Note that this issue should be addressed in checkpoint, and some readahead
flows too.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-04-26 10:35:10 +09:00
Al Viro 0ecc833bac mode_t, whack-a-mole at 11...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-09 14:13:05 -04:00
Jaegeuk Kim 399368372e f2fs: introduce a new global lock scheme
In the previous version, f2fs uses global locks according to the usage types,
such as directory operations, block allocation, block write, and so on.

Reference the following lock types in f2fs.h.
enum lock_type {
	RENAME,		/* for renaming operations */
	DENTRY_OPS,	/* for directory operations */
	DATA_WRITE,	/* for data write */
	DATA_NEW,	/* for data allocation */
	DATA_TRUNC,	/* for data truncate */
	NODE_NEW,	/* for node allocation */
	NODE_TRUNC,	/* for node truncate */
	NODE_WRITE,	/* for node write */
	NR_LOCK_TYPE,
};

In that case, we lose the performance under the multi-threading environment,
since every types of operations must be conducted one at a time.

In order to address the problem, let's share the locks globally with a mutex
array regardless of any types.
So, let users grab a mutex and perform their jobs in parallel as much as
possbile.

For this, I propose a new global lock scheme as follows.

0. Data structure
 - f2fs_sb_info -> mutex_lock[NR_GLOBAL_LOCKS]
 - f2fs_sb_info -> node_write

1. mutex_lock_op(sbi)
 - try to get an avaiable lock from the array.
 - returns the index of the gottern lock variable.

2. mutex_unlock_op(sbi, index of the lock)
 - unlock the given index of the lock.

3. mutex_lock_all(sbi)
 - grab all the locks in the array before the checkpoint.

4. mutex_unlock_all(sbi)
 - release all the locks in the array after checkpoint.

5. block_operations()
 - call mutex_lock_all()
 - sync_dirty_dir_inodes()
 - grab node_write
 - sync_node_pages()

Note that,
 the pairs of mutex_lock_op()/mutex_unlock_op() and
 mutex_lock_all()/mutex_unlock_all() should be used together.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-04-09 18:21:18 +09:00
Jaegeuk Kim 5a20d339c7 f2fs: align f2fs maximum name length to linux based filesystem
The maximum filename length supported in linux is 255 characters.
So let's follow that.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-03-18 21:00:35 +09:00
Linus Torvalds d895cb1af1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile (part one) from Al Viro:
 "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
  locking violations, etc.

  The most visible changes here are death of FS_REVAL_DOT (replaced with
  "has ->d_weak_revalidate()") and a new helper getting from struct file
  to inode.  Some bits of preparation to xattr method interface changes.

  Misc patches by various people sent this cycle *and* ocfs2 fixes from
  several cycles ago that should've been upstream right then.

  PS: the next vfs pile will be xattr stuff."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
  saner proc_get_inode() calling conventions
  proc: avoid extra pde_put() in proc_fill_super()
  fs: change return values from -EACCES to -EPERM
  fs/exec.c: make bprm_mm_init() static
  ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
  ocfs2: fix possible use-after-free with AIO
  ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
  get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
  target: writev() on single-element vector is pointless
  export kernel_write(), convert open-coded instances
  fs: encode_fh: return FILEID_INVALID if invalid fid_type
  kill f_vfsmnt
  vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
  nfsd: handle vfs_getattr errors in acl protocol
  switch vfs_getattr() to struct path
  default SET_PERSONALITY() in linux/elf.h
  ceph: prepopulate inodes only when request is aborted
  d_hash_and_lookup(): export, switch open-coded instances
  9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
  9p: split dropping the acls from v9fs_set_create_acl()
  ...
2013-02-26 20:16:07 -08:00
Al Viro 496ad9aa8e new helper: file_inode(file)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22 23:31:31 -05:00
Jaegeuk Kim 90b2fc64f0 Merge branch 'f2fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs into dev
Pull f2fs cleanup patches from Al Viro:

f2fs: get rid of fake on-stack dentries
f2fs: switch init_inode_metadata() to passing parent and name separately
f2fs: switch new_inode_page() from dentry to qstr
f2fs: init_dent_inode() should take qstr

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

Conflicts:
	fs/f2fs/recovery.c
2013-02-12 07:17:20 +09:00
Al Viro b7f7a5e0be f2fs: get rid of fake on-stack dentries
those should never be used for a lot of reasons...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-08 02:55:04 -05:00
Al Viro 69f24eac55 f2fs: switch init_inode_metadata() to passing parent and name separately
... sure, it's tempting to just pass dentry.  Except that we don't
_have_ anything resembling a real dentry on one of the paths to it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-08 02:55:04 -05:00