Commit Graph

31800 Commits

Author SHA1 Message Date
Namjae Jeon b2b3460a94 f2fs: reorganise the function get_victim_by_default
Fix the function get_victim_by_default, where it checks
for the condition  that p.min_segno != NULL_SEGNO as
shown:

if (p.min_segno != NULL_SEGNO)
           goto got_it;

and if above condition is true then

got_it:
        if (p.min_segno != NULL_SEGNO) {

So this condition is being checked twice. Hence move the goto
statement after the if condition so that duplication of condition
check is avoided.

Also this function makes a call to get_max_cost() to compute
the max cost based on the f2fs_sbi_info and victim policy. Since
get_max_cost depends on on three parameters of victim_sel_policy
=> alloc_mode, gc_mode & ofs_unit, once this victim policy is
initialised, these value will not change till the execution
time of get_victim_by_default() & also f2fs_sbi_info structure
parameters will not change.

Hence making calls to get_max_cost() in while loop does not seems to
be a good point. Instead we can call it once in begining and store
the results in local variable, which later can serve our purpose
for comparing the cost with max cost inside the while loop.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-06-06 14:20:46 +09:00
Jason Hrycay 1e03e38b35 f2fs: handle errors from get_node_page calls
Add check for error pointers returned from get_node_page in order to
avoid dereferencing a bad address on the next use.

Signed-off-by: Jason Hrycay <jason.hrycay@motorola.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-06-03 19:49:09 +09:00
Jaegeuk Kim 83d5d6f66b f2fs: cover cp_file information with ilock
If a file is linked with other files, it should be checkpointed at every fsync
calls.
For this, we use set_cp_file() with FADVISE_CP_BIT, but previously we didn't
cover the flag by the global lock.
This patch fixes that the inode page stores this correctly.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:06 +09:00
Jaegeuk Kim afc3eda2a8 f2fs: fix incorrect iputs during the dentry recovery
- iget/iput flow in the dentry recovery process

1. *dir* = f2fs_iget
2. set FI_DELAY_IPUT to *dir*
3. add *dir* to the dirty_dir_list
		   - __f2fs_add_link
		     - recover_dentry)
4. iput *dir* by remove_dirty_dir_inode
		   - sync_dirty_dir_inodes
		     - write_chekcpoint

If *dir*'s i_count is not 1 (i.e., root dir), remove_dirty_dir_inode is called
later and then iput is triggered again due to the FI_DELAY_IPUT flag.
So, let's unset the flag properly once iput is triggered.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:06 +09:00
Jaegeuk Kim 6b8213d9a4 f2fs: fix dentry recovery routine
The error scenario is:
1. create /a
(1.a link /a /b)
2. sync
3. unlinke /a
4. create /a
5. fsync /a
6. Sudden power-off

When the f2fs recovers the fsynced dentry, /a, we discover an exsiting dentry at
f2fs_find_entry() in recover_dentry().

In such the case, we should unlink the existing dentry and its inode
and then recover newly created dentry.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:05 +09:00
Jaegeuk Kim 3b10b1fd2b f2fs: iput only if whole data blocks are flushed
If there remains some unwritten blocks from the recovery, we should not call
iput on that directory inode.
Otherwise, we can loose some dentry blocks after the recovery.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:05 +09:00
Namjae Jeon 7a267f8d74 f2fs: return proper error from start_gc_thread
when there is an error from kthread_run, then return proper error
rather than returning -ENOMEM.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:05 +09:00
Namjae Jeon a06a241603 f2fs: optimize several routines in node.h
There are various functions with common code which could be separated
out to make common routines. So, made new routines and in order to
retain the same call path and no major changes, written some macros
to access those routines.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:05 +09:00
Namjae Jeon 4777f86b7c f2fs: remove unneeded initializations in f2fs_parent_dir
There is no need to initialize few pointers in f2fs_parent_dir
as the values are not checked and instead directly initialized
values are used.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:05 +09:00
Namjae Jeon 35b09d82c3 f2fs: push some variables to debug part
Some, counters are needed only for the statistical information
while debugging.
So, those can be controlled using CONFIG_F2FS_STAT_FS,
pushing the usage for few variables under this flag.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:05 +09:00
Jaegeuk Kim a9841c4dbb f2fs: align data types between on-disk and in-memory block addresses
The on-disk block address is defined as __le32, but in-memory block address,
block_t, does as u64.

Let's synchronize them to 32 bits.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:04 +09:00
Dan Carpenter f28c06fa6f f2fs: dereferencing an ERR_PTR
There is an error path where "dir" is an ERR_PTR.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:04 +09:00
Jaegeuk Kim 6f6fd833e1 f2fs: use ihold
Use the following helper function committed by Al.

commit 7de9c6ee3e
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Sat Oct 23 11:11:40 2010 -0400

    new helper: ihold()

...

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:04 +09:00
Jaegeuk Kim 93ff10d690 f2fs: should not make_bad_inode on f2fs_link failure
If -ENOSPC is met during f2fs_link, we should not make the inode as bad.
The inode is still alive.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:04 +09:00
Jaegeuk Kim 39cf72cf09 f2fs: fix to handle do_recover_data errors
This patch adds error handling codes of check_index_in_prev_nodes and its
caller, do_recover_data.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:04 +09:00
Jaegeuk Kim b292dcab06 f2fs: reuse the locked dnode page and its inode
This patch fixes the following deadlock bug during the recovery.

INFO: task mount:1322 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mount           D ffffffff81125870     0  1322   1266 0x00000000
 ffff8801207e39d8 0000000000000046 ffff88012ab1dee0 0000000000000046
 ffff8801207e3a08 ffff880115903f40 ffff8801207e3fd8 ffff8801207e3fd8
 ffff8801207e3fd8 ffff880115903f40 ffff8801207e39d8 ffff88012fc94520
Call Trace:
[<ffffffff81125870>] ? __lock_page+0x70/0x70
[<ffffffff816a92d9>] schedule+0x29/0x70
[<ffffffff816a93af>] io_schedule+0x8f/0xd0
[<ffffffff8112587e>] sleep_on_page+0xe/0x20
[<ffffffff816a649a>] __wait_on_bit_lock+0x5a/0xc0
[<ffffffff81125867>] __lock_page+0x67/0x70
[<ffffffff8106c7b0>] ? autoremove_wake_function+0x40/0x40
[<ffffffff81126857>] find_lock_page+0x67/0x80
[<ffffffff8112698f>] find_or_create_page+0x3f/0xb0
[<ffffffffa03901a8>] ? sync_inode_page+0xa8/0xd0 [f2fs]
[<ffffffffa038fdf7>] get_node_page+0x67/0x180 [f2fs]
[<ffffffffa039818b>] recover_fsync_data+0xacb/0xff0 [f2fs]
[<ffffffff816aaa1e>] ? _raw_spin_unlock+0x3e/0x40
[<ffffffffa0389634>] f2fs_fill_super+0x7d4/0x850 [f2fs]
[<ffffffff81184cf9>] mount_bdev+0x1c9/0x210
[<ffffffffa0388e60>] ? validate_superblock+0x180/0x180 [f2fs]
[<ffffffffa0387635>] f2fs_mount+0x15/0x20 [f2fs]
[<ffffffff81185a13>] mount_fs+0x43/0x1b0
[<ffffffff81145ba0>] ? __alloc_percpu+0x10/0x20
[<ffffffff811a0796>] vfs_kern_mount+0x76/0x120
[<ffffffff811a2cb7>] do_mount+0x237/0xa10
[<ffffffff81140b9b>] ? strndup_user+0x5b/0x80
[<ffffffff811a3520>] SyS_mount+0x90/0xe0
[<ffffffff816b3502>] system_call_fastpath+0x16/0x1b

The bug is triggered when check_index_in_prev_nodes tries to get the direct
node page by calling get_node_page.
At this point, if the direct node page is already locked by get_dnode_of_data,
its caller, we got a deadlock condition.

This patch adds additional condition check for the reuse of locked direct node
pages prior to the get_node_page call.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:04 +09:00
Jaegeuk Kim b638f0c4b8 f2fs: fix wrong condition check
While an orphan inode has zero link_count, f2fs_gc is able to select the inode
for foreground gc.

- f2fs_gc
 - do_garbage_collect
   - gc_data_segment
     : f2fs_iget is failed
     : get_valid_blocks() != 0, so that retry
--> here we got the infinite loop.

This patch resolved this issue.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:03 +09:00
Jaegeuk Kim 77888c1e42 f2fs: add f2fs_readonly()
Introduce a simple macro function for readability.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:03 +09:00
Jaegeuk Kim 6f85b35203 f2fs: avoid RECLAIM_FS-ON-W: deadlock
This patch tries to avoid the following deadlock condition of which the reclaim
path can trigger f2fs_balance_fs again.

=================================
[ INFO: inconsistent lock state ]
---------------------------------
inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
kswapd0/41 [HC0[0]:SC0[0]:HE1:SE1] takes:
 (&sbi->gc_mutex){+.+.?.}, at: f2fs_balance_fs+0xe6/0x100 [f2fs]
{RECLAIM_FS-ON-W} state was registered at:
  [<ffffffff810aa5a9>] mark_held_locks+0xb9/0x140
  [<ffffffff810aae85>] lockdep_trace_alloc+0x85/0xf0
  [<ffffffff8113ab2c>] __alloc_pages_nodemask+0x7c/0x9b0
  [<ffffffff81175aa8>] alloc_pages_current+0xb8/0x180
  [<ffffffff811319cf>] __page_cache_alloc+0xaf/0xd0
  [<ffffffff8113225c>] find_or_create_page+0x4c/0xb0
  [<ffffffffa021359e>] find_data_page+0x14e/0x210 [f2fs]
  [<ffffffffa021161b>] f2fs_gc+0x9eb/0xd90 [f2fs]
  [<ffffffffa0218fae>] f2fs_balance_fs+0xee/0x100 [f2fs]
  [<ffffffffa020848c>] f2fs_setattr+0x6c/0x200 [f2fs]
  [<ffffffff811ae51b>] notify_change+0x1db/0x3a0
  [<ffffffff8118fbd0>] do_truncate+0x60/0xa0
  [<ffffffff8118fd95>] vfs_truncate+0x185/0x1b0
  [<ffffffff8118fe1c>] do_sys_truncate+0x5c/0xa0
  [<ffffffff8118ffee>] SyS_truncate+0xe/0x10
  [<ffffffff816e2b42>] system_call_fastpath+0x16/0x1b

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:03 +09:00
Jaegeuk Kim 2c2c149f7d f2fs: don't do checkpoint if error is occurred
If we met an error during the dentry recovery, we should not conduct checkpoint.
Otherwise, some errorneous dentry blocks overwrites the existing blocks that
contain the remaining recovery information.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:03 +09:00
Jaegeuk Kim 45856aff0d f2fs: fix to unlock page before exit
If we got an error after lock_page, we should unlock it before exit.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:03 +09:00
Jaegeuk Kim 9a55ed656c f2fs: remove unnecessary kmap/kunmap operations
The allocated page used by the recovery is not on HIGHMEM, so that we don't
need to use kmap/kunmap.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:03 +09:00
Namjae Jeon 9851e6e189 f2fs: reorganize f2fs_vm_page_mkwrite
Few things can be changed in the default mkwrite function
1) Make file_update_time at the start before acquiring any lock
2) the condition page_offset(page) >= i_size_read(inode) should be
 changed to page_offset(page) > i_size_read
3) Move wait_on_page_writeback.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:02 +09:00
majianpeng 145b04e5ed f2fs: use list_for_each_entry rather than list_for_each_entry_safe
We can do this, since now we use a global mutex, f2fs_stat_mutex to protect its
list operations.

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
[Jaegeuk Kim: add description]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:02 +09:00
Haicheng Li 81fb5e8746 f2fs: remove unecessary variable and code
Code cleanup without behavior changed.

Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:02 +09:00
Peter Zijlstra bfe35965ec f2fs, lockdep: annotate mutex_lock_all()
Majianpeng reported a lockdep splat for f2fs. It turns out mutex_lock_all()
acquires an array of locks (in global/local lock style).

Any such operation is always serialized using cp_mutex, therefore there is no
fs_lock[] lock-order issue; tell lockdep about this using the
mutex_lock_nest_lock() primitive.

Reported-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:02 +09:00
Jaegeuk Kim f356fe0cba f2fs: add debug msgs in the recovery routine
This patch adds some trivial debugging messages in the recovery process.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:02 +09:00
Jaegeuk Kim 44a83ff6a8 f2fs: update inode page after creation
I found a bug when testing power-off-recovery as follows.

[Bug Scenario]
1. create a file
2. fsync the file
3. reboot w/o any sync
4. try to recover the file
 - found its fsync mark
 - found its dentry mark
   : try to recover its dentry
    - get its file name
    - get its parent inode number
     : here we got zero value

The reason why we get the wrong parent inode number is that we didn't
synchronize the inode page with its newly created inode information perfectly.

Especially, previous f2fs stores fi->i_pino and writes it to the cached
node page in a wrong order, which incurs the zero-valued i_pino during the
recovery.

So, this patch modifies the creation flow to fix the synchronization order of
inode page with its inode.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:02 +09:00
Jaegeuk Kim 64aa7ed98d f2fs: change get_new_data_page to pass a locked node page
This patch is for passing a locked node page to get_dnode_of_data.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:01 +09:00
Jaegeuk Kim 1646cfac95 f2fs: skip get_node_page if locked node page is passed
If get_dnode_of_data gets a locked node page, let's skip redundant
get_node_page calls.
This is for the futher enhancement.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:01 +09:00
Jaegeuk Kim 0a364af18f f2fs: remove unnecessary por_doing check
This por_doing check is totally not related to the recovery process.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:01 +09:00
Jaegeuk Kim 74d0b917ef f2fs: fix BUG_ON during f2fs_evict_inode(dir)
During the dentry recovery routine, recover_inode() triggers __f2fs_add_link
with its directory inode.

In the following scenario, a bug is captured.
 1. dir = f2fs_iget(pino)
 2. __f2fs_add_link(dir, name)
 3. iput(dir)
  -> f2fs_evict_inode() faces with BUG_ON(atomic_read(fi->dirty_dents))

Kernel BUG at ffffffffa01c0676 [verbose debug info unavailable]
[<ffffffffa01c0676>] f2fs_evict_inode+0x276/0x300 [f2fs]
Call Trace:
 [<ffffffff8118ea00>] evict+0xb0/0x1b0
 [<ffffffff8118f1c5>] iput+0x105/0x190
 [<ffffffffa01d2dac>] recover_fsync_data+0x3bc/0x1070 [f2fs]
 [<ffffffff81692e8a>] ? io_schedule+0xaa/0xd0
 [<ffffffff81690acb>] ? __wait_on_bit_lock+0x7b/0xc0
 [<ffffffff8111a0e7>] ? __lock_page+0x67/0x70
 [<ffffffff81165e21>] ? kmem_cache_alloc+0x31/0x140
 [<ffffffff8118a502>] ? __d_instantiate+0x92/0xf0
 [<ffffffff812a949b>] ? security_d_instantiate+0x1b/0x30
 [<ffffffff8118a5b4>] ? d_instantiate+0x54/0x70

This means that we should flush all the dentry pages between iget and iput().
But, during the recovery routine, it is unallowed due to consistency, so we
have to wait the whole recovery process.
And then, write_checkpoint flushes all the dirty dentry blocks, and nicely we
can put the stale dir inodes from the dirty_dir_inode_list.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:01 +09:00
Jaegeuk Kim 8c26d7d571 f2fs: fix por_doing variable coverage
The reason of using sbi->por_doing is to alleviate data writes during the
recovery.
The find_fsync_dnodes() produces some dirty dentry pages, so we should
cover it too with sbi->por_doing.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:01 +09:00
Jaegeuk Kim addbe45b00 f2fs: remove redundant assignment
We don't need to assign a value redundantly.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:01 +09:00
Jaegeuk Kim 650495dedc f2fs: fix the inconsistent state of data pages
In get_lock_data_page, if there is a data race between get_dnode_of_data for
node and grab_cache_page for data, f2fs is able to face with the following
BUG_ON(dn.data_blkaddr == NEW_ADDR).

kernel BUG at /home/zeus/f2fs_test/src/fs/f2fs/data.c:251!
 [<ffffffffa044966c>] get_lock_data_page+0x1ec/0x210 [f2fs]
Call Trace:
 [<ffffffffa043b089>] f2fs_readdir+0x89/0x210 [f2fs]
 [<ffffffff811a0920>] ? fillonedir+0x100/0x100
 [<ffffffff811a0920>] ? fillonedir+0x100/0x100
 [<ffffffff811a07f8>] vfs_readdir+0xb8/0xe0
 [<ffffffff811a0b4f>] sys_getdents+0x8f/0x110
 [<ffffffff816d7999>] system_call_fastpath+0x16/0x1b

This bug is able to be occurred when the block address of the data block is
changed after f2fs_put_dnode().
In order to avoid that, this patch fixes the lock order of node and data
blocks in which the node block lock is covered by the data block lock.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:00 +09:00
Jaegeuk Kim 65e5cd0a15 f2fs: fix inconsistency of block count during recovery
Currently f2fs recovers the dentry of fsynced files.
When power-off-recovery is conducted, this newly recovered inode should increase
node block count as well as inode block count.

This patch resolves this inconsistency that results in:

1. create a file
2. write data
3. fsync
4. reboot without sync
5. mount and recover the file
6. node block count is 1 and inode block count is 2
 : fall into the inconsistent state
7. unlink the file
 : trigger the following BUG_ON

------------[ cut here ]------------
kernel BUG at /home/zeus/f2fs_test/src/fs/f2fs/f2fs.h:716!
Call Trace:
 [<ffffffffa0344100>] ? get_node_page+0x50/0x1a0 [f2fs]
 [<ffffffffa0344bfc>] remove_inode_page+0x8c/0x100 [f2fs]
 [<ffffffffa03380f0>] ? f2fs_evict_inode+0x180/0x2d0 [f2fs]
 [<ffffffffa033812e>] f2fs_evict_inode+0x1be/0x2d0 [f2fs]
 [<ffffffff811c7a67>] evict+0xa7/0x1a0
 [<ffffffff811c82b5>] iput+0x105/0x190
 [<ffffffff811c2b30>] d_kill+0xe0/0x120
 [<ffffffff811c2c57>] dput+0xe7/0x1e0
 [<ffffffff811acc3d>] __fput+0x19d/0x2d0
 [<ffffffff811acd7e>] ____fput+0xe/0x10
 [<ffffffff81070645>] task_work_run+0xb5/0xe0
 [<ffffffff81002941>] do_notify_resume+0x71/0xb0
 [<ffffffff8175f14a>] int_signal+0x12/0x17

Reported-and-Tested-by: Chris Fries <C.Fries@motorola.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:00 +09:00
Linus Torvalds 89ff77837a NFS client bugfixes for 3.10
- Stable fix to prevent an rpc_task wakeup race
 - Fix a NFSv4.1 session drain deadlock
 - Fix a NFSv4/v4.1 mount regression when not running rpc.gssd
 - Ensure auth_gss pipe detection works in namespaces
 - Fix SETCLIENTID fallback if rpcsec_gss is not available
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQIcBAABAgAGBQJRomGNAAoJEGcL54qWCgDyzUIQALj4hsOtdcmCCAWu0m+Pr+Gz
 /6rpO2xJ5cLWyTYS2J9wmPkU+mb/g/OB/OhANyQQsDbfoW3e27TGaXRB8hUs29vk
 LHTkFcrBTi4ohlw2Gyb6LWDF6cyHkC7cpH7tfIjLDff77V/qK1uqo41MtYpqUvy3
 41tMoFOhvpwsy7HuKqVyYtNlwxnXrdJ5XvK0ycaQYQ9t8om9Hxu/scCgLfl/qwRr
 eVhIesC2/Y1eC46UK7/TWCC7aTLkvi85UQY+fywMkajt/gPzjjh6nuhc45aOT9d7
 pc0sXgIs8fLEjC+CRa0ojrziJZCHZY93U1kzA+CotRqPq76nPeFeG/1VhViKKb4C
 1AU9IA4ntViUqNDQiGrCxE6ZeMewTOKksrCErwSS16Hv9Gqh4SHq84R/bF6MFgBc
 dqQsexUbf/vaWHmuosARgbyc/QcBXDwWwbkq0hXSWbHA8vpc8/Lw1wkp49eyKz0V
 0zwJ9xInakXr5/DIC72IKEF0Dg26L+GTXWLmDZVcdpVBp5A403trxUKckcsNa/Xf
 FXaGMhyv2ZfV4AohW1Z8klkLiMHt4dr+YB7SQAgR/gb81p3odgKgMz51PfmHxFqs
 4nxwRQqWplBTGiLrOFSgaRC50jLqEloUweWC2qf56KYEb9N6M6XDIXhllLIMyWLD
 94cvihpKj+/ecKH5kQWF
 =B8go
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.10-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:

 - Stable fix to prevent an rpc_task wakeup race
 - Fix a NFSv4.1 session drain deadlock
 - Fix a NFSv4/v4.1 mount regression when not running rpc.gssd
 - Ensure auth_gss pipe detection works in namespaces
 - Fix SETCLIENTID fallback if rpcsec_gss is not available

* tag 'nfs-for-3.10-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: Fix SETCLIENTID fallback if GSS is not available
  SUNRPC: Prevent an rpc_task wakeup race
  NFSv4.1 Fix a pNFS session draining deadlock
  SUNRPC: Convert auth_gss pipe detection to work in namespaces
  SUNRPC: Faster detection if gssd is actually running
  SUNRPC: Fix a bug in gss_create_upcall
2013-05-26 12:33:05 -07:00
Linus Torvalds 088d812fe9 xfs: fixes for 3.10-rc3
- Fix for corruption with FSX on 512 byte blocksize filesystems
 - Fix rounding error in xfs_free_file_space
 - Fix use-after-free with extent free intents
 - Add several missing KM_NOFS flags to fix lockdep reports
 - Several fixes for CRC related code
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJRn93mAAoJENaLyazVq6ZOPw8P/1ab7Gr4K7ccpuRvC+JiBPx9
 /Eb1l2A68r8rWJq3bEvLM/XgN7F0IESastS+iA8lrNuLWJAHCb+Y37mdbyWymhWO
 J6LFg/Mic+n5uGmg6nXrqLJ8epTd3L+V+cCEHLpa/20zEwa5yd8sSTV0P1pfcMRi
 S36Uof7boe+s4H/s2z4YX8Y+dkaVn1tN7j4PDs5Zd/wiUq0EKA5le4AkbqIJlCDn
 E9mJL1S2t3QwfgQWL+pe8f5jzReGTAxJUQWX2ErajX4vYI7PnR1WTYf7YJB291f0
 XPfkjqvaK0WCdipSkgb58t/Rj2IxoHItdryvBHa/rnL4e95aBQMpNyY8btNX28ou
 NgcIQnqQB+xlJeHpSyb1xNusamGzWw0CEYsXPcDQbXj1uJjd7/GfqTwtVbB4zOlW
 Hua7/6N22vaY1dfQcQFwgi1XcKF9jSVKBVOvy7yHEmdNuT7mFejaG1gVO7NpTIgd
 s7R91pQVlYpLTmZHK6ZqZap5OC6kS/YJr9HxVxS3FQhaJmFwGqvf6UUjSWOK1MnJ
 obTbbygo2Orvh10lgzDNsd+xRJZ9aFn+UfywSeQGTfm91HvTnEQje3xqctv3zmBy
 adrWSXXg/eN07nciNk0zTgHnQZo1v/ZeAFlz0AcSslMMQ7Pq4+Pv4LPtdxMIdrxU
 1IZ32GunGTUXqm6GRW6v
 =hQCM
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-v3.10-rc3' of git://oss.sgi.com/xfs/xfs

Pull xfs fixes from Ben Myers:
 "Here are fixes for corruption on 512 byte filesystems, a rounding
  error, a use-after-free, some flags to fix lockdep reports, and
  several fixes related to CRCs.  We have a somewhat larger post -rc1
  queue than usual due to fixes related to the CRC feature we merged for
  3.10:

   - Fix for corruption with FSX on 512 byte blocksize filesystems
   - Fix rounding error in xfs_free_file_space
   - Fix use-after-free with extent free intents
   - Add several missing KM_NOFS flags to fix lockdep reports
   - Several fixes for CRC related code"

* tag 'for-linus-v3.10-rc3' of git://oss.sgi.com/xfs/xfs:
  xfs: remote attribute lookups require the value length
  xfs: xfs_attr_shortform_allfit() does not handle attr3 format.
  xfs: xfs_da3_node_read_verify() doesn't handle XFS_ATTR3_LEAF_MAGIC
  xfs: fix missing KM_NOFS tags to keep lockdep happy
  xfs: Don't reference the EFI after it is freed
  xfs: fix rounding in xfs_free_file_space
  xfs: fix sub-page blocksize data integrity writes
2013-05-26 09:35:02 -07:00
Benjamin LaHaise 03e04f048d aio: fix kioctx not being freed after cancellation at exit time
The recent changes overhauling fs/aio.c introduced a bug that results in
the kioctx not being freed when outstanding kiocbs are cancelled at
exit_aio() time.  Specifically, a kiocb that is cancelled has its
completion events discarded by batch_complete_aio(), which then fails to
wake up the process stuck in free_ioctx().  Fix this by modifying the
wait_event() condition in free_ioctx() appropriately.

This patch was tested with the cancel operation in the thread based code
posted yesterday.

[akpm@linux-foundation.org: fix build]
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Zach Brown <zab@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-24 16:22:53 -07:00
Joseph Qi b4ca2b4b57 ocfs2: goto out_unlock if ocfs2_get_clusters_nocache() failed in ocfs2_fiemap()
Last time we found there is lock/unlock bug in ocfs2_file_aio_write, and
then we did a thorough search for all lock resources in
ocfs2_inode_info, including rw, inode and open lockres and found this
bug.  My kernel version is 3.0.13, and it is also in the lastest version
3.9.  In ocfs2_fiemap, once ocfs2_get_clusters_nocache failed, it should
goto out_unlock instead of out, because we need release buffer head, up
read alloc sem and unlock inode.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Acked-by: Sunil Mushran <sunil.mushran@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-24 16:22:52 -07:00
Ryusuke Konishi 136e8770cd nilfs2: fix issue of nilfs_set_page_dirty() for page at EOF boundary
nilfs2: fix issue of nilfs_set_page_dirty for page at EOF boundary

DESCRIPTION:
 There are use-cases when NILFS2 file system (formatted with block size
lesser than 4 KB) can be remounted in RO mode because of encountering of
"broken bmap" issue.

The issue was reported by Anthony Doggett <Anthony2486@interfaces.org.uk>:
 "The machine I've been trialling nilfs on is running Debian Testing,
  Linux version 3.2.0-4-686-pae (debian-kernel@lists.debian.org) (gcc
  version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.35-2), but I've
  also reproduced it (identically) with Debian Unstable amd64 and Debian
  Experimental (using the 3.8-trunk kernel).  The problematic partitions
  were formatted with "mkfs.nilfs2 -b 1024 -B 8192"."

SYMPTOMS:
(1) System log contains error messages likewise:

    [63102.496756] nilfs_direct_assign: invalid pointer: 0
    [63102.496786] NILFS error (device dm-17): nilfs_bmap_assign: broken bmap (inode number=28)
    [63102.496798]
    [63102.524403] Remounting filesystem read-only

(2) The NILFS2 file system is remounted in RO mode.

REPRODUSING PATH:
(1) Create volume group with name "unencrypted" by means of vgcreate utility.
(2) Run script (prepared by Anthony Doggett <Anthony2486@interfaces.org.uk>):

----------------[BEGIN SCRIPT]--------------------

VG=unencrypted
lvcreate --size 2G --name ntest $VG
mkfs.nilfs2 -b 1024 -B 8192 /dev/mapper/$VG-ntest
mkdir /var/tmp/n
mkdir /var/tmp/n/ntest
mount /dev/mapper/$VG-ntest /var/tmp/n/ntest
mkdir /var/tmp/n/ntest/thedir
cd /var/tmp/n/ntest/thedir
sleep 2
date
darcs init
sleep 2
dmesg|tail -n 5
date
darcs whatsnew || true
date
sleep 2
dmesg|tail -n 5
----------------[END SCRIPT]--------------------

REPRODUCIBILITY: 100%

INVESTIGATION:
As it was discovered, the issue takes place during segment
construction after executing such sequence of user-space operations:

  open("_darcs/index", O_RDWR|O_CREAT|O_NOCTTY, 0666) = 7
  fstat(7, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
  ftruncate(7, 60)

The error message "NILFS error (device dm-17): nilfs_bmap_assign: broken
bmap (inode number=28)" takes place because of trying to get block
number for third block of the file with logical offset #3072 bytes.  As
it is possible to see from above output, the file has 60 bytes of the
whole size.  So, it is enough one block (1 KB in size) allocation for
the whole file.  Trying to operate with several blocks instead of one
takes place because of discovering several dirty buffers for this file
in nilfs_segctor_scan_file() method.

The root cause of this issue is in nilfs_set_page_dirty function which
is called just before writing to an mmapped page.

When nilfs_page_mkwrite function handles a page at EOF boundary, it
fills hole blocks only inside EOF through __block_page_mkwrite().

The __block_page_mkwrite() function calls set_page_dirty() after filling
hole blocks, thus nilfs_set_page_dirty function (=
a_ops->set_page_dirty) is called.  However, the current implementation
of nilfs_set_page_dirty() wrongly marks all buffers dirty even for page
at EOF boundary.

As a result, buffers outside EOF are inconsistently marked dirty and
queued for write even though they are not mapped with nilfs_get_block
function.

FIX:
This modifies nilfs_set_page_dirty() not to mark hole blocks dirty.

Thanks to Vyacheslav Dubeyko for his effort on analysis and proposals
for this issue.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Reported-by: Anthony Doggett <Anthony2486@interfaces.org.uk>
Reported-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-24 16:22:52 -07:00
Jeff Moyer 6900807c6b aio: fix io_getevents documentation
In reviewing man pages, I noticed that io_getevents is documented to
update the timeout that gets passed into the library call.  This doesn't
happen in kernel space or in the library (even though it's documented to
do so in both places).  Unless there is objection, I'd like to fix the
comments/docs to match the code (I will also update the man page upon
consensus).

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Acked-by: Cyril Hrubis <chrubis@suse.cz>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-24 16:22:52 -07:00
Jeff Mahoney fb09c3733a hfs: avoid crash in hfs_bnode_create
Commit 634725a929 ("hfs: cleanup HFS+ prints") removed the BUG_ON in
hfs_bnode_create in hfsplus.  This patch removes it from the hfs version
and avoids an fsfuzzer crash.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Acked-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-24 16:22:51 -07:00
Joseph Qi afe1bb73f8 ocfs2: unlock rw lock if inode lock failed
In ocfs2_file_aio_write(), it does ocfs2_rw_lock() first and then
ocfs2_inode_lock().

But if ocfs2_inode_lock() failed, it goes to out_sems without unlocking
rw lock.  This will cause a bug in ocfs2_lock_res_free() when testing
res->l_ex_holders, which is increased in __ocfs2_cluster_lock() and
decreased in __ocfs2_cluster_unlock().

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: "Duyongfeng (B)" <du.duyongfeng@huawei.com>
Acked-by: Sunil Mushran <sunil.mushran@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-24 16:22:51 -07:00
OGAWA Hirofumi 7b92d03c32 fat: fix possible overflow for fat_clusters
Intermediate value of fat_clusters can be overflowed on 32bits arch.

Reported-by: Krzysztof Strasburger <strasbur@chkw386.ch.pwr.wroc.pl>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-24 16:22:50 -07:00
Dave Chinner 7ae077802c xfs: remote attribute lookups require the value length
When reading a remote attribute, to correctly calculate the length
of the data buffer for CRC enable filesystems, we need to know the
length of the attribute data. We get this information when we look
up the attribute, but we don't store it in the args structure along
with the other remote attr information we get from the lookup. Add
this information to the args structure so we can use it
appropriately.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit e461fcb194)
2013-05-24 16:31:20 -05:00
Dave Chinner cf257abf02 xfs: xfs_attr_shortform_allfit() does not handle attr3 format.
xfstests generic/117 fails with:

XFS: Assertion failed: leaf->hdr.info.magic == cpu_to_be16(XFS_ATTR_LEAF_MAGIC)

indicating a function that does not handle the attr3 format
correctly. Fix it.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit b38958d715)
2013-05-24 16:29:56 -05:00
Dave Chinner 7ced60cae4 xfs: xfs_da3_node_read_verify() doesn't handle XFS_ATTR3_LEAF_MAGIC
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit 72916fb8cb)
2013-05-24 16:29:37 -05:00
Dave Chinner b17cb364db xfs: fix missing KM_NOFS tags to keep lockdep happy
There are several places where we use KM_SLEEP allocation contexts
and use the fact that they are called from transaction context to
add KM_NOFS where appropriate. Unfortunately, there are several
places where the code makes this assumption but can be called from
outside transaction context but with filesystem locks held. These
places need explicit KM_NOFS annotations to avoid lockdep
complaining about reclaim contexts.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit ac14876cf9)
2013-05-24 16:29:15 -05:00
Dave Chinner 509e708a89 xfs: Don't reference the EFI after it is freed
Checking the EFI for whether it is being released from recovery
after we've already released the known active reference is a mistake
worthy of a brown paper bag. Fix the (now) obvious use after free
that it can cause.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit 52c24ad39f)
2013-05-24 16:27:57 -05:00