Commit Graph

656 Commits

Author SHA1 Message Date
Jens Axboe 4c63f5646e Merge branch 'for-2.6.39/stack-plug' into for-2.6.39/core
Conflicts:
	block/blk-core.c
	block/blk-flush.c
	drivers/md/raid1.c
	drivers/md/raid10.c
	drivers/md/raid5.c
	fs/nilfs2/btnode.c
	fs/nilfs2/mdt.c

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10 08:58:35 +01:00
Jens Axboe 721a9602e6 block: kill off REQ_UNPLUG
With the plugging now being explicitly controlled by the
submitter, callers need not pass down unplugging hints
to the block layer. If they want to unplug, it's because they
manually plugged on their own - in which case, they should just
unplug at will.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10 08:52:27 +01:00
Jens Axboe 7eaceaccab block: remove per-queue plugging
Code has been converted over to the new explicit on-stack plugging,
and delay users have been converted to use the new API for that.
So lets kill off the old plugging along with aops->sync_page().

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10 08:52:07 +01:00
Ryusuke Konishi e3154e9748 nilfs2: get rid of nilfs_sb_info structure
This directly uses sb->s_fs_info to keep a nilfs filesystem object and
fully removes the intermediate nilfs_sb_info structure.  With this
change, the hierarchy of on-memory structures of nilfs will be
simplified as follows:

Before:
  super_block
       -> nilfs_sb_info
             -> the_nilfs
                   -> cptree --+-> nilfs_root (current file system)
                               +-> nilfs_root (snapshot A)
                               +-> nilfs_root (snapshot B)
                               :
             -> nilfs_sc_info (log writer structure)
After:
  super_block
       -> the_nilfs
             -> cptree --+-> nilfs_root (current file system)
                         +-> nilfs_root (snapshot A)
                         +-> nilfs_root (snapshot B)
                         :
             -> nilfs_sc_info (log writer structure)

The reason why we didn't design so from the beginning is because the
initial shape also differed from the above.  The early hierachy was
composed of "per-mount-point" super_block -> nilfs_sb_info pairs and a
shared nilfs object.  On the kernel 2.6.37, it was changed to the
current shape in order to unify super block instances into one per
device, and this cleanup became applicable as the result.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-03-09 11:54:26 +09:00
Ryusuke Konishi f7545144c2 nilfs2: use sb instance instead of nilfs_sb_info struct
This replaces sbi uses with direct reference to sb instance.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-03-09 11:05:08 +09:00
Ryusuke Konishi d96bbfa28a nilfs2: get rid of sc_sbi back pointer
Removes sci->sc_sbi which is a back pointer to nilfs_sb_info struct
from log writer object (nilfs_sc_info).

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-03-09 11:05:08 +09:00
Ryusuke Konishi 3fd3fe5aea nilfs2: move log writer onto nilfs object
Log writer is held by the nilfs_sb_info structure.  This moves it into
nilfs object and replaces all uses of NILFS_SC() accessor.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-03-09 11:05:08 +09:00
Ryusuke Konishi 9b1fc4e497 nilfs2: move next generation counter into nilfs object
Moves s_next_generation counter and a spinlock protecting it to nilfs
object from nilfs_sb_info structure.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-03-09 11:05:08 +09:00
Ryusuke Konishi 693dd32122 nilfs2: move s_inode_lock and s_dirty_files into nilfs object
Moves s_inode_lock spinlock and s_dirty_files list to nilfs object
from nilfs_sb_info structure.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-03-09 11:05:07 +09:00
Ryusuke Konishi 574e6c3145 nilfs2: move parameters on nilfs_sb_info into nilfs object
This moves four parameter variables on nilfs_sb_info s_resuid,
s_resgid, s_interval and s_watermark to the nilfs object.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-03-09 11:05:07 +09:00
Ryusuke Konishi 3b2ce58b0f nilfs2: move mount options to nilfs object
This moves mount_opt local variable to nilfs object from nilfs_sb_info
struct.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-03-09 11:05:07 +09:00
Ryusuke Konishi be667377a8 nilfs2: record used amount of each checkpoint in checkpoint list
This records the number of used blocks per checkpoint in each
checkpoint entry of cpfile.  Even though userland tools can get the
block count via nilfs_get_cpinfo ioctl, it was not updated by the
nilfs2 kernel code.  This fixes the issue and makes it available for
userland tools to calculate used amount per checkpoint.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Jiro SEKIBA <jir@unicus.jp>
2011-03-08 14:58:31 +09:00
Ryusuke Konishi ae191838b0 nilfs2: optimize rec_len functions
This is a similar change to those in ext2/ext3 codebase (commit
40a063f669 and a4ae309486, respectively).

The addition of 64k block capability in the rec_len_from_disk and
rec_len_to_disk functions added a bit of math overhead which slows
down file create workloads needlessly when the architecture cannot
even support 64k blocks.  This will cut the corner.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-03-08 14:58:30 +09:00
Ryusuke Konishi 4138ec2382 nilfs2: append blocksize info to warnings during loading super blocks
At present, the same warning message can be output twice when nilfs
detected a problem on super blocks:

 NILFS warning: broken superblock. using spare superblock.
 NILFS warning: broken superblock. using spare superblock.
 ...

This is because these super blocks are reloaded with the block size
written in a super block if it differs from the first block size, but
this repetition looks somewhat confusing.  So, we hint at what is
going on by appending block size information to those messages.

Reported-by: Wakko Warner <wakko@animx.eu.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-03-08 14:58:30 +09:00
Ryusuke Konishi 828b1c50ae nilfs2: add compat ioctl
The current FS_IOC_GETFLAGS/SETFLAGS/GETVERSION will fail if
application is 32 bit and kernel is 64 bit.

This issue is avoidable by adding compat_ioctl method.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-03-08 14:58:30 +09:00
Ryusuke Konishi cde98f0f84 nilfs2: implement FS_IOC_GETFLAGS/SETFLAGS/GETVERSION
Add support for the standard attributes set via chattr and read via
lsattr.  These attributes are already in the flags value in the nilfs2
inode, but currently we don't have any ioctl commands that expose them
to the userland.

Collaterally, this adds the FS_IOC_GETVERSION ioctl for getting
i_generation, which allows users to list the file's generation number
with "lsattr -v".

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-03-08 14:58:30 +09:00
Ryusuke Konishi b253a3e4f2 nilfs2: tighten restrictions on inode flags
Nilfs has few rectrictions on which flags may be set on which inodes
like ext2/3/4 filesystems used to be.  Specifically DIRSYNC may only
be set on directories and IMMUTABLE and APPEND may not be set on
links.  Tighten that to disallow TOPDIR being set on non-directories
and only NODUMP and NOATIME to be set on non-regular file,
non-directories.

This introduces a flags masking function like those of extN and uses
it during inode creation.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-03-08 14:58:29 +09:00
Ryusuke Konishi 32f4aeb315 nilfs2: mark S_NOATIME on inodes only if NOATIME attribute is set
At present, nilfs marks S_NOATIME flag on all inodes.  This restricts
nilfs_set_inode_flags function so that it marks S_NOATIME only if a
given inode has an FS_NOATIME_FL flag.

Although nilfs does not support atime yet, touch_atime() still safely
returns on IS_NOATIME check since MS_NOATIME is always set on sb.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-03-08 14:58:29 +09:00
Ryusuke Konishi f0c9f242f9 nilfs2: use common file attribute macros
Replaces uses of own inode flags (i.e. NILFS_SECRM_FL, NILFS_UNRM_FL,
NILFS_COMPR_FL, and so forth) with common inode flags, and removes the
own flag declarations.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-03-08 14:58:29 +09:00
Ryusuke Konishi 9954e7af14 nilfs2: add free entries count only if clear bit operation succeeded
Three functions of the current persistent object allocator,
nilfs_palloc_commit_free_entry, nilfs_palloc_abort_alloc_entry, and
nilfs_palloc_freev functions unconditionally add a counter after doing
clear bit operation on a bitmap block.

If the clear bit operation overlapped, the counter will not add up.
This fixes the issue by making the counter operations conditional.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-03-08 14:58:04 +09:00
Ryusuke Konishi 25b18d39cc nilfs2: decrement inodes count only if raw inode was successfully deleted
This fixes the issue that inodes count will not add up after removal
of raw inodes fails.  Hence, this prevents possible under flow of the
inodes count.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-03-08 14:58:04 +09:00
Linus Torvalds 8336026942 Merge branch 'i_nlink' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'i_nlink' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  hfs: fix rename() over non-empty directory
  udf: fix i_nlink limit
  fix reiserfs mkdir() breakage
  exofs: i_nlink races in rename()
  nilfs2: i_nlink races in rename()
  minix: i_nlink races in rename()
  ufs: i_nlink races in rename()
  sysv: i_nlink races in rename()
2011-03-03 15:37:59 -08:00
Al Viro 30eb43d314 nilfs2: i_nlink races in rename()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-03 01:28:17 -05:00
Ryusuke Konishi 72746ac643 nilfs2: fix regression that i-flag is not set on changeless checkpoints
According to the report from Jiro SEKIBA titled "regression in
2.6.37?"  (Message-Id: <8739n8vs1f.wl%jir@sekiba.com>), on 2.6.37 and
later kernels, lscp command no longer displays "i" flag on checkpoints
that snapshot operations or garbage collection created.

This is a regression of nilfs2 checkpointing function, and it's
critical since it broke behavior of a part of nilfs2 applications.
For instance, snapshot manager of TimeBrowse gets to create
meaningless snapshots continuously; snapshot creation triggers another
checkpoint, but applications cannot distinguish whether the new
checkpoint contains meaningful changes or not without the i-flag.

This patch fixes the regression and brings that application behavior
back to normal.

Reported-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Jiro SEKIBA <jir@unicus.jp>
Cc: stable <stable@kernel.org>  [2.6.37]
2011-03-02 09:55:18 +09:00
Miklos Szeredi 2aa15890f3 mm: prevent concurrent unmap_mapping_range() on the same inode
Michael Leun reported that running parallel opens on a fuse filesystem
can trigger a "kernel BUG at mm/truncate.c:475"

Gurudas Pai reported the same bug on NFS.

The reason is, unmap_mapping_range() is not prepared for more than
one concurrent invocation per inode.  For example:

  thread1: going through a big range, stops in the middle of a vma and
     stores the restart address in vm_truncate_count.

  thread2: comes in with a small (e.g. single page) unmap request on
     the same vma, somewhere before restart_address, finds that the
     vma was already unmapped up to the restart address and happily
     returns without doing anything.

Another scenario would be two big unmap requests, both having to
restart the unmapping and each one setting vm_truncate_count to its
own value.  This could go on forever without any of them being able to
finish.

Truncate and hole punching already serialize with i_mutex.  Other
callers of unmap_mapping_range() do not, and it's difficult to get
i_mutex protection for all callers.  In particular ->d_revalidate(),
which calls invalidate_inode_pages2_range() in fuse, may be called
with or without i_mutex.

This patch adds a new mutex to 'struct address_space' to prevent
running multiple concurrent unmap_mapping_range() on the same mapping.

[ We'll hopefully get rid of all this with the upcoming mm
  preemptibility series by Peter Zijlstra, the "mm: Remove i_mmap_mutex
  lockbreak" patch in particular.  But that is for 2.6.39 ]

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reported-by: Michael Leun <lkml20101129@newton.leun.net>
Reported-by: Gurudas Pai <gurudas.pai@oracle.com>
Tested-by: Gurudas Pai <gurudas.pai@oracle.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-23 19:52:52 -08:00
Ryusuke Konishi 0ca7a5b9ac nilfs2: fix crash after one superblock became unavailable
Fixes the following kernel oops in nilfs_setup_super() which could
arise if one of two super-blocks is unavailable.

> BUG: unable to handle kernel NULL pointer dereference at   (null)
> Pid: 3529, comm: mount.nilfs2 Not tainted 2.6.37 #1 /
> EIP: 0060:[<c03196bc>] EFLAGS: 00010202 CPU: 3
> EIP is at memcpy+0xc/0x1b
> Call Trace:
>  [<f953720e>] ? nilfs_setup_super+0x6c/0xa5 [nilfs2]
>  [<f95369e9>] ? nilfs_get_root_dentry+0x81/0xcb [nilfs2]
>  [<f9537a08>] ? nilfs_mount+0x4f9/0x62c [nilfs2]
>  [<c02745cf>] ? kstrdup+0x36/0x3f
>  [<f953750f>] ? nilfs_mount+0x0/0x62c [nilfs2]
>  [<c0293940>] ? vfs_kern_mount+0x4d/0x12c
>  [<c02a5100>] ? get_fs_type+0x76/0x8f
>  [<c0293a68>] ? do_kern_mount+0x33/0xbf
>  [<c02a784a>] ? do_mount+0x2ed/0x714
>  [<c02a6171>] ? copy_mount_options+0x28/0xfc
>  [<c02a7ce3>] ? sys_mount+0x72/0xaf
>  [<c0473085>] ? syscall_call+0x7/0xb

Reported-by: Wakko Warner <wakko@animx.eu.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Wakko Warner <wakko@animx.eu.org>
Cc: stable <stable@kernel.org> [2.6.37, 2.6.36]
LKML-Reference: <20110121024918.GA29598@animx.eu.org>
2011-01-22 15:22:36 +09:00
Linus Torvalds 275220f0fc Merge branch 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits)
  block: ensure that completion error gets properly traced
  blktrace: add missing probe argument to block_bio_complete
  block cfq: don't use atomic_t for cfq_group
  block cfq: don't use atomic_t for cfq_queue
  block: trace event block fix unassigned field
  block: add internal hd part table references
  block: fix accounting bug on cross partition merges
  kref: add kref_test_and_get
  bio-integrity: mark kintegrityd_wq highpri and CPU intensive
  block: make kblockd_workqueue smarter
  Revert "sd: implement sd_check_events()"
  block: Clean up exit_io_context() source code.
  Fix compile warnings due to missing removal of a 'ret' variable
  fs/block: type signature of major_to_index(int) to major_to_index(unsigned)
  block: convert !IS_ERR(p) && p to !IS_ERR_NOR_NULL(p)
  cfq-iosched: don't check cfqg in choose_service_tree()
  fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors
  cdrom: export cdrom_check_events()
  sd: implement sd_check_events()
  sr: implement sr_check_events()
  ...
2011-01-13 10:45:01 -08:00
Alexey Dobriyan 57cc7215b7 headers: kobject.h redux
Remove kobject.h from files which don't need it, notably,
sched.h and fs.h.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-10 08:51:44 -08:00
Ryusuke Konishi 365e215ce1 nilfs2: unfold nilfs_dat_inode function
nilfs_dat_inode function was a wrapper to switch between normal dat
inode and gcdat, a clone of the dat inode for garbage collection.

This function got obsolete when the gcdat inode was removed, and now
we can access the dat inode directly from a nilfs object.  So, we will
unfold the wrapper and remove it.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-01-10 14:38:39 +09:00
Ryusuke Konishi bcbc8c648d nilfs2: do not pass sbi to functions which can get it from inode
This removes argument for passing nilfs_sb_info structure from
nilfs_set_file_dirty and nilfs_load_inode_block functions.  We can get
a pointer to the structure from inodes.

[Stephen Rothwell <sfr@canb.auug.org.au>: fix conflict with commit
 b74c79e993]

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-01-10 14:37:54 +09:00
Ryusuke Konishi 06df0f9992 nilfs2: get rid of nilfs_mount_options structure
Only mount_opt member is used in the nilfs_mount_options structure,
and we can simplify it.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-01-10 14:05:46 +09:00
Ryusuke Konishi a7a8447ede nilfs2: simplify nilfs_mdt_freeze_buffer
nilfs_page_get_nth_block() function used in nilfs_mdt_freeze_buffer()
always returns a valid buffer head, so its validity check can be
removed.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-01-10 14:05:46 +09:00
Ryusuke Konishi 888da23c2f nilfs2: get rid of loaded flag from nilfs object
NILFS_LOADED flag of the nilfs object is not used now, so this will
remove it.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-01-10 14:05:46 +09:00
Ryusuke Konishi ae53a0a2ce nilfs2: fix a checkpatch error in page.c
Will correct the following checkpatch error:

 ERROR: trailing whitespace
 #494: FILE: page.c:494:
 + $

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-01-10 14:05:46 +09:00
Ryusuke Konishi 622daaff0a nilfs2: fiemap support
This adds fiemap to nilfs.  Two new functions, nilfs_fiemap and
nilfs_find_uncommitted_extent are added.

nilfs_fiemap() implements the fiemap inode operation, and
nilfs_find_uncommitted_extent() helps to get a range of data blocks
whose physical location has not been determined.

nilfs_fiemap() collects extent information by looping through
nilfs_bmap_lookup_contig and nilfs_find_uncommitted_extent routines.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-01-10 14:05:46 +09:00
Ryusuke Konishi 27e6c7a3ce nilfs2: mark buffer heads as delayed until the data is written to disk
Nilfs does not allocate new blocks on disk until they are actually
written to.  To implement fiemap, we need to deal with such blocks.

To allow successive fiemap patch to distinguish mapped but unallocated
regions, this marks buffer heads of those new blocks as delayed and
clears the flag after the blocks are written to disk.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-01-10 14:05:45 +09:00
Ryusuke Konishi e828949e5b nilfs2: call nilfs_error inside bmap routines
Some functions using nilfs bmap routines can wrongly return invalid
argument error (i.e. -EINVAL) that bmap returns as an internal code
for btree corruption.

This fixes the issue by catching and converting the internal EINVAL to
EIO and calling nilfs_error function inside bmap routines.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-01-10 14:05:45 +09:00
Joe Perches b004a5eb0b fs/nilfs2/super.c: Use printf extension %pV
Using %pV reduces the number of printk calls and
eliminates any possible message interleaving from
other printk calls.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-01-10 14:05:45 +09:00
Nick Piggin b74c79e993 fs: provide rcu-walk aware permission i_ops
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:29 +11:00
Nick Piggin fa0d7e3de6 fs: icache RCU free inodes
RCU free the struct inode. This will allow:

- Subsequent store-free path walking patch. The inode must be consulted for
  permissions when walking, so an RCU inode reference is a must.
- sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
  to take i_lock no longer need to take sb_inode_list_lock to walk the list in
  the first place. This will simplify and optimize locking.
- Could remove some nested trylock loops in dcache code
- Could potentially simplify things a bit in VM land. Do not need to take the
  page lock to follow page->mapping.

The downsides of this is the performance cost of using RCU. In a simple
creat/unlink microbenchmark, performance drops by about 10% due to inability to
reuse cache-hot slab objects. As iterations increase and RCU freeing starts
kicking over, this increases to about 20%.

In cases where inode lifetimes are longer (ie. many inodes may be allocated
during the average life span of a single inode), a lot of this cache reuse is
not applicable, so the regression caused by this patch is smaller.

The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
however this adds some complexity to list walking and store-free path walking,
so I prefer to implement this at a later date, if it is shown to be a win in
real situations. I haven't found a regression in any non-micro benchmark so I
doubt it will be a problem.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:26 +11:00
Nick Piggin b7ab39f631 fs: dcache scale dentry refcount
Make d_count non-atomic and protect it with d_lock. This allows us to ensure a
0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when
we start protecting many other dentry members with d_lock.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:21 +11:00
Ryusuke Konishi 947b10ae0a nilfs2: fix regression of garbage collection ioctl
On 2.6.37-rc1, garbage collection ioctl of nilfs was broken due to the
commit 263d90cefc ("nilfs2: remove own inode hash used for GC"),
and leading to filesystem corruption.

The patch doesn't queue gc-inodes for log writer if they are reused
through the vfs inode cache.  Here, gc-inode is the inode which
buffers blocks to be relocated on GC.  That patch queues gc-inodes in
nilfs_init_gcinode() function, but this function is not called when
they don't have I_NEW flag.  Thus, some of live blocks are wrongly
overrode without being moved to new logs.

This resolves the problem by moving the gc-inode queueing to an outer
function to ensure it's done right.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-12-16 14:35:18 +09:00
Ryusuke Konishi f6c26ec508 nilfs2: fix typo in comment of nilfs_dat_move function
Fixes a typo: "uncommited" -> "uncommitted".

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-11-24 12:51:48 +09:00
Dan Carpenter 103cfcf522 nilfs2: nilfs_iget_for_gc() returns ERR_PTR
nilfs_iget_for_gc() returns an ERR_PTR() on failure and doesn't return
NULL.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-11-23 16:32:19 +09:00
Tejun Heo d4d7762995 block: clean up blkdev_get() wrappers and their users
After recent blkdev_get() modifications, open_by_devnum() and
open_bdev_exclusive() are simple wrappers around blkdev_get().
Replace them with blkdev_get_by_dev() and blkdev_get_by_path().

blkdev_get_by_dev() is identical to open_by_devnum().
blkdev_get_by_path() is slightly different in that it doesn't
automatically add %FMODE_EXCL to @mode.

All users are converted.  Most conversions are mechanical and don't
introduce any behavior difference.  There are several exceptions.

* btrfs now sets FMODE_EXCL in btrfs_device->mode, so there's no
  reason to OR it explicitly on blkdev_put().

* gfs2, nilfs2 and the generic mount_bdev() now set FMODE_EXCL in
  sb->s_mode.

* With the above changes, sb->s_mode now always should contain
  FMODE_EXCL.  WARN_ON_ONCE() added to kill_block_super() to detect
  errors.

The new blkdev_get_*() functions are with proper docbook comments.
While at it, add function description to blkdev_get() too.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Joern Engel <joern@lazybastard.org>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Jan Kara <jack@suse.cz>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
Cc: reiserfs-devel@vger.kernel.org
Cc: xfs-masters@oss.sgi.com
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
2010-11-13 11:55:18 +01:00
Tejun Heo e525fd89d3 block: make blkdev_get/put() handle exclusive access
Over time, block layer has accumulated a set of APIs dealing with bdev
open, close, claim and release.

* blkdev_get/put() are the primary open and close functions.

* bd_claim/release() deal with exclusive open.

* open/close_bdev_exclusive() are combination of open and claim and
  the other way around, respectively.

* bd_link/unlink_disk_holder() to create and remove holder/slave
  symlinks.

* open_by_devnum() wraps bdget() + blkdev_get().

The interface is a bit confusing and the decoupling of open and claim
makes it impossible to properly guarantee exclusive access as
in-kernel open + claim sequence can disturb the existing exclusive
open even before the block layer knows the current open if for another
exclusive access.  Reorganize the interface such that,

* blkdev_get() is extended to include exclusive access management.
  @holder argument is added and, if is @FMODE_EXCL specified, it will
  gain exclusive access atomically w.r.t. other exclusive accesses.

* blkdev_put() is similarly extended.  It now takes @mode argument and
  if @FMODE_EXCL is set, it releases an exclusive access.  Also, when
  the last exclusive claim is released, the holder/slave symlinks are
  removed automatically.

* bd_claim/release() and close_bdev_exclusive() are no longer
  necessary and either made static or removed.

* bd_link_disk_holder() remains the same but bd_unlink_disk_holder()
  is no longer necessary and removed.

* open_bdev_exclusive() becomes a simple wrapper around lookup_bdev()
  and blkdev_get().  It also has an unexpected extra bdev_read_only()
  test which probably should be moved into blkdev_get().

* open_by_devnum() is modified to take @holder argument and pass it to
  blkdev_get().

Most of bdev open/close operations are unified into blkdev_get/put()
and most exclusive accesses are tested atomically at the open time (as
it should).  This cleans up code and removes some, both valid and
invalid, but unnecessary all the same, corner cases.

open_bdev_exclusive() and open_by_devnum() can use further cleanup -
rename to blkdev_get_by_path() and blkdev_get_by_devt() and drop
special features.  Well, let's leave them for another day.

Most conversions are straight-forward.  drbd conversion is a bit more
involved as there was some reordering, but the logic should stay the
same.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Neil Brown <neilb@suse.de>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Peter Osterlund <petero2@telia.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <joel.becker@oracle.com>
Cc: Alex Elder <aelder@sgi.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: dm-devel@redhat.com
Cc: drbd-dev@lists.linbit.com
Cc: Leo Chen <leochen@broadcom.com>
Cc: Scott Branden <sbranden@broadcom.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: Joern Engel <joern@logfs.org>
Cc: reiserfs-devel@vger.kernel.org
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
2010-11-13 11:55:17 +01:00
Al Viro e4c59d61e8 convert nilfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-29 04:16:53 -04:00
Linus Torvalds 426e1f5cec Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits)
  split invalidate_inodes()
  fs: skip I_FREEING inodes in writeback_sb_inodes
  fs: fold invalidate_list into invalidate_inodes
  fs: do not drop inode_lock in dispose_list
  fs: inode split IO and LRU lists
  fs: switch bdev inode bdi's correctly
  fs: fix buffer invalidation in invalidate_list
  fsnotify: use dget_parent
  smbfs: use dget_parent
  exportfs: use dget_parent
  fs: use RCU read side protection in d_validate
  fs: clean up dentry lru modification
  fs: split __shrink_dcache_sb
  fs: improve DCACHE_REFERENCED usage
  fs: use percpu counter for nr_dentry and nr_dentry_unused
  fs: simplify __d_free
  fs: take dcache_lock inside __d_path
  fs: do not assign default i_ino in new_inode
  fs: introduce a per-cpu last_ino allocator
  new helper: ihold()
  ...
2010-10-26 17:58:44 -07:00
Michael Rubin f629d1c9bd mm: add account_page_writeback()
To help developers and applications gain visibility into writeback
behaviour this patch adds two counters to /proc/vmstat.

  # grep nr_dirtied /proc/vmstat
  nr_dirtied 3747
  # grep nr_written /proc/vmstat
  nr_written 3618

These entries allow user apps to understand writeback behaviour over time
and learn how it is impacting their performance.  Currently there is no
way to inspect dirty and writeback speed over time.  It's not possible for
nr_dirty/nr_writeback.

These entries are necessary to give visibility into writeback behaviour.
We have /proc/diskstats which lets us understand the io in the block
layer.  We have blktrace for more in depth understanding.  We have
e2fsprogs and debugsfs to give insight into the file systems behaviour,
but we don't offer our users the ability understand what writeback is
doing.  There is no way to know how active it is over the whole system, if
it's falling behind or to quantify it's efforts.  With these values
exported users can easily see how much data applications are sending
through writeback and also at what rates writeback is processing this
data.  Comparing the rates of change between the two allow developers to
see when writeback is not able to keep up with incoming traffic and the
rate of dirty memory being sent to the IO back end.  This allows folks to
understand their io workloads and track kernel issues.  Non kernel
engineers at Google often use these counters to solve puzzling performance
problems.

Patch #4 adds a pernode vmstat file with nr_dirtied and nr_written

Patch #5 add writeback thresholds to /proc/vmstat

Currently these values are in debugfs. But they should be promoted to
/proc since they are useful for developers who are writing databases
and file servers and are not debugging the kernel.

The output is as below:

 # grep threshold /proc/vmstat
 nr_pages_dirty_threshold 409111
 nr_pages_dirty_background_threshold 818223

This patch:

This allows code outside of the mm core to safely manipulate page
writeback state and not worry about the other accounting.  Not using these
routines means that some code will lose track of the accounting and we get
bugs.

Modify nilfs2 to use interface.

Signed-off-by: Michael Rubin <mrubin@google.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
Cc: Jiro SEKIBA <jir@unicus.jp>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26 16:52:06 -07:00
Al Viro 7de9c6ee3e new helper: ihold()
Clones an existing reference to inode; caller must already hold one.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25 21:26:11 -04:00
Linus Torvalds ab34c02afe Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: (36 commits)
  nilfs2: eliminate sparse warning - "context imbalance"
  nilfs2: eliminate sparse warnings - "symbol not declared"
  nilfs2: get rid of bdi from nilfs object
  nilfs2: change license of exported header file
  nilfs2: add bdev freeze/thaw support
  nilfs2: accept 64-bit checkpoint numbers in cp mount option
  nilfs2: remove own inode allocator and destructor for metadata files
  nilfs2: get rid of back pointer to writable sb instance
  nilfs2: get rid of mi_nilfs back pointer to nilfs object
  nilfs2: see state of root dentry for mount check of snapshots
  nilfs2: use iget for all metadata files
  nilfs2: get rid of GCDAT inode
  nilfs2: add routines to redirect access to buffers of DAT file
  nilfs2: add routines to roll back state of DAT file
  nilfs2: add routines to save and restore bmap state
  nilfs2: do not allocate nilfs_mdt_info structure to gc-inodes
  nilfs2: allow nilfs_clear_inode to clear metadata file inodes
  nilfs2: get rid of snapshot mount flag
  nilfs2: simplify life cycle management of nilfs object
  nilfs2: do not allocate multiple super block instances for a device
  ...
2010-10-23 01:26:47 -07:00
Jiro SEKIBA 6b81e14e64 nilfs2: eliminate sparse warning - "context imbalance"
insert sparse annotations to fix following sparse warning.

fs/nilfs2/segment.c:2681:3: warning: context imbalance in 'nilfs_segctor_kill_thread' - unexpected unlock

nilfs_segctor_kill_thread is only called inside sc_state_lock lock.
sparse doesn't detect the context and warn "unexpected unlock".
__acquires/__releases pretend to lock/unlock the sc_state_lock for sparse.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:40 +09:00
Jiro SEKIBA abc0b50b6b nilfs2: eliminate sparse warnings - "symbol not declared"
change nilfs_dat_commit_free and nilfs_inode_cachep static
to fix following warnings

fs/nilfs2/super.c:72:19: warning: symbol 'nilfs_inode_cachep' was not declared. Should it be static?
fs/nilfs2/dat.c:106:6: warning: symbol 'nilfs_dat_commit_free' was not declared. Should it be static?

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:40 +09:00
Ryusuke Konishi 026a7d63d5 nilfs2: get rid of bdi from nilfs object
Nilfs now can use sb->s_bdi to get backing_dev_info, so we use it
instead of ns_bdi on the nilfs object and remove ns_bdi.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:39 +09:00
Ryusuke Konishi 5beb6e0b20 nilfs2: add bdev freeze/thaw support
Nilfs hasn't supported the freeze/thaw feature because it didn't work
due to the peculiar design that multiple super block instances could
be allocated for a device.  This limitation was removed by the patch
"nilfs2: do not allocate multiple super block instances for a device".

So now this adds the freeze/thaw support to nilfs.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:39 +09:00
Ryusuke Konishi c05dbfc260 nilfs2: accept 64-bit checkpoint numbers in cp mount option
The current implementation doesn't mount snapshots with checkpoint
numbers larger than INT_MAX since it uses match_int() for parsing
"cp=" mount option.

This uses simple_strtoull() for the conversion to resolve the issue.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:39 +09:00
Ryusuke Konishi 2879ed66e4 nilfs2: remove own inode allocator and destructor for metadata files
This finally removes own inode allocator and destructor functions for
metadata files.  Several routines, nilfs_mdt_new(),
nilfs_mdt_new_common(), nilfs_mdt_clear(), nilfs_mdt_destroy(), and
nilfs_alloc_inode_common() will be gone.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:39 +09:00
Ryusuke Konishi 090fd5b101 nilfs2: get rid of back pointer to writable sb instance
Nilfs object holds a back pointer to a writable super block instance
in nilfs->ns_writer, and this became eliminable since sb is now made
per device and all inodes have a valid pointer to it.

This deletes the ns_writer pointer and a reader/writer semaphore
protecting it.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:38 +09:00
Ryusuke Konishi c6e071884a nilfs2: get rid of mi_nilfs back pointer to nilfs object
This removes a back pointer to nilfs object from nilfs_mdt_info
structure that is attached to metadata files.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:38 +09:00
Ryusuke Konishi 032dbb3b50 nilfs2: see state of root dentry for mount check of snapshots
After applied the patch that unified sb instances, root dentry of
snapshots can be left in dcache even after their trees are unmounted.

The orphan root dentry/inode keeps a root object, and this causes
false positive of nilfs_checkpoint_is_mounted function.

This resolves the issue by having nilfs_checkpoint_is_mounted test
whether the root dentry is busy or not.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:38 +09:00
Ryusuke Konishi f1e89c86fd nilfs2: use iget for all metadata files
This makes use of iget5_locked to allocate or get inode for metadata
files to stop using own inode allocator.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:38 +09:00
Ryusuke Konishi c1c1d70920 nilfs2: get rid of GCDAT inode
This applies prepared rollback function and redirect function of
metadata file to DAT file, and eliminates GCDAT inode.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:38 +09:00
Ryusuke Konishi b1f6a4f294 nilfs2: add routines to redirect access to buffers of DAT file
During garbage collection (GC), DAT file, which converts virtual block
number to real block number, may return disk block number that is not
yet written to the device.

To avoid access to unwritten blocks, the current implementation stores
changes to the caches of GCDAT during GC and atomically commit the
changes into the DAT file after they are written to the device.

This patch, instead, adds a function that makes a copy of specified
buffer and stores it in nilfs_shadow_map, and a function to get the
backup copy as needed (nilfs_mdt_freeze_buffer and
nilfs_mdt_get_frozen_buffer respectively).

Before DAT changes block number in an entry block, it makes a copy and
redirect access to the buffer so that address conversion function
(i.e. nilfs_dat_translate) refers to the old address saved in the
copy.

This patch gives requisites for such redirection.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:37 +09:00
Ryusuke Konishi ebdfed4dc5 nilfs2: add routines to roll back state of DAT file
This adds optional function to metadata files which makes a copy of
bmap, page caches, and b-tree node cache, and rolls back to the copy
as needed.

This enhancement is intended to displace gcdat inode that provides a
similar function in a different way.

In this patch, nilfs_shadow_map structure is added to store a copy of
the foregoing states.  nilfs_mdt_setup_shadow_map relates this
structure to a metadata file.  And, nilfs_mdt_save_to_shadow_map() and
nilfs_mdt_restore_from_shadow_map() provides save and restore
functions respectively.  Finally, nilfs_mdt_clear_shadow_map() clears
states of nilfs_shadow_map.

The copy of b-tree node cache and page cache is made by duplicating
only dirty pages into corresponding caches in nilfs_shadow_map.  Their
restoration is done by clearing dirty pages from original caches and
by copying dirty pages back from nilfs_shadow_map.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:37 +09:00
Ryusuke Konishi a8070dd365 nilfs2: add routines to save and restore bmap state
This adds routines to save and restore the state of bmap structure.
The bmap state is stored in a given nilfs_bmap_store object.

These routines will be used to roll back the state of dat inode
without using gcdat inode.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:37 +09:00
Ryusuke Konishi adbb39b548 nilfs2: do not allocate nilfs_mdt_info structure to gc-inodes
GC-inode now doesn't need the nilfs_mdt_info structure and there is no
reason that it is a sort of metadata files.

This stops the allocation and makes them not dependent on metadata
file routines.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:37 +09:00
Ryusuke Konishi 518d1a6a1d nilfs2: allow nilfs_clear_inode to clear metadata file inodes
Allows clear inode function (nilfs_clear_inode) to handle metadata
files that uses bitmap-based object alloctor.  DAT and ifile
correspond to this.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:37 +09:00
Ryusuke Konishi 348fe8da13 nilfs2: simplify life cycle management of nilfs object
This stops pre-allocating nilfs object in nilfs_get_sb routine, and
stops managing its life cycle by reference counting.

nilfs_find_or_create_nilfs() function, nilfs->ns_mount_mutex,
nilfs_objects list, and the reference counter will be removed through
the simplification.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:36 +09:00
Ryusuke Konishi f11459ad7d nilfs2: do not allocate multiple super block instances for a device
This stops allocating multiple super block instances for a device.

All snapshots and a current mode mount (i.e. latest tree) will be
controlled with nilfs_root objects that are kept within an sb
instance.

nilfs_get_sb() is rewritten so that it always has a root object for
the latest tree and snapshots make additional root objects.

The root dentry of the latest tree is binded to sb->s_root even if it
isn't attached on a directory.  Root dentries of snapshots or the
latest tree are binded to mnt->mnt_root on which they are mounted.

With this patch, nilfs_find_sbinfo() function, nilfs->ns_supers list,
and nilfs->ns_current back pointer, are deleted.  In addition,
init_nilfs() and load_nilfs() are simplified since they will be called
once for a device, not repeatedly called for mount points.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:36 +09:00
Ryusuke Konishi ab4d8f7ebf nilfs2: split out nilfs_attach_snapshot
This splits the code to attach snapshots into a separate routine for
convenience sake.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:36 +09:00
Ryusuke Konishi 367ea33486 nilfs2: split out nilfs_get_root_dentry
This splits the code to allocate root dentry into a separate routine
for convenience in successive changes.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:35 +09:00
Ryusuke Konishi dc3d3b810a nilfs2: deny write access to inodes in snapshots
Snapshots of nilfs are read-only.

After super block instances (sb) will be unified, nilfs will need to
check write access by a way other than implicit test with
IS_RDONLY(inode).  This is because IS_RDONLY() refers to MS_RDONLY bit
of inode->i_sb->s_flags and it will become inaccurate after the
unification of sb.

To prepare for the issue, this uses i_op->permission to deny write
access to inodes in snapshots.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:35 +09:00
Ryusuke Konishi fd52202930 nilfs2: use checkpoint tree for mount check of snapshots
This rewrites nilfs_checkpoint_is_mounted() function so that it
decides whether a checkpoint is mounted by whether the corresponding
root object is found in checkpoint tree.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:35 +09:00
Ryusuke Konishi b7c0634204 nilfs2: move inode count and block count into root object
This moves sbi->s_inodes_count and sbi->s_blocks_count into nilfs_root
object.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:35 +09:00
Ryusuke Konishi e912a5b668 nilfs2: use root object to get ifile
This rewrites functions using ifile so that they get ifile from
nilfs_root object, and will remove sbi->s_ifile.  Some functions that
don't know the root object are extended to receive it from caller.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:35 +09:00
Ryusuke Konishi 8e656fd518 nilfs2: make snapshots in checkpoint tree exportable
The previous export operations cannot handle multiple versions of
a filesystem if they belong to the same sb instance.

This adds a new type of file handle and extends export operations so
that they can get the inode specified by a checkpoint number as well
as an inode number and a generation number.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:34 +09:00
Ryusuke Konishi 4d8d9293dc nilfs2: set pointer to root object in inodes
This puts a pointer to nilfs_root object in the private part of
on-memory inode, and makes nilfs_iget function pick up the inode with
the same root object.

Non-root inodes inherit its nilfs_root object from parent inode.  That
of the root inode is allocated through nilfs_attach_checkpoint()
function.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:34 +09:00
Ryusuke Konishi ba65ae4729 nilfs2: add checkpoint tree to nilfs object
To hold multiple versions of a filesystem in one sb instance, a new
on-memory structure is necessary to handle one or more checkpoints.

This adds a red-black tree of checkpoints to nilfs object, and adds
lookup and create functions for them.

Each checkpoint is represented by "nilfs_root" structure, and this
structure has rb_node to configure the rb-tree.

The nilfs_root object is identified with a checkpoint number.  For
each snapshot, a nilfs_root object is allocated and the checkpoint
number of snapshot is assigned to it.  For a regular mount
(i.e. current mode mount), NILFS_CPTREE_CURRENT_CNO constant is
assigned to the corresponding nilfs_root object.

Each nilfs_root object has an ifile inode and some counters.  These
items will displace those of nilfs_sb_info structure in successive
patches.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:34 +09:00
Ryusuke Konishi 263d90cefc nilfs2: remove own inode hash used for GC
This uses inode hash function that vfs provides instead of the own
hash table for caching gc inodes.  This finally removes the own inode
hash from nilfs.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:34 +09:00
Ryusuke Konishi 5e19a995f4 nilfs2: separate initializer of metadata file inode
This separates a part of initialization code of metadata file inode,
and makes it available from the nilfs iget function that a later patch
will add to.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:34 +09:00
Ryusuke Konishi 0e14a3595b nilfs2: use iget5_locked to get inode
This uses iget5_locked instead of iget_locked so that gc cache can
look up inodes with an inode number and an optional checkpoint number.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:33 +09:00
Ryusuke Konishi 6c43f41000 nilfs2: keep zero value in i_cno except for gc-inodes
On-memory inode structures of nilfs have a member "i_cno" which stores
a checkpoint number related to the inode.  For gc-inodes, this field
indicates version of data each gc-inode caches for GC.  Log writer
temporarily uses "i_cno" to transfer the latest checkpoint number.

This stops the latter use and lets only gc-inodes use it.

The purpose of this patch is to allow the successive change use
"i_cno" for inode lookup.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:33 +09:00
Ryusuke Konishi 7d6cd92fe2 nilfs2: allow nilfs_dirty_inode to mark metadata file inodes dirty
This allows sop->dirty_inode callback function (nilfs_dirty_inode) to
handle metadata file inodes.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:33 +09:00
Ryusuke Konishi b91c9a97c9 nilfs2: allow nilfs_destroy_inode to destroy metadata file inodes
The current nilfs_destroy_inode() doesn't handle metadata file inodes
including gc inodes (dummy inodes used for garbage collection).

This allows nilfs_destroy_inode() to destroy inodes of metadata files.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:33 +09:00
Ryusuke Konishi 9566a7a851 nilfs2: accept future revisions
Compatibility of nilfs partitions is now managed with three feature
sets.  This changes old compatibility check with revision number so
that it can accept future revisions.

Note that we can stop support of experimental versions of nilfs that
doesn't know the feature sets by incrementing NILFS_CURRENT_REV.  We
don't have to do it soon, but it would be a possible option whenever
the need arises.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:33 +09:00
Linus Torvalds a2887097f2 Merge branch 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block: (46 commits)
  xen-blkfront: disable barrier/flush write support
  Added blk-lib.c and blk-barrier.c was renamed to blk-flush.c
  block: remove BLKDEV_IFL_WAIT
  aic7xxx_old: removed unused 'req' variable
  block: remove the BH_Eopnotsupp flag
  block: remove the BLKDEV_IFL_BARRIER flag
  block: remove the WRITE_BARRIER flag
  swap: do not send discards as barriers
  fat: do not send discards as barriers
  ext4: do not send discards as barriers
  jbd2: replace barriers with explicit flush / FUA usage
  jbd2: Modify ASYNC_COMMIT code to not rely on queue draining on barrier
  jbd: replace barriers with explicit flush / FUA usage
  nilfs2: replace barriers with explicit flush / FUA usage
  reiserfs: replace barriers with explicit flush / FUA usage
  gfs2: replace barriers with explicit flush / FUA usage
  btrfs: replace barriers with explicit flush / FUA usage
  xfs: replace barriers with explicit flush / FUA usage
  block: pass gfp_mask and flags to sb_issue_discard
  dm: convey that all flushes are processed as empty
  ...
2010-10-22 17:07:18 -07:00
Jens Axboe fa251f8990 Merge branch 'v2.6.36-rc8' into for-2.6.37/barrier
Conflicts:
	block/blk-core.c
	drivers/block/loop.c
	mm/swapfile.c

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-10-19 09:13:04 +02:00
Jan Blunck d6d4c19c5f BKL: Remove BKL from NILFS2
The BKL is only used in put_super, fill_super and remount_fs that are all
three protected by the superblocks s_umount rw_semaphore. Therefore it is
safe to remove the BKL entirely.

Signed-off-by: Jan Blunck <jblunck@infradead.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2010-10-04 21:10:41 +02:00
Jan Blunck db71922217 BKL: Explicitly add BKL around get_sb/fill_super
This patch is a preparation necessary to remove the BKL from do_new_mount().
It explicitly adds calls to lock_kernel()/unlock_kernel() around
get_sb/fill_super operations for filesystems that still uses the BKL.

I've read through all the code formerly covered by the BKL inside
do_kern_mount() and have satisfied myself that it doesn't need the BKL
any more.

do_kern_mount() is already called without the BKL when mounting the rootfs
and in nfsctl. do_kern_mount() calls vfs_kern_mount(), which is called
from various places without BKL: simple_pin_fs(), nfs_do_clone_mount()
through nfs_follow_mountpoint(), afs_mntpt_do_automount() through
afs_mntpt_follow_link(). Both later functions are actually the filesystems
follow_link inode operation. vfs_kern_mount() is calling the specified
get_sb function and lets the filesystem do its job by calling the given
fill_super function.

Therefore I think it is safe to push down the BKL from the VFS to the
low-level filesystems get_sb/fill_super operation.

[arnd: do not add the BKL to those file systems that already
       don't use it elsewhere]

Signed-off-by: Jan Blunck <jblunck@infradead.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Christoph Hellwig <hch@infradead.org>
2010-10-04 21:10:10 +02:00
Christoph Hellwig dd3932eddf block: remove BLKDEV_IFL_WAIT
All the blkdev_issue_* helpers can only sanely be used for synchronous
caller.  To issue cache flushes or barriers asynchronously the caller needs
to set up a bio by itself with a completion callback to move the asynchronous
state machine ahead.  So drop the BLKDEV_IFL_WAIT flag that is always
specified when calling blkdev_issue_* and also remove the now unused flags
argument to blkdev_issue_flush and blkdev_issue_zeroout.  For
blkdev_issue_discard we need to keep it for the secure discard flag, which
gains a more descriptive name and loses the bitops vs flag confusion.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-16 20:52:58 +02:00
Christoph Hellwig f8c131f5b6 nilfs2: replace barriers with explicit flush / FUA usage
Switch to the WRITE_FLUSH_FUA flag for log writes, remove the EOPNOTSUPP
detection for barriers and stop setting the barrier flag for discards.

tj: nilfs is now fixed to wait for discard completion.  Updated this
    patch accordingly and dropped warning about it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10 12:35:39 +02:00
Ryusuke Konishi 4afc31345e nilfs2: fix leak of shadow dat inode in error path of load_nilfs
If load_nilfs() gets an error while doing recovery, it will fail to
free the shadow inode of dat (nilfs->ns_gc_dat).

This fixes the leak issue.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-08-30 10:18:03 +09:00
Linus Torvalds a28e0852d4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
  nilfs2: wait for discard to finish
2010-08-22 09:44:47 -07:00
Linus Torvalds 145c3ae46b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  fs: brlock vfsmount_lock
  fs: scale files_lock
  lglock: introduce special lglock and brlock spin locks
  tty: fix fu_list abuse
  fs: cleanup files_lock locking
  fs: remove extra lookup in __lookup_hash
  fs: fs_struct rwlock to spinlock
  apparmor: use task path helpers
  fs: dentry allocation consolidation
  fs: fix do_lookup false negative
  mbcache: Limit the maximum number of cache entries
  hostfs ->follow_link() braino
  hostfs: dumb (and usually harmless) tpyo - strncpy instead of strlcpy
  remove SWRITE* I/O types
  kill BH_Ordered flag
  vfs: update ctime when changing the file's permission by setfacl
  cramfs: only unlock new inodes
  fix reiserfs_evict_inode end_writeback second call
2010-08-18 09:35:08 -07:00
Ryusuke Konishi 1cb0c924fa nilfs2: wait for discard to finish
nilfs_discard_segment() doesn't wait for completion of discard
requests.  This specifies BLKDEV_IFL_WAIT flag when calling
blkdev_issue_discard() in order to fix the sync failure.

Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Christoph Hellwig <hch@lst.de>
2010-08-19 00:11:06 +09:00
Christoph Hellwig 87e99511ea kill BH_Ordered flag
Instead of abusing a buffer_head flag just add a variant of
sync_dirty_buffer which allows passing the exact type of write
flag required.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-18 01:09:00 -04:00
Ryusuke Konishi ea1a16f716 nilfs2: fix false warning saying one of two super blocks is broken
After applying commit b2ac86e1, the following message got appeared
after unclean shutdown:

> NILFS warning: broken superblock. using spare superblock.

This turns out to be a false message due to the change which updates
two super blocks alternately.  The secondary super block now can be
selected if it's newer than the primary one.

This kills the false warning by suppressing it if another super block
is not actually broken.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-08-16 11:08:36 +09:00
Ryusuke Konishi af4e36318e nilfs2: fix list corruption after ifile creation failure
If nilfs_attach_checkpoint() gets a memory allocation failure during
creation of ifile, it will return without removing nilfs_sb_info
struct from ns_supers list.  When a concurrently mounted snapshot is
unmounted or another new snapshot is mounted after that, this causes
kernel oops as below:

> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: [<f83662ff>] nilfs_find_sbinfo+0x74/0xa4 [nilfs2]
> *pde = 00000000
> Oops: 0000 [#1] SMP
<snip>
> Call Trace:
>  [<f835dc29>] ? nilfs_get_sb+0x165/0x532 [nilfs2]
>  [<c1173c87>] ? ida_get_new_above+0x16d/0x187
>  [<c109a7f8>] ? alloc_vfsmnt+0x7e/0x10a
>  [<c1070790>] ? kstrdup+0x2c/0x40
>  [<c1089041>] ? vfs_kern_mount+0x96/0x14e
>  [<c108913d>] ? do_kern_mount+0x32/0xbd
>  [<c109b331>] ? do_mount+0x642/0x6a1
>  [<c101a415>] ? do_page_fault+0x0/0x2d1
>  [<c1099c00>] ? copy_mount_options+0x80/0xe2
>  [<c10705d8>] ? strndup_user+0x48/0x67
>  [<c109b3f1>] ? sys_mount+0x61/0x90
>  [<c10027cc>] ? sysenter_do_call+0x12/0x22

This fixes the problem.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: stable@kernel.org
2010-08-16 11:08:36 +09:00
Linus Torvalds 2f9e825d3e Merge branch 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block: (149 commits)
  block: make sure that REQ_* types are seen even with CONFIG_BLOCK=n
  xen-blkfront: fix missing out label
  blkdev: fix blkdev_issue_zeroout return value
  block: update request stacking methods to support discards
  block: fix missing export of blk_types.h
  writeback: fix bad _bh spinlock nesting
  drbd: revert "delay probes", feature is being re-implemented differently
  drbd: Initialize all members of sync_conf to their defaults [Bugz 315]
  drbd: Disable delay probes for the upcomming release
  writeback: cleanup bdi_register
  writeback: add new tracepoints
  writeback: remove unnecessary init_timer call
  writeback: optimize periodic bdi thread wakeups
  writeback: prevent unnecessary bdi threads wakeups
  writeback: move bdi threads exiting logic to the forker thread
  writeback: restructure bdi forker loop a little
  writeback: move last_active to bdi
  writeback: do not remove bdi from bdi_list
  writeback: simplify bdi code a little
  writeback: do not lose wake-ups in bdi threads
  ...

Fixed up pretty trivial conflicts in drivers/block/virtio_blk.c and
drivers/scsi/scsi_error.c as per Jens.
2010-08-10 15:22:42 -07:00
Linus Torvalds 5f248c9c25 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (96 commits)
  no need for list_for_each_entry_safe()/resetting with superblock list
  Fix sget() race with failing mount
  vfs: don't hold s_umount over close_bdev_exclusive() call
  sysv: do not mark superblock dirty on remount
  sysv: do not mark superblock dirty on mount
  btrfs: remove junk sb_dirt change
  BFS: clean up the superblock usage
  AFFS: wait for sb synchronization when needed
  AFFS: clean up dirty flag usage
  cifs: truncate fallout
  mbcache: fix shrinker function return value
  mbcache: Remove unused features
  add f_flags to struct statfs(64)
  pass a struct path to vfs_statfs
  update VFS documentation for method changes.
  All filesystems that need invalidate_inode_buffers() are doing that explicitly
  convert remaining ->clear_inode() to ->evict_inode()
  Make ->drop_inode() just return whether inode needs to be dropped
  fs/inode.c:clear_inode() is gone
  fs/inode.c:evict() doesn't care about delete vs. non-delete paths now
  ...

Fix up trivial conflicts in fs/nilfs2/super.c
2010-08-10 11:26:52 -07:00
Al Viro 6fd1e5c994 convert nilfs2 to ->evict_inode()
[folded build fix from sfr]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:48:25 -04:00
Al Viro a4ffdde6e5 simplify checks for I_CLEAR/I_FREEING
add I_CLEAR instead of replacing I_FREEING with it.  I_CLEAR is
equivalent to I_FREEING for almost all code looking at either;
it's there to keep track of having called clear_inode() exactly
once per inode lifetime, at some point after having set I_FREEING.
I_CLEAR and I_FREEING never get set at the same time with the
current code, so we can switch to setting i_flags to I_FREEING | I_CLEAR
instead of I_CLEAR without loss of information.  As the result of
such change, checks become simpler and the amount of code that needs
to know about I_CLEAR shrinks a lot.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:44 -04:00
Christoph Hellwig 1025774ce4 remove inode_setattr
Replace inode_setattr with opencoded variants of it in all callers.  This
moves the remaining call to vmtruncate into the filesystem methods where it
can be replaced with the proper truncate sequence.

In a few cases it was obvious that we would never end up calling vmtruncate
so it was left out in the opencoded variant:

 spufs: explicitly checks for ATTR_SIZE earlier
 btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier
 ufs: contains an opencoded simple_seattr + truncate that sets the filesize just above

In addition to that ncpfs called inode_setattr with handcrafted iattrs,
which allowed to trim down the opencoded variant.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:37 -04:00
Christoph Hellwig 155130a4f7 get rid of block_write_begin_newtrunc
Move the call to vmtruncate to get rid of accessive blocks to the callers
in preparation of the new truncate sequence and rename the non-truncating
version to block_write_begin.

While we're at it also remove several unused arguments to block_write_begin.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:33 -04:00
Christoph Hellwig 6e1db88d53 introduce __block_write_begin
Split up the block_write_begin implementation - __block_write_begin is a new
trivial wrapper for block_prepare_write that always takes an already
allocated page and can be either called from block_write_begin or filesystem
code that already has a page allocated.  Remove the handling of already
allocated pages from block_write_begin after switching all callers that
do it to __block_write_begin.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:32 -04:00
Christoph Hellwig f4e420dc42 clean up write_begin usage for directories in pagecache
For filesystem that implement directories in pagecache we call
block_write_begin with an already allocated page for this code, while the
normal regular file write path uses the default block_write_begin behaviour.

Get rid of the __foofs_write_begin helper and opencode the normal write_begin
call in foofs_write_begin, while adding a new foofs_prepare_chunk helper for
the directory code.  The added benefit is that foofs_prepare_chunk has
a much saner calling convention.

Note that the interruptible flag passed into block_write_begin is always
ignored if we already pass in a page (see next patch for details), and
we never were doing truncations of exessive blocks for this case either so we
can switch directly to block_write_begin_newtrunc.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:31 -04:00
Christoph Hellwig eafdc7d190 sort out blockdev_direct_IO variants
Move the call to vmtruncate to get rid of accessive blocks to the callers
in prepearation of the new truncate calling sequence.  This was only done
for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant
was not needed anyway.  Get rid of blockdev_direct_IO_no_locking and
its _newtrunc variant while at it as just opencoding the two additional
paramters is shorted than the name suffix.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:29 -04:00
Christoph Hellwig 7b6d91daee block: unify flags for struct bio and struct request
Remove the current bio flags and reuse the request flags for the bio, too.
This allows to more easily trace the type of I/O from the filesystem
down to the block driver.  There were two flags in the bio that were
missing in the requests:  BIO_RW_UNPLUG and BIO_RW_AHEAD.  Also I've
renamed two request flags that had a superflous RW in them.

Note that the flags are in bio.h despite having the REQ_ name - as
blkdev.h includes bio.h that is the only way to go for now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:20:39 +02:00
Ryusuke Konishi 89c0fd014d nilfs2: reject filesystem with unsupported block size
This inserts sanity check that refuses to mount a filesystem with
unsupported block size.

Previously, kernel code of nilfs was looking only limitation of
devices though mkfs.nilfs2 limits the range of block sizes; there was
no check that prevents rec_len overflow with larger block sizes.

With this change, block sizes larger than 64KB or smaller than 1KB
will get rejected explicitly by kernel.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-25 23:29:21 +09:00
Ryusuke Konishi 6cda9fa257 nilfs2: avoid rec_len overflow with 64KB block size
With 64KB blocksize, a directory entry can have size 64KB which does
not fit into 16 bits we have for entry length.  So this patch stores
0xffff instead and converts value when read from / written to disk.

Nilfs derives its directory implementation from ext2 filesystem, and
this draws upon the corresponding change on ext2.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-25 20:46:43 +09:00
Ryusuke Konishi c28e69d933 nilfs2: simplify nilfs_get_page function
Implementation of nilfs_get_page() is a bit old as below:

 - A common read_mapping_page inline function is now available instead
   of its read_cache_page use.
 - wait_on_page_locked() use in the function is eliminable since
   read_cache_page function does the same thing through wait_on_page_read().
 - PageUptodate() check is eliminable for the same reason.

This renews nilfs_get_page() based on these points.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-24 17:36:29 +09:00
Ryusuke Konishi c5ca48aabe nilfs2: reject incompatible filesystem
This forces nilfs to check compatibility of feature flags so as to
reject a filesystem with unknown features when it mounts or remounts
the filesystem.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:16 +09:00
Ryusuke Konishi 03bdb5ac58 nilfs2: apply read-ahead for nilfs_btree_lookup_contig
This applies read-ahead to nilfs_btree_do_lookup and
nilfs_btree_lookup_contig functions and extends them to read ahead
siblings of level 1 btree nodes that hold data blocks.

At present, the read-ahead is not applied to most btree operations;
only get_block() callback function, which is used during read of
regular files or directories, receives the benefit.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:16 +09:00
Ryusuke Konishi 4e13e66bee nilfs2: introduce check flag to btree node buffer
nilfs_btree_get_block() now may return untested buffer due to
read-ahead.  This adds a new flag for buffer heads so that the btree
code can check whether the buffer is already verified or not.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:15 +09:00
Ryusuke Konishi 464ece8863 nilfs2: add btree get block function with readahead option
This adds __nilfs_btree_get_block() function that can issue a series
of read-ahead requests for sibling btree nodes.

This read-ahead needs parent node block, so nilfs_btree_readahead_info
structure is added to pass the information that
__nilfs_btree_get_block() needs.

This also replaces the previous nilfs_btree_get_block() implementation
with a wrapper function of __nilfs_btree_get_block().

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:15 +09:00
Ryusuke Konishi 26dfdd8e29 nilfs2: add read ahead mode to nilfs_btnode_submit_block
This adds mode argument to nilfs_btnode_submit_block() function and
allows it to issue a read-ahead request.

An optional submit_ptr argument is also added to store the actual
block address for which bio is sent.  submit_ptr is used for a series
of read-ahead requests, and helps to decide if each requested block is
continous to the previous one on disk.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:15 +09:00
Ryusuke Konishi f8e6cc013b nilfs2: fix buffer head leak in nilfs_btnode_submit_block
nilfs_btnode_submit_block() refers to buffer head just before
returning from the function, but it releases the buffer head earlier
than that if nilfs_dat_translate() gets an error.

This has potential for oops in the erroneous case.  This fixes the
issue.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:15 +09:00
Ryusuke Konishi 7c397a81fe nilfs2: eliminate inline keywords in btree implementation
This removes all inline uses from btree.c.  Gcc now agressively apply
inline expansion even for the functions declared without the keyword;
the inline use in btree.c looks excessive.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:15 +09:00
Ryusuke Konishi 5ad2686e92 nilfs2: get maximum number of child nodes from bmap object
The patch "reduce repetitive calculation of max number of child nodes"
gathered up the calculation of maximum number of child nodes into
nilfs_btree_nchildren_per_block() function.  This makes the function
get resultant value from a private variable in bmap object instead of
calculating it for each call.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:14 +09:00
Ryusuke Konishi 9b7b265c9a nilfs2: reduce repetitive calculation of max number of child nodes
The current btree implementation repeats the same calculation on the
maximum number of child nodes.  This is because a few low level
routines use the calculation for index addressing in a btree node
block.

This reduces the calculation by explicitly passing the maximum number
of child nodes (ncmax) through their argument.

This changes parameter passing of the following functions:

 - nilfs_btree_node_dptrs
 - nilfs_btree_node_get_ptr
 - nilfs_btree_node_set_ptr
 - nilfs_btree_node_init
 - nilfs_btree_node_move_left
 - nilfs_btree_node_move_right
 - nilfs_btree_node_insert
 - nilfs_btree_node_delete, and
 - nilfs_btree_get_node

The following functions are removed:

 - nilfs_btree_node_nchildren_min
 - nilfs_btree_node_nchildren_max

Most middle level btree operations are rewritten to pass a proper
ncmax value depending on whether each occurrence of node is "root" or
not.

A constant NILFS_BTREE_ROOT_NCHILDREN_MAX is used for the root node,
whereas nilfs_btree_nchildren_per_block() function is used for
non-root nodes.  If a node could be either root or a non-root node, an
output argument of nilfs_btree_get_node() is used to set up ncmax.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:14 +09:00
Ryusuke Konishi ea64ab87cd nilfs2: optimize calculation of min/max number of btree node children
nilfs_btree_node_nchildren_max() and nilfs_btree_node_nchildren_min()
functions switch return value depending on whether target node is the
root or a node block.  In most uses of these functions, however, the
node type is fixed, and moreover the same calculation is repeatedly
performed in loop.

This unfold these functions depending on context and move them outside
loops wherever possible.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:14 +09:00
Ryusuke Konishi 364ec2d700 nilfs2: remove redundant pointer checks in bmap lookup functions
nilfs_bmap_lookup and its variants are supposed to take a valid
pointer argument to return a block address, thus pointer checks in
nilfs_btree_lookup and nilfs_direct_lookup are needless.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:14 +09:00
Ryusuke Konishi 05d0e94b66 nilfs2: get rid of nilfs_bmap_union
This removes nilfs_bmap_union and finally unifies three structures and
the union in bmap/btree code into one.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:14 +09:00
Ryusuke Konishi dc935be2a0 nilfs2: unify bmap set_target_v operations
This unifies two similar functions nilfs_btree_set_target_v and
nilfs_direct_set_target_v into one, nilfs_bmap_set_target_v.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:14 +09:00
Ryusuke Konishi e7c274f808 nilfs2: get rid of nilfs_btree uses
This replaces all uses of nilfs_btree struct in implementation of
btree mapping with nilfs_bmap struct.

Name of local variable "btree" is kept not to bloat amount of change.
And, a part of local variables "bmap" is renamed to "btree" to uniform
naming rule.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:13 +09:00
Ryusuke Konishi 10ff885ba6 nilfs2: get rid of nilfs_direct uses
This replaces all uses of nilfs_direct struct in implementation of
direct mapping with nilfs_bmap struct.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:13 +09:00
Ryusuke Konishi 583ada4761 nilfs2: remove constant qualifier from argument of bmap propagate
The first argument of bops->bop_propagate operation takes a constant
qualifier, and causes compilation error when removed cast to pointer
of nilfs_btree structure type.  This fixes the issue to prepare for
succesive removal of nilfs_btree struct.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:13 +09:00
Ryusuke Konishi 25b8d7ded0 nilfs2: get rid of private conversion macros on bmap key and pointer
Will remove nilfs_bmap_key_to_dkey(), nilfs_bmap_dkey_to_key(),
nilfs_bmap_ptr_to_dptr(), and nilfs_bmap_dptr_to_ptr() for simplicity.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:13 +09:00
Ryusuke Konishi 1d5385b9f3 nilfs2: verify btree node after reading
This inserts sanity checks soon after read btree node from disk.  This
allows early detection of broken btree nodes, and helps to narrow down
problems due to file system corruption.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:13 +09:00
Ryusuke Konishi cfa913a507 nilfs2: add sanity check in nilfs_btree_add_dirty_buffer
According to the report titled "problem with nilfs_cleanerd" from
Łukasz Wójcicki, nilfs_btree_lookup_dirty_buffers or
nilfs_btree_add_dirty_buffer got memory violation during garbage
collection.

This could happen if a level field of given btree node buffer is
incorrect, which is a crucial internal bug.

This inserts a sanity check to figure out the problem.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:12 +09:00
Ryusuke Konishi 7c01745781 nilfs2: pass remount flag to parse_options
This adds is_remount argument to the parse_options() function that
obtains mount options from strings.

Previously, parse_options did not distinguish context whether it's
called for a new mount or remount, so the caller needed additional
verifications outside the function.

This allows parse_options to verify options and print messages
depending on the context.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:12 +09:00
Ryusuke Konishi c6b4d57ddf nilfs2: use seq_puts to print mount options without argument
This replaces seq_printf() with seq_puts() in nilfs_show_options for
mount options which have no argument.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:12 +09:00
Ryusuke Konishi 802d317754 nilfs2: add nodiscard mount option
Nilfs has "discard" mount option which issues discard/TRIM commands to
underlying block device, but it lacks a complementary option and has
no way to disable the feature through remount.

This adds "nodiscard" option to resolve this imbalance.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:12 +09:00
Ryusuke Konishi 773bc4f3b6 nilfs2: add barrier mount option
Nilfs enables write barriers by default and has "nobarrier" mount
option to disable this feature.  But it lacks the complementary option
and has no way to re-enable the feature on remount.

This adds "barrier" option to resolve this imbalance.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:12 +09:00
Ryusuke Konishi 325020477a nilfs2: do not update log cursor for small change
Super blocks of nilfs are periodically overwritten in order to record
the recent log position.  This shortens recovery time after unclean
unmount, but the current implementation performs the update even for a
few blocks of change.  If the filesystem gets small changes slowly and
continually, super blocks may be updated excessively.

This moderates the issue by skipping update of log cursor if it does
not cross a segment boundary.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:11 +09:00
Ryusuke Konishi 6c12516083 nilfs2: implement fallback for super root search
Although nilfs redundantly uses two super blocks and each may point to
different position on log, the current version of nilfs does not try
fallback to the spare super block when it doesn't find any valid log
at the position that the primary super block points to.

This has been a cause of mount failures due to write order reversals
on barrier less block devices.

This inserts fallback code in error path of nilfs_search_super_root
routine to resolve the mount failure problem.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:11 +09:00
Ryusuke Konishi 2d72b99ecd nilfs2: add missing error code in comment of nilfs_search_super_root
nilfs_search_super_root can return -ENOMEM, but this error code is not
described in its kernel-doc comment.  This fixes the discrepancy.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:11 +09:00
Ryusuke Konishi 843d63baa5 nilfs2: separate setup of log cursor from init_nilfs
This separates a setup routine of log cursor from init_nilfs().  The
routine, nilfs_store_log_cursor, reads the last position of the log
containing a super root, and initializes relevant state on the nilfs
object.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:11 +09:00
Jiro SEKIBA b2ac86e1a8 nilfs2: sync super blocks in turns
This will sync super blocks in turns instead of syncing duplicate
super blocks at the time.  This will help searching valid super root
when super block is written into disk before log is written, which is
happen when barrier-less block devices are unmounted uncleanly.  In
the situation, old super block likely points to valid log.

This patch introduces ns_sbwcount member to the nilfs object and adds
nilfs_sb_will_flip() function; ns_sbwcount counts how many times super
blocks write back to the disk.  And, nilfs_sb_will_flip() decides
whether flipping required or not based on the count of ns_sbwcount to
sync super blocks asymmetrically.

The following functions are also changed:

 - nilfs_prepare_super(): flips super blocks according to the
   argument.  The argument is calculated by nilfs_sb_will_flip()
   function.

 - nilfs_cleanup_super(): sets "clean" flag to both super blocks if
   they point to the same checkpoint.

To update both of super block information, caller of
nilfs_commit_super must set the information on both super blocks.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:11 +09:00
Jiro SEKIBA d26493b6f0 nilfs2: introduce nilfs_prepare_super
This function checks validity of super block pointers.
If first super block is invalid, it will swap the super blocks.
The function should be called before any super block information updates.
Caller must obtain nilfs->ns_sem.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:10 +09:00
Ryusuke Konishi 60f46b7efc nilfs2: separate function that updates log position
This moves out section that updates information of the recent log
position stored in super blocks from nilfs_commit_super to a new
routine named nilfs_set_log_cursor.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:10 +09:00
Ryusuke Konishi c8a11c8a14 nilfs2: add nilfs_set_error
This function marks error state and write it on super blocks.  This is
a preparation for making super block writeback alternately.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:10 +09:00
Ryusuke Konishi 7ecaa46cfe nilfs2: add nilfs_cleanup_super
This function write out filesystem state to super blocks in order to
share the same cleanup work.  This is a preparation for making super
block writeback alternately.

Cc: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:10 +09:00
Ryusuke Konishi bde4e696e4 nilfs2: do not update mount time on rw->ro remount
Mount time field in super block is wrongly updated when nilfs remounts
the partition from read-write to read-only.  This fixes the issue.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:10 +09:00
Ryusuke Konishi 57a4bfc486 nilfs2: get rid of ns_free_segments_count
This counter is unused.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:09 +09:00
Ryusuke Konishi 4762077c7b nilfs2: get rid of macros for segment summary information
This removes macros to test segment summary flags and redefines a few
relevant macros with inline functions.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:09 +09:00
Ryusuke Konishi 85655484f8 nilfs2: do not use nilfs_segsum_info structure in recovery code
This will get rid of nilfs_segsum_info use from recovery functions for
simplicity.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:09 +09:00
Ryusuke Konishi 354fa8be28 nilfs2: divide load_segment_summary function
load_segment_summary function has two distinct roles: getting summary
header of a log, and verifying consistencies of the log.

This divide it into two corresponding functions, nilfs_read_log_header
and nilfs_validate_log to clarify the meaning.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:09 +09:00
Ryusuke Konishi aee5ce2f57 nilfs2: rename nilfs_recover_logical_segments function
The function name of nilfs_recover_logical_segments makes no sense.
This changes the name into nilfs_salvage_orphan_logs to clarify the
role of the function.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:09 +09:00
Ryusuke Konishi 8b94025c00 nilfs2: refactor recovery logic routines
Most functions in recovery code take an argument of a super block
instance or a nilfs_sb_info struct for convenience sake.

This replaces them aggressively with a nilfs object by applying
__bread and __breadahead against routines using sb_bread and
sb_breadahead.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:08 +09:00
Ryusuke Konishi 92c60ccaf3 nilfs2: add blocksize member to nilfs object
This stores blocksize in nilfs objects for the successive refactoring
of recovery logic.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:08 +09:00
Ryusuke Konishi c29684d683 nilfs2: remove obsolete declarations of cache constructor and destructor
The commit 41c88bd7 ("nilfs2: cleanup multi
kmem_cache_{create,destroy} code") consolidated slab constructors and
destructors used in nilfs, but it left some declarations in header
files.

This gets rid of the obsolete declarations.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-31 20:50:29 +09:00
Ryusuke Konishi 84cb099985 nilfs2: fix style issue in nilfs_destroy_cachep
This gets rid of unwanted space chars in front of conditional
sentences of nilfs_destroy_cachep().

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-31 20:50:29 +09:00
Christoph Hellwig 7ea8085910 drop unused dentry argument to ->fsync
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-27 22:05:02 -04:00
Linus Torvalds e8bebe2f71 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (69 commits)
  fix handling of offsets in cris eeprom.c, get rid of fake on-stack files
  get rid of home-grown mutex in cris eeprom.c
  switch ecryptfs_write() to struct inode *, kill on-stack fake files
  switch ecryptfs_get_locked_page() to struct inode *
  simplify access to ecryptfs inodes in ->readpage() and friends
  AFS: Don't put struct file on the stack
  Ban ecryptfs over ecryptfs
  logfs: replace inode uid,gid,mode initialization with helper function
  ufs: replace inode uid,gid,mode initialization with helper function
  udf: replace inode uid,gid,mode init with helper
  ubifs: replace inode uid,gid,mode initialization with helper function
  sysv: replace inode uid,gid,mode initialization with helper function
  reiserfs: replace inode uid,gid,mode initialization with helper function
  ramfs: replace inode uid,gid,mode initialization with helper function
  omfs: replace inode uid,gid,mode initialization with helper function
  bfs: replace inode uid,gid,mode initialization with helper function
  ocfs2: replace inode uid,gid,mode initialization with helper function
  nilfs2: replace inode uid,gid,mode initialization with helper function
  minix: replace inode uid,gid,mode init with helper
  ext4: replace inode uid,gid,mode init with helper
  ...

Trivial conflict in fs/fs-writeback.c (mark bitfields unsigned)
2010-05-21 19:37:45 -07:00
Dmitry Monakhov 73459dcc67 nilfs2: replace inode uid,gid,mode initialization with helper function
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-21 18:31:25 -04:00
Jens Axboe ee9a3607fb Merge branch 'master' into for-2.6.35
Conflicts:
	fs/ext3/fsync.c

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-05-21 21:27:26 +02:00
Ryusuke Konishi d240e06713 nilfs2: disallow remount of snapshot from/to a regular mount
Snapshots and regular ro/rw mounts are essentially-different within
the meaning whether the checkpoint is static or not and is marked with
a snapshot flag or not.

The current implemenation, however, allows to remount a snapshot to a
regular rw-mount if the checkpoint number equals the latest one.

This transition is actually impossible since changing a checkpoint to
a snapshot makes another checkpoint, thus the condition is never
satisfied.

This fixes the weird state of affairs, and specifically separates
snapshots and regular rw/ro-mounts.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 11:32:34 +09:00
Ryusuke Konishi cdce214e39 nilfs2: use huge_encode_dev/huge_decode_dev
This replaces uses of new_encode_dev/new_decode_dev with their 64-bit
counterparts, huge_encode_dev/huge_decode_dev respectively.

This is just for clarification and has no impact on the disk format.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 11:32:34 +09:00
Ryusuke Konishi b87ca91948 nilfs2: update comment on deactivate_super at nilfs_get_sb
deactivate_super was replaced with deactivate_locked_super, but the
comment of nilfs_get_sb remain unchanged.  This renews the comment.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 11:32:34 +09:00
Ryusuke Konishi e2d1591a13 nilfs2: replace MS_VERBOSE with MS_SILENT
MS_VERBOSE is deprecated.  This replaces it with MS_SILENT in
reference to get_sb_bdev function.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 11:32:34 +09:00
Ryusuke Konishi 4571b82cdc nilfs2: add missing initialization of s_mode
An fmode_t argument is passed to kill_block_super() through s_mode
member of the super_block structure.  This is used to release the
block device with the same mode, however, nilfs does not set s_mode
anywhere.

This modifies nilfs_get_sb function to properly initialize the s_mode
member.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 11:32:33 +09:00
Ryusuke Konishi 13e905592b nilfs2: fix misuse of open_bdev_exclusive/close_bdev_exclusive
The second argument of open_bdev_exclusive/close_bdev_exclusive takes
fmode_t flags instead of mount flags.  This fixes the misuse.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 11:32:33 +09:00
Ryusuke Konishi 25294d8c37 nilfs2: use checkpoint number instead of timestamp to select super block
Nilfs maintains two super blocks, and selects the new one on mount
time if they both have valid checksums and their timestamps differ.

However, this has potential for mis-selection since the system clock
may be rewinded and the resolution of the timestamps is not high.

Usually this doesn't become an issue because both super blocks are
updated at the same time when the file system is unmounted.  Even if
the file system wasn't unmounted cleanly, the roll-forward recovery
will find the proper log which stores the latest super root.  Thus,
the issue can appear only if update of one super block fails and the
clock happens to be rewinded.

This fixes the issue by using checkpoint numbers instead of timestamps
to pick the super block storing the location of the latest log.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 11:32:32 +09:00
Ryusuke Konishi 34cb9b5c97 nilfs2: add missing endian conversion on super block magic number
This adds missing endian conversions in comparision of the magic
number of super blocks.  It was coincidence that prior versions didn't
incur problems; the upper byte of the magic number happened to be
equal to the lower byte.  But, semantically it's wrong to depend on
this.

This won't change anything else nor suffer any compatibility issues.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 11:32:32 +09:00
Ryusuke Konishi 4e819509cb nilfs2: make nilfs_sc_*_ops static
This kills the following sparse warnings:

fs/nilfs2/segment.c:567:28: warning: symbol 'nilfs_sc_file_ops' was not declared. Should it be static?
fs/nilfs2/segment.c:617:28: warning: symbol 'nilfs_sc_dat_ops' was not declared. Should it be static?
fs/nilfs2/segment.c:625:28: warning: symbol 'nilfs_sc_dsync_ops' was not declared. Should it be static?

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 11:32:31 +09:00
Ryusuke Konishi db55d92252 nilfs2: add kernel doc comments to persistent object allocator functions
The implementation of persistent object allocator (alloc.c) is poorly
documented.  This adds kernel doc style comments on that functions.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 11:32:31 +09:00
Li Hong fdce895ea5 nilfs2: change sc_timer from a pointer to an embedded one in struct nilfs_sc_info
In nilfs_segctor_thread(), timer is a local variable allocated on stack. Its
address can't be set to sci->sc_timer and passed in several procedures.

It works now by chance, just because other procedures are called by
nilfs_segctor_thread() directly or indirectly and the stack hasn't been
deallocated yet.

Signed-off-by: Li Hong <lihong.hi@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 11:32:31 +09:00
Li Hong 154ac5a830 nilfs2: remove nilfs_segctor_init() in segment.c
There are only two lines of code in nilfs_segctor_init(). From a logic
design view, the first line 'sci->sc_seq_done = sci->sc_seq_request;'
should be put in nilfs_segctor_new(). Even in nilfs_segctor_new(),
this initialization is needless because sci is kzalloc-ed. So
nilfs_segctor_init() is only a wrap call to
nilfs_segctor_start_thread().

Signed-off-by: Li Hong <lihong.hi@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 11:32:31 +09:00
Ryusuke Konishi 50614bcf29 nilfs2: insert checkpoint number in segment summary header
This adds a field to record the latest checkpoint number in the
nilfs_segment_summary structure.  This will help to recover the latest
checkpoint number from logs on disk.  This field is intended for
crucial cases in which super blocks have lost pointer to the latest
log.

Even though this will change the disk format, both backward and
forward compatibility is preserved by a size field prepared in the
segment summary header.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 11:32:31 +09:00
Li Hong 9f130263f3 nilfs2: add a print message after loading nilfs2
Printing a message after loading a file system is a practice. Add this to
provide a better user-friendly experience.

Signed-off-by: Li Hong <lihong.hi@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 11:32:30 +09:00
Li Hong 41c88bd74d nilfs2: cleanup multi kmem_cache_{create,destroy} code
This cleanup patch gives several improvements:

 - Moving all kmem_cache_{create_destroy} calls into one place, which removes
 some small function calls, cleans up error check code and clarify the logic.

 - Mark all initial code in __init section.

 - Remove some very obvious comments.

 - Adjust some declarations.

 - Fix some space-tab issues.

Signed-off-by: Li Hong <lihong.hi@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 11:32:30 +09:00
Ryusuke Konishi aaed1d5bfa nilfs2: move out checksum routines to segment buffer code
This moves out checksum routines in log writer to segbuf.c for
cleanup.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 11:32:30 +09:00
Ryusuke Konishi 1e2b68bf28 nilfs2: move pointer to super root block into logs
This moves a pointer to buffer storing super root block to each log
buffer from nilfs_sc_info struct for simplicity.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 11:32:30 +09:00
Ryusuke Konishi 277a6a3417 nilfs2: change default of 'errors' mount option to 'remount-ro' mode
Like ext3, nilfs has 'errors' mount option to allow specifying desired
behavior on severe errors.

Currently, the default action is 'errors=continue' and has potential
to advance filesystem corruption for severe errors.

This will change the action to 'errors=remount-ro' to avoid the issue.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 11:32:30 +09:00
Li Hong 73bb48869b nilfs2: Combine nilfs_btree_release_path() and nilfs_btree_free_path()
nilfs_btree_release_path() and nilfs_btree_free_path() are bound into each other
tightly. Make them into one procedure to clearify the logic and avoid some
misusages.

Signed-off-by: Li Hong <lihong.hi@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 11:32:29 +09:00
Li Hong f905440f5e nilfs2: Combine nilfs_btree_alloc_path() and nilfs_btree_init_path()
nilfs_btree_alloc_path() and nilfs_btree_init_path() are bound into each other
tightly. Make them into one procedure to clearify the logic and avoid some
misusages.

Signed-off-by: Li Hong <lihong.hi@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 11:32:29 +09:00
Ryusuke Konishi 973bec34bf nilfs2: fix sync silent failure
As of 32a88aa1, __sync_filesystem() will return 0 if s_bdi is not set.
And nilfs does not set s_bdi anywhere.  I noticed this problem by the
warning introduced by the recent commit 5129a469 ("Catch filesystem
lacking s_bdi").

 WARNING: at fs/super.c:959 vfs_kern_mount+0xc5/0x14e()
 Hardware name: PowerEdge 2850
 Modules linked in: nilfs2 loop tpm_tis tpm tpm_bios video shpchp pci_hotplug output dcdbas
 Pid: 3773, comm: mount.nilfs2 Not tainted 2.6.34-rc6-debug #38
 Call Trace:
  [<c1028422>] warn_slowpath_common+0x60/0x90
  [<c102845f>] warn_slowpath_null+0xd/0x10
  [<c1095936>] vfs_kern_mount+0xc5/0x14e
  [<c1095a03>] do_kern_mount+0x32/0xbd
  [<c10a811e>] do_mount+0x671/0x6d0
  [<c1073794>] ? __get_free_pages+0x1f/0x21
  [<c10a684f>] ? copy_mount_options+0x2b/0xe2
  [<c107b634>] ? strndup_user+0x48/0x67
  [<c10a81de>] sys_mount+0x61/0x8f
  [<c100280c>] sysenter_do_call+0x12/0x32

This ensures to set s_bdi for nilfs and fixes the sync silent failure.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-03 07:36:01 -07:00
Stephen Rothwell 6a47dc1418 nilfs: fix breakage caused by barrier flag changes
After merging the block tree, today's linux-next build (powerpc ppc64_defconfig)
failed like this:

fs/nilfs2/the_nilfs.c: In function 'nilfs_discard_segments':
fs/nilfs2/the_nilfs.c:673: error: 'DISCARD_FL_BARRIER' undeclared (first use in this function)

Caused by commit fbd9b09a17 ("blkdev:
generalize flags for blkdev_issue_fn functions") interacting with commit
e902ec9906 ("nilfs2: issue discard request
after cleaning segments") (which netered Linus' tree on about March 4 -
before v2.6.34-rc1).

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-29 09:32:00 +02:00
Linus Torvalds 44fa2b4bee Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
  nilfs2: fix typo "numer" -> "number" in alloc.c
  nilfs2: Remove an uninitialization warning in nilfs_btree_propagate_v()
  nilfs2: fix a wrong type conversion in nilfs_ioctl()
2010-04-12 18:34:25 -07:00
Ryusuke Konishi be3bd2223b nilfs2: fix typo "numer" -> "number" in alloc.c
Fixes the typo found in a warning message of a persistent object
allocator function.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-04-12 01:51:03 +09:00
Li Hong 308f44193f nilfs2: Remove an uninitialization warning in nilfs_btree_propagate_v()
`make CONFIG_NILFS2_FS=m M=fs/nilfs2/` will give the following warnings:

fs/nilfs2/btree.c: In function 'nilfs_btree_propagate':
fs/nilfs2/btree.c:1882: warning: 'maxlevel' may be used uninitialized in this function
fs/nilfs2/btree.c:1882: note: 'maxlevel' was declared here

Set maxlevel = 0 to fix it.

Signed-off-by: Li Hong <lihong.hi@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-04-02 20:03:30 +09:00
Li Hong 753234007f nilfs2: fix a wrong type conversion in nilfs_ioctl()
(void * __user *) should be (void __user *)

Signed-off-by: Li Hong <lihong.hi@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-03-31 16:55:00 +09:00
Tejun Heo 5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
Ryusuke Konishi d067633b44 nilfs2: fix imperfect completion wait in nilfs_wait_on_logs
nilfs_wait_on_logs has a potential to slip out before completion of
all bio requests when it met an error.  This synchronization fault may
cause unexpected results, for instance, violative access to freed
segment buffers from an end-bio callback routine.

This fixes the issue by ensuring that nilfs_wait_on_logs waits all
given logs.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-03-24 01:17:20 +09:00
Ryusuke Konishi 110d735a0a nilfs2: fix hang-up of cleaner after log writer returned with error
According to the report from Andreas Beckmann (Message-ID:
<4BA54677.3090902@abeckmann.de>), nilfs in 2.6.33 kernel got stuck
after a disk full error.

This turned out to be a regression by log writer updates merged at
kernel 2.6.33.  nilfs_segctor_abort_construction, which is a cleanup
function for erroneous cases, was skipping writeback completion for
some logs.

This fixes the bug and would resolve the hang issue.

Reported-by: Andreas Beckmann <debian@abeckmann.de>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: stable <stable@kernel.org>                     [2.6.33.x]
2010-03-24 00:03:06 +09:00
Ryusuke Konishi 2d8428acae nilfs2: fix duplicate call to nilfs_segctor_cancel_freev
Andreas Beckmann gave me a report that nilfs logged the following
warnings when it got a disk full:

  nilfs_sufile_do_cancel_free: segment 0 must be clean
  nilfs_sufile_do_cancel_free: segment 1 must be clean

These arise from a duplicate call to nilfs_segctor_cancel_freev in an
error path of log writer.  This will fix the issue.

Reported-by: Andreas Beckmann <debian@abeckmann.de>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-03-22 14:41:07 +09:00
Ryusuke Konishi c91cea11df nilfs2: remove whitespaces before quoted newlines
This kills the following checkpatch warnings:

 WARNING: unnecessary whitespace before a quoted newline
 #869: FILE: super.c:869:
 +     	           "remount to a different snapshot. \n",

 WARNING: unnecessary whitespace before a quoted newline
 #389: FILE: the_nilfs.c:389:
 +     	    printk(KERN_ERR "NILFS: too short segment. \n");

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-03-14 10:29:51 +09:00
Ryusuke Konishi 55480a06e9 nilfs2: remove spaces before tabs
This kills the following checkpatch warnings:

 WARNING: please, no space before tabs
 #74: FILE: segment.h:74:
 +^Iunsigned ^I^Iflags;$

 WARNING: please, no space before tabs
 #35: FILE: segbuf.c:35:
 +^Iint ^I^I^Istart, end; /* The region to be submitted */$

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-03-14 10:29:51 +09:00
Ryusuke Konishi 7a65004bba nilfs2: fix various typos in comments
This fixes various typos I found in comments of nilfs2.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-03-14 10:29:51 +09:00
Ryusuke Konishi 1621562b6a nilfs2: fix typo "cout" -> "count" in error message
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-03-14 10:29:50 +09:00
Ryusuke Konishi 9ccf56c138 nilfs2: fix function name typos in docbook comments
Fixes the following typos in docbook comments:

 nilfs_detroy_transaction_cache -> nilfs_destroy_transaction_cache
 nilfs_secgtor_start_timer -> nilfs_segctor_start_timer

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-03-14 10:29:50 +09:00
Ryusuke Konishi 6c477d44a7 nilfs2: fix discrepancy in use of static specifier
Two segbuf functions, nilfs_segbuf_write and nilfs_segbuf_wait, are
declared with the static storage class specifier, but their
implementations are not.

This fixes the discrepancy.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-03-14 10:27:27 +09:00
Linus Torvalds 0f2cc4ecd8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits)
  init: Open /dev/console from rootfs
  mqueue: fix typo "failues" -> "failures"
  mqueue: only set error codes if they are really necessary
  mqueue: simplify do_open() error handling
  mqueue: apply mathematics distributivity on mq_bytes calculation
  mqueue: remove unneeded info->messages initialization
  mqueue: fix mq_open() file descriptor leak on user-space processes
  fix race in d_splice_alias()
  set S_DEAD on unlink() and non-directory rename() victims
  vfs: add NOFOLLOW flag to umount(2)
  get rid of ->mnt_parent in tomoyo/realpath
  hppfs can use existing proc_mnt, no need for do_kern_mount() in there
  Mirror MS_KERNMOUNT in ->mnt_flags
  get rid of useless vfsmount_lock use in put_mnt_ns()
  Take vfsmount_lock to fs/internal.h
  get rid of insanity with namespace roots in tomoyo
  take check for new events in namespace (guts of mounts_poll()) to namespace.c
  Don't mess with generic_permission() under ->d_lock in hpfs
  sanitize const/signedness for udf
  nilfs: sanitize const/signedness in dealing with ->d_name.name
  ...

Fix up fairly trivial (famous last words...) conflicts in
drivers/infiniband/core/uverbs_main.c and security/tomoyo/realpath.c
2010-03-04 08:15:33 -08:00
Al Viro 072f98b463 nilfs: sanitize const/signedness in dealing with ->d_name.name
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-03 14:07:58 -05:00
Al Viro 0319003d0d nilfs really shouldn't slap struct dentry on stack...
... especially when it only needs (and initializes) .d_name of it

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-03 14:07:58 -05:00
Jiro SEKIBA 0d561f12b4 nilfs2: add reader's lock for cno in nilfs_ioctl_sync
This adds reader's lock for the_nilfs->cno in nilfs_ioctl_sync,
for the_nilfs->cno should be proctected by segctor_sem when reading.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-02-20 21:18:19 +09:00
Jiro SEKIBA 03f29365e8 nilfs2: delete unnecessary condition in load_segment_summary
This is a trivial patch to remove unnecessary condition.

load_segment_summary() checks crc of segment_summary OR crc of whole
log data blocks based on boolean argument full_check.  However,
callers of the function pass only 1 as full_check, which means only
whole log data blocks checking code is running all the time.

This patch deletes the condition and full_check argument and also
deletes enum 'NILFS_SEG_FAIL_CHECKSUM_SEGSUM' and corresponding case
clause, for it is nolonger used anymore.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-02-18 20:09:03 +09:00
Ryusuke Konishi d1c6b72a72 nilfs2: move iterator to write log into segment buffer
This moves iterator to submit write requests for a series of logs into
segbuf.c, and hides nilfs_segbuf_write() and nilfs_segbuf_wait() in
the file.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-02-13 12:26:03 +09:00
Ryusuke Konishi e605f0a724 nilfs2: get rid of s_dirt flag use
This replaces s_dirt flag use in nilfs with a new flag added on the
nilfs object.  The s_dirt flag was used to indicate if
sop->write_super() should be called, however the current version of
nilfs does not use the callback.  Thus, it can be replaced with the
own flag.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Jiro SEKIBA <jir@unicus.jp>
2010-02-13 12:26:03 +09:00
Ryusuke Konishi dcd7618695 nilfs2: get rid of nilfs_segctor_req struct
This will clean up nilfs_segctor_req struct and the obscure request
argument passed among private methods of segment constructor.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-02-13 12:26:03 +09:00
Jiro SEKIBA 086d1764b2 nilfs2: delete unnecessary condition in nilfs_dat_translate
This is a trivial patch to delete unnecessary condition in nilfs_dat_translate.

nilfs_dat_translate() will asign translated address to *blocknrp if blocknrp
is not NULL.  However the condition is unneeded, because all callers of
nilfs_dat_translate() pass blocknrp properly.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-02-13 12:26:03 +09:00
Ryusuke Konishi fe5f171bb2 nilfs2: fix potential hang in nilfs_error on errors=remount-ro
nilfs_error() calls nilfs_detach_segment_constructor() if
errors=remount-ro option is specified, and this may lead to a hang due
to recursive locking of, for instance, nilfs->ns_segctor_sem and
others.

In this case, detaching segment constructor is not necessary because
read-only flag is set to the filesystem and further writes are
blocked.

This fixes the potential hang issue by removing the
nilfs_detach_segment_constructor() call from nilfs_error.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-02-13 12:26:03 +09:00
Ryusuke Konishi 7512487e6d nilfs2: use mnt_want_write in ioctls where write access is needed
A few nilfs2 ioctls need to ask for and then later release write
access to the mount in order to avoid potential write to read-only
mounts.

This adds the missing mnt_want_write and mnt_drop_write in
nilfs_ioctl_change_cpmode, nilfs_ioctl_delete_checkpoint, and
nilfs_ioctl_clean_segments.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-02-13 12:26:02 +09:00
Jiro SEKIBA e902ec9906 nilfs2: issue discard request after cleaning segments
This adds a function to send discard requests for given array of
segment numbers, and calls the function when garbage collection
succeeded.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-02-13 12:26:02 +09:00
Ryusuke Konishi 3256a05531 nilfs2: fix potential leak of dirty data on umount
This fixes incorrect usage of nilfs_segctor_confirm() test function in
nilfs_segctor_destroy(); nilfs_segctor_confirm() returns zero if the
filesystem is not clean, so its use in nilfs_segctor_destroy() needs
inversion.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-01-31 14:57:31 +09:00
Tobias Klauser 33e189bd57 nilfs2: Storage class should be before const qualifier
The C99 specification states in section 6.11.5:

The placement of a storage-class specifier other than at the beginning
of the declaration specifiers in a declaration is an obsolescent
feature.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-12-25 13:01:50 +09:00
Jiro SEKIBA 5ee5814832 nilfs2: trivial coding style fix
This is a trivial style fix patch to mend errors/warnings
reported by "checkpatch.pl --file".

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-12-25 13:01:50 +09:00
Linus Torvalds b6e3224fb2 Revert "task_struct: make journal_info conditional"
This reverts commit e4c570c4cb, as
requested by Alexey:

 "I think I gave a good enough arguments to not merge it.
  To iterate:
   * patch makes impossible to start using ext3 on EXT3_FS=n kernels
     without reboot.
   * this is done only for one pointer on task_struct"

  None of config options which define task_struct are tristate directly
  or effectively."

Requested-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-17 13:23:24 -08:00
Al Viro a95161aaa8 switch nilfs2 to deactivate_locked_super()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-16 12:16:42 -05:00
Hiroshi Shimamoto e4c570c4cb task_struct: make journal_info conditional
journal_info in task_struct is used in journaling file system only.  So
introduce CONFIG_FS_JOURNAL_INFO and make it conditional.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:27 -08:00
Ryusuke Konishi a694291a62 nilfs2: separate wait function from nilfs_segctor_write
This separates wait function for submitted logs from the write
function nilfs_segctor_write().  A new list of segment buffers
"sc_write_logs" is added to hold logs under writing, and double
buffering is partially applied to hide io latency.

At this point, the double buffering is disabled for blocksize <
pagesize because page dirty flag is turned off during write and dirty
buffers are not properly collected for pages crossing over segments.

To receive full benefit of the double buffering, further refinement is
needed to move the io wait outside the lock section of log writer.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-30 21:17:52 +09:00
Ryusuke Konishi e29df395bc nilfs2: add iterator for segment buffers
This adds a few iterator functions for segment buffers to make it easy
to handle multiple series of logs.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-30 21:06:35 +09:00
Ryusuke Konishi 9c965bac16 nilfs2: hide nilfs_write_info struct in segment buffer code
Hides nilfs_write_info struct and nilfs_segbuf_prepare_write function
in segbuf.c to simplify the interface of nilfs_segbuf_write function.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-30 21:06:35 +09:00
Ryusuke Konishi 9284ad2a90 nilfs2: relocate io status variables to segment buffer
This moves io status variables in nilfs_write_info struct to
nilfs_segment_buffer struct.

This is a preparation to hide nilfs_write_info in segment buffer code.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-30 21:05:57 +09:00
Ryusuke Konishi 5f1586d0dd nilfs2: do not return io error for bio allocation failure
Previously, log writer had possibility to set an io error flag on
segments even in case of memory allocation failure.

This fixes the issue.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-29 19:59:00 +09:00
Ryusuke Konishi 0935db7477 nilfs2: use list_splice_tail or list_splice_tail_init
This applies list_splice_tail (or list_splice_tail_init) operation
instead of list_splice (or list_splice_init, respectively) to append a
new list to tail of an existing list.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-29 02:50:46 +09:00
Jiro SEKIBA abdb318b79 nilfs2: replace mark_inode_dirty as nilfs_mark_inode_dirty
Replace mark_inode_dirty() as nilfs_mark_inode_dirty()
to reduce deep function calls.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-27 20:05:16 +09:00
Jiro SEKIBA 3534573b58 nilfs2: delete mark_inode_dirty in nilfs_delete_entry
Delete mark_inode_dirty() in nilfs_delete_entry() to reduce duplicate
mark_inode_dirty() calls both in nilfs_rename() and nilfs_delete_entry().

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-27 20:05:16 +09:00
Jiro SEKIBA 58d55471cb nilfs2: delete mark_inode_dirty in nilfs_commit_chunk
Delete mark_inode_dirty() in nilfs_commit_chunk(), for callers of
nilfs_commit_chunk() will call equivalent mark_inode_dirty()
after calling nilfs_commit_chunk().

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-27 20:05:16 +09:00
Jiro SEKIBA 2093abf9cb nilfs2: change return type of nilfs_commit_chunk
change return type of nilfs_commit_chunk() as void from int,
for nilfs_set_file_dirty() usually does not return error.

This is an intermediate patch to reduce mark_inode_dirty() in
nilfs_commit_chunk().

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-27 20:05:15 +09:00
Jiro SEKIBA 4cd76c3c93 nilfs2: split nilfs_unlink as nilfs_do_unlink and nilfs_unlink
Split nilfs_unlink() to reduce nested transaction and duplicate
mark_inode_dirty() calls when calling nilfs_unlink() from nilfs_rmdir().

nilfs_do_unlink() is an actual unlink functionality which is not
in transaction and does not call mark_inode_dirty() for dentry argument.

nilfs_unlink() is a wrapper function for do_nilfs_unlink() with
transaction and mark_inode_dirty() for dentry argument.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-27 20:05:15 +09:00
Jiro SEKIBA 1749147276 nilfs2: delete redundant mark_inode_dirty
delete redundant mark_inode_dirty() calls

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-27 20:05:15 +09:00
Jiro SEKIBA 565de406e7 nilfs2: expand inode_inc_link_count and inode_dec_link_count
This is an intermidiate patch to reduce redandunt mark_inode_dirty() calls
by calling inode_inc_link_count() and inode_dec_link_count() functions.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-27 20:05:15 +09:00
Jiro SEKIBA 43f8bc262f nilfs2: delete mark_inode_dirty from nilfs_set_link
Delete mark_inode_dirty() from nilfs_set_link() to reduce redundant
mark_inode_dirty() calls in caller of nilfs_set_link().

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-27 20:05:15 +09:00
Jiro SEKIBA 9ca941d4b6 nilfs2: delete mark_inode_dirty in nilfs_new_inode
It is redundant to call mark_inode_dirty() in nilfs_new_inode() because
all caller of nilfs_new_inode() will call mark_inode_dirty()
after calling nilfs_new_inode() directly or indirectly in transaction.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-27 20:05:15 +09:00
Ryusuke Konishi 0234576d04 nilfs2: add norecovery mount option
This adds "norecovery" mount option which disables temporal write
access to read-only mounts or snapshots during mount/recovery.
Without this option, write access will be even performed for those
types of mounts; the temporal write access is needed to mount root
file system read-only after an unclean shutdown.

This option will be helpful when user wants to prevent any write
access to the device.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Eric Sandeen <sandeen@redhat.com>
2009-11-20 10:05:52 +09:00
Ryusuke Konishi a057d2c011 nilfs2: add helper to get if volume is in a valid state
This adds a helper function, nilfs_valid_fs() which returns if nilfs
is in a valid state or not.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:52 +09:00
Ryusuke Konishi f50a4c8149 nilfs2: move recovery completion into load_nilfs function
Although mount recovery of nilfs is integrated in load_nilfs()
procedure, the completion of recovery was isolated from the procedure
and performed at the end of the fill_super routine.

This was somewhat confusing since the recovery is needed for the nilfs
object, not for a super block instance.

To resolve the inconsistency, this will integrate the recovery
completion into load_nilfs().

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:52 +09:00
Ryusuke Konishi 050b4142c9 nilfs2: apply readahead for recovery on mount
This inserts readahead in the recovery code.  The readahead request is
issued per segment while searching the latest super root block.

This will shorten mount time after unclean unmount.  A measurement
shows the recovery time was reduced by more than 60 percent:

 e.g. real  0m11.586s -> 0m3.918s  (x 2.96)

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:51 +09:00
Ryusuke Konishi f021759d74 nilfs2: clean up get/put function of a segment usage
This eliminates obsolete nilfs_get_sufile_get_segment_usage() and
nilfs_set_sufile_segment_usage() from sufile.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:51 +09:00
Ryusuke Konishi 071ec54dd7 nilfs2: move routine to set segment usage into sufile
This adds nilfs_sufile_set_segment_usage() function in sufile to
replace direct access to the sufile metadata in log writer code.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:51 +09:00
Ryusuke Konishi 61a189e9c6 nilfs2: move routine marking segment usage dirty into sufile
This adds nilfs_sufile_mark_dirty() function in sufile to replace
nilfs_touch_segusage() function in log writer code.  This is a
preparation for the further cleanup which will move out low level
sufile operations in the log writer.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:51 +09:00
Ryusuke Konishi 70622a2091 nilfs2: insert cache operation in palloc get block routines
This implements cache operation in get block routines of palloc code:
nilfs_palloc_get_desc_block(), nilfs_palloc_get_bitmap_block(), and
nilfs_palloc_get_entry_block().

This will complete the palloc cache.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:51 +09:00
Ryusuke Konishi 49fa7a5902 nilfs2: add palloc cache to ifile
This adds the palloc cache to ifile.  The palloc cache is allocated on
the extended region of nilfs_mdt_info struct.  The struct
nilfs_ifile_info defines the extended on memory structure of ifile.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:50 +09:00
Ryusuke Konishi c3ea56c800 nilfs2: flush palloc cache before manipulating data pages of GC dat
Data pages in gcdat metadata file (i.e. the secondary DAT for GC), are
cleared or even moved back to the normal DAT when a shot of garbage
collection was done.

Buffer heads held by the palloc cache of gcdat must be cleared before
these page cache manipulation.  This adds nilfs_palloc_clear_cache()
to ensure this.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:50 +09:00
Ryusuke Konishi 8908b2f70b nilfs2: add palloc cache to dat
This adds the palloc cache to DAT file.  The palloc cache is allocated
on the extended region of nilfs_mdt_info struct.  The struct
nilfs_dat_info defines the extended on memory structure of DAT.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:50 +09:00
Ryusuke Konishi db38d5ad32 nilfs2: add cache framework for persistent object allocator
This adds setup and cleanup routines of the persistent object
allocator cache.

According to ftrace analyses, accessing buffers of the DAT file
suffers indispensable overhead many times.  To mitigate the overhead,
This introduce cache framework for the persistent object allocator
(palloc) which the DAT file and ifile are using.

struct nilfs_palloc_cache represents the cache object per metadata
file using palloc.

The cache is initialized through nilfs_palloc_setup_cache() and
destroyed by nilfs_palloc_destroy_cache(); callers of the former
function will be added to individual allocators of DAT and ifile on
successive patches.

nilfs_palloc_destroy_cache() will be called from nilfs_mdt_destroy()
if the cache is attached to a metadata file.  A companion function
nilfs_palloc_clear_cache() is provided to allow releasing buffer head
references independently with the cleanup task.  This adjunctive
function will be used before invalidating pages of metadata file with
the cache.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:50 +09:00
Ryusuke Konishi 141bbdba9c nilfs2: unfold nilfs_palloc_block_get_bitmap function
This expands a trivial address calculation in the function into its
every callsite. This expansion improves readability of the callers.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:50 +09:00
Ryusuke Konishi 1376e931b7 nilfs2: eliminate nilfs_btnode_get function
This removes the obsolete nilfs_btnode_get() function and makes
nilfs_btree_get_block() directly call nilfs_btnode_submit_block().

This expansion will provide better opportunity for code optimization.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:50 +09:00
Ryusuke Konishi 75f65edfcc nilfs2: remove newblk argument from nilfs_btnode_submit_block
This removes the obsolete argument from nilfs_btnode_submit_block().
This will complete separating a create function of btree node.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:50 +09:00
Ryusuke Konishi 45f4910bc0 nilfs2: use nilfs_btnode_create_block function
This displaces nilfs_btnode_get() use to create new btree node block
with nilfs_btnode_create_block.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:49 +09:00
Ryusuke Konishi d501d73689 nilfs2: separate function for creating new btree node block
Adds a separate routine for creating a btree node block.  This is a
preparation to reduce the depth of function calls during submitting
btree node buffer.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:49 +09:00
Ryusuke Konishi b34a65069c nilfs2: avoid readahead on metadata file for create mode
This turns off readhead action of metadata file if nilfs_mdt_get_block
function was called with a create flag.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:49 +09:00
Ryusuke Konishi ef7d4757a5 nilfs2: simplify nilfs_sufile_get_ncleansegs function
Previously, this function took an status code to return possible error
codes.  The ("nilfs2: add local variable to cache the number of clean
segments") patch removed the possibility to return errors.

So, this simplifies the function definition to make it directly return
the number of clean segments.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:49 +09:00
Ryusuke Konishi aa474a2201 nilfs2: add local variable to cache the number of clean segments
This makes it possible for sufile to get the number of clean segments
faster.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:49 +09:00
Ryusuke Konishi 7b16c8a211 nilfs2: unfold nilfs_sufile_block_get_header function
This unfolds the nilfs_sufile_block_get_header() function for
simplicity.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:48 +09:00
Ryusuke Konishi fd66c0d5c3 nilfs2: hide nilfs_mdt_clear calls in nilfs_mdt_destroy
This will hide a function call of nilfs_mdt_clear() in
nilfs_mdt_destroy().

This ensures nilfs_mdt_destroy() to do cleanup jobs included in
nilfs_mdt_clear().

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:48 +09:00
Ryusuke Konishi 3961f0e277 nilfs2: eliminate inlines to directly read/write inode of metadata files
Removes two inline functions: nilfs_mdt_read_inode_direct() and
nilfs_mdt_write_inode_direct().

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:48 +09:00
Ryusuke Konishi 8707df3847 nilfs2: separate read method of meta data files on super root block
Will displace nilfs_mdt_read_inode_direct function with an individual
read method: nilfs_dat_read, nilfs_sufile_read, nilfs_cpfile_read.

This provides the opportunity to initialize local variables of each
metadata file after reading the inode.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:48 +09:00
Ryusuke Konishi 79739565e1 nilfs2: separate constructor of metadata files
This will displace nilfs_mdt_new() constructor with individual
metadata file constructors like nilfs_dat_new(), new_sufile_new(),
nilfs_cpfile_new(), and nilfs_ifile_new().

This makes it possible for each metadata file to have own
intialization code.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:48 +09:00
Ryusuke Konishi 5731e191f2 nilfs2: add size option of private object to metadata file allocator
This adds an optional "object size" argument to nilfs_mdt_new_common()
function; the argument specifies the size of private object attached
to a newly allocated metadata file inode.

This will afford space to keep local variables for meta data files.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:48 +09:00
Ryusuke Konishi 9cb4e0d2b9 nilfs2: move out mark_inode_dirty calls from bmap routines
Previously, nilfs_bmap_add_blocks() and nilfs_bmap_sub_blocks() called
mark_inode_dirty() after they changed the number of data blocks.

This moves these calls outside bmap outermost functions like
nilfs_bmap_insert() or nilfs_bmap_truncate().

This will mitigate overhead for truncate or delete operation since
they repeatedly remove set of blocks.  Nearly 10 percent improvement
was observed for removal of a large file:

 # dd if=/dev/zero of=/test/aaa bs=1M count=512
 # time rm /test/aaa

  real  2.968s -> 2.705s

Further optimization may be possible by eliminating these
mark_inode_dirty() uses though I avoid mixing separate changes here.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:47 +09:00
Ryusuke Konishi 09bf4aae0a nilfs2: stop marking metadata inode dirty within btree operations
Since metadata file routines mark the inode dirty after they
successfully changed bmap objects, nilfs_mdt_mark_dirty() calls in
nilfs_bmap_add_blocks() and nilfs_bmap_sub_blocks() are redundant.

This removes these overlapping calls from the bmap routines.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:47 +09:00
Ryusuke Konishi 30db4e6c3d nilfs2: remove buffer locking from btree code
lock_buffer() and unlock_buffer() uses in btree.c are eliminable
because btree functions gain buffer heads through nilfs_btnode_get(),
which never returns an on-the-fly buffer.

Although nilfs_clear_dirty_page() and nilfs_copy_back_pages() in
nilfs_commit_gcdat_inode() juggle btree node buffers of DAT, this is
safe because these operations are protected by a log writer lock or
the metadata file semaphore of DAT.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:47 +09:00
Ryusuke Konishi a49762fd11 nilfs2: remove buffer locking in nilfs_mark_inode_dirty
This lock is eliminable because inodes on the buffer can be updated
independently.  Although a log writer also fills in bmap data on the
on-disk inodes, this update is exclusively done by a log writer lock.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:47 +09:00
Jiro SEKIBA e2073e7857 nilfs2: cleanup unused match_bool function
match_bool function is not used anymore.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:47 +09:00
Jiro SEKIBA 91f1953bf3 nilfs2: Using nobarrier option instead of barrier=off
Since most of fs using nofoobar style option,
modified barrier=off option as nobarrier.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:47 +09:00
Jiro SEKIBA 6600b9dd8e nilfs2: move definition of struct nilfs_btree_node
This is a trivial patch to expose struct nilfs_fs_btree_node.
The struct should be exposed outside of kernel, for it is disk format.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:46 +09:00
Ryusuke Konishi 9b945d537d nilfs2: get rid of BUG_ON use in btree lookup routines
The current btree lookup routines make a kernel oops when detected
inconsistency in btree blocks.  These routines should instead return a
proper error code because the inconsistency usually comes from
corruption of on-disk metadata.

This fixes the issue by converting BUG_ON calls to proper error
handlings.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:46 +09:00
Jiro SEKIBA 18dafac1a4 nilfs2: deleted inconsistent comment in nilfs_load_inode_block()
The comment says, "Caller of this function MUST lock s_inode_lock",
however just above the comment, it locks s_inode_lock in the function.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-15 17:17:46 +09:00
Ryusuke Konishi c1ea985c71 nilfs2: fix lock order reversal in chcp operation
Will fix the following lock order reversal lockdep detected:

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.32-rc6 #7
-------------------------------------------------------
chcp/30157 is trying to acquire lock:
 (&nilfs->ns_mount_mutex){+.+.+.}, at: [<fed7cfcc>] nilfs_cpfile_change_cpmode+0x46/0x752 [nilfs2]

but task is already holding lock:
 (&nilfs->ns_segctor_sem){++++.+}, at: [<fed7ca32>] nilfs_transaction_begin+0xba/0x110 [nilfs2]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&nilfs->ns_segctor_sem){++++.+}:
       [<c105799c>] __lock_acquire+0x109c/0x139d
       [<c1057d26>] lock_acquire+0x89/0xa0
       [<c14151e2>] down_read+0x31/0x45
       [<fed6d77b>] nilfs_attach_checkpoint+0x8f/0x16b [nilfs2]
       [<fed6e393>] nilfs_get_sb+0x3e7/0x653 [nilfs2]
       [<c10c0ccb>] vfs_kern_mount+0x8b/0x124
       [<c10c0db2>] do_kern_mount+0x37/0xc3
       [<c10d7517>] do_mount+0x64d/0x69d
       [<c10d75cd>] sys_mount+0x66/0x95
       [<c1002a14>] sysenter_do_call+0x12/0x32

-> #1 (&type->s_umount_key#31/1){+.+.+.}:
       [<c105799c>] __lock_acquire+0x109c/0x139d
       [<c1057d26>] lock_acquire+0x89/0xa0
       [<c104c0f3>] down_write_nested+0x34/0x52
       [<c10c08fe>] sget+0x22e/0x389
       [<fed6e133>] nilfs_get_sb+0x187/0x653 [nilfs2]
       [<c10c0ccb>] vfs_kern_mount+0x8b/0x124
       [<c10c0db2>] do_kern_mount+0x37/0xc3
       [<c10d7517>] do_mount+0x64d/0x69d
       [<c10d75cd>] sys_mount+0x66/0x95
       [<c1002a14>] sysenter_do_call+0x12/0x32

-> #0 (&nilfs->ns_mount_mutex){+.+.+.}:
       [<c1057727>] __lock_acquire+0xe27/0x139d
       [<c1057d26>] lock_acquire+0x89/0xa0
       [<c1414d63>] mutex_lock_nested+0x41/0x23e
       [<fed7cfcc>] nilfs_cpfile_change_cpmode+0x46/0x752 [nilfs2]
       [<fed801b2>] nilfs_ioctl+0x11a/0x7da [nilfs2]
       [<c10cca12>] vfs_ioctl+0x27/0x6e
       [<c10ccf93>] do_vfs_ioctl+0x491/0x4db
       [<c10cd022>] sys_ioctl+0x45/0x5f
       [<c1002a14>] sysenter_do_call+0x12/0x32

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-13 10:33:24 +09:00
Ryusuke Konishi c083234f15 nilfs2: fix missing cleanup of gc cache on error cases
This fixes an -rc1 regression brought by the commit:
1cf58fa840 ("nilfs2: shorten freeze
period due to GC in write operation v3").

Although the patch moved out a function call of
nilfs_ioctl_move_blocks() to nilfs_ioctl_clean_segments() from
nilfs_ioctl_prepare_clean_segments(), it didn't move corresponding
cleanup job needed for the error case.

This will move the missing cleanup job to the destination function.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Acked-by: Jiro SEKIBA <jir@unicus.jp>
2009-11-08 19:04:25 +09:00
Ryusuke Konishi 5399dd1fc8 nilfs2: fix kernel oops in error case of nilfs_ioctl_move_blocks
This fixes a kernel oops reported by Markus Trippelsdorf in the email
titled "[NILFS users] kernel Oops while running nilfs_cleanerd".

The oops was caused by a bug of error path in
nilfs_ioctl_move_blocks() function, which was inlined in
nilfs_ioctl_clean_segments().

nilfs_ioctl_move_blocks checks duplication of blocks which will be
moved in garbage collection.  But, the check should have be done
within nilfs_ioctl_move_inode_block() to prevent list corruption among
buffers storing the target blocks.

To fix the kernel oops, this moves forward the duplication check
before the list insertion.

I also tested this for stable trees [2.6.30, 2.6.31].

Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: stable <stable@kernel.org>
2009-11-08 19:01:35 +09:00
Ryusuke Konishi 05b4358ad5 nilfs2: add zero-fill for new btree node buffers
Adds missing initialization of newly allocated b-tree node buffers.
This avoids garbage data to be mixed in b-tree node blocks.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-03 12:32:03 +09:00
Ryusuke Konishi aeda7f6343 nilfs2: fix irregular checkpoint creation due to data flush
When nilfs flushes out dirty data to reduce memory pressure, creation
of checkpoints is wrongly postponed.  This bug causes irregular
checkpoint creation especially in small footprint systems.

To correct this issue, a timer for the checkpoint creation has to be
continued if a log writer does not create a checkpoint.

This will do the correction.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-03 12:32:03 +09:00
Ryusuke Konishi b1e19e5601 nilfs2: fix dirty page accounting leak causing hang at write
Bruno Prémont and Dunphy, Bill noticed me that NILFS will certainly
hang on ARM-based targets.

I found this was caused by an underflow of dirty pages counter.  A
b-tree cache routine was marking page dirty without adjusting page
account information.

This fixes the dirty page accounting leak and resolves the hang on
arm-based targets.

Reported-by: Bruno Prémont <bonbons@linux-vserver.org>
Reported-by: Dunphy, Bill <WDunphy@tandbergdata.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Bruno Prémont <bonbons@linux-vserver.org>
Cc: stable <stable@kernel.org>
2009-11-03 12:31:36 +09:00
Alexey Dobriyan 828c09509b const: constify remaining file_operations
[akpm@linux-foundation.org: fix KVM]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-01 16:11:11 -07:00
Ryusuke Konishi 3cc811bffd nilfs2: fix missing initialization of i_dir_start_lookup member
The i_dir_start_lookup field in nilfs_inode_info objects should be
cleared when the objects are allocated, but the the initialization was
missing in case of reading from disk.  This adds the initialization.

Since the variable just gives a start page on directory lookups, the
bug was nonfatal until now.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-29 20:32:13 +09:00
Ryusuke Konishi 1f28fcd925 nilfs2: fix missing zero-fill initialization of btree node cache
This will fix file system corruption which infrequently happens after
mount.  The problem was reported from users with the title "[NILFS
users] Fail to mount NILFS." (Message-ID:
<200908211918.34720.yuri@itinteg.net>), and so forth.  I've also
experienced the corruption multiple times on kernel 2.6.30 and 2.6.31.

The problem turned out to be caused due to discordance between
mapping->nrpages of a btree node cache and the actual number of pages
hung on the cache; if the mapping->nrpages becomes zero even as it has
pages, truncate_inode_pages() returns without doing anything.  Usually
this is harmless except it may cause page leak, but garbage collection
fairly infrequently sees a stale page remained in the btree node cache
of DAT (i.e. disk address translation file of nilfs), and induces the
corruption.

I identified a missing initialization in btree node caches was the
root cause.  This corrects the bug.

I've tested this for kernel 2.6.30 and 2.6.31.

Reported-by: Yuri Chislov <yuri@itinteg.net>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: stable <stable@kernel.org>
2009-09-29 20:12:56 +09:00
Alexey Dobriyan f0f37e2f77 const: mark struct vm_struct_operations
* mark struct vm_area_struct::vm_ops as const
* mark vm_ops in AGP code

But leave TTM code alone, something is fishy there with global vm_ops
being used.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-27 11:39:25 -07:00
Alexey Dobriyan 6e1d5dcc2b const: mark remaining inode_operations as const
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-22 07:17:24 -07:00
Alexey Dobriyan 7f09410bbc const: mark remaining address_space_operations const
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-22 07:17:24 -07:00
Alexey Dobriyan ac4cfdd6d1 const: mark remaining export_operations const
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-22 07:17:24 -07:00
Alexey Dobriyan b87221de6a const: mark remaining super_operations const
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-22 07:17:24 -07:00
Jens Axboe 2c96ce9f20 fs: remove bdev->bd_inode_backing_dev_info
It has been unused since it was introduced in:

commit 520808bf20e90fdbdb320264ba7dd5cf9d47dcac
Author: Andrew Morton <akpm@osdl.org>
Date:   Fri May 21 00:46:17 2004 -0700

    [PATCH] block device layer: separate backing_dev_info infrastructure

So lets just kill it.

Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-16 15:16:18 +02:00
Ryusuke Konishi 41f4db0f48 fs/Kconfig: move nilfs2 outside misc filesystems
Some people asked me questions like the following:

On Wed, 15 Jul 2009 13:11:21 +0200, Leon Woestenberg wrote:
> just wondering, any reasons why NILFS2 is one of the miscellaneous
> filesystems and, for example, btrfs, is not in Kconfig?

Actually, nilfs is NOT a filesystem came from other operating systems,
but a filesystem created purely for Linux.  Nor is it a flash
filesystem but that for generic block devices.

So, this moves nilfs outside the misc category as I responded in LKML
"Re: Why does NILFS2 hide under Miscellaneous filesystems?"
(Message-Id: <20090716.002526.93465395.ryusuke@osrg.net>).

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:16 +09:00
Ryusuke Konishi 0f3fe33b39 nilfs2: convert nilfs_bmap_lookup to an inline function
The nilfs_bmap_lookup() is now a wrapper function of
nilfs_bmap_lookup_at_level().

This moves the nilfs_bmap_lookup() to a header file converting it to
an inline function and gives an opportunity for optimization.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:16 +09:00
Ryusuke Konishi 2e0c2c7392 nilfs2: allow btree code to directly call dat operations
The current btree code is written so that btree functions call dat
operations via wrapper functions in bmap.c when they allocate, free,
or modify virtual block addresses.

This abstraction requires additional function calls and causes
frequent call of nilfs_bmap_get_dat() function since it is used in the
every wrapper function.

This removes the wrapper functions and makes them available from
btree.c and direct.c, which will increase the opportunity of
compiler optimization.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:16 +09:00
Ryusuke Konishi bd8169efae nilfs2: add update functions of virtual block address to dat
This is a preparation for the successive cleanup ("nilfs2: allow btree
to directly call dat operations").

This adds functions bundling a few operations to change an entry of
virtual block address on the dat file.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:15 +09:00
Ryusuke Konishi 7a102b0923 nilfs2: remove individual gfp constants for each metadata file
This gets rid of NILFS_CPFILE_GFP, NILFS_SUFILE_GFP, NILFS_DAT_GFP,
and NILFS_IFILE_GFP.  All of these constants refer to NILFS_MDT_GFP,
and can be removed.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:15 +09:00
Ryusuke Konishi 3218929dbd nilfs2: stop zero-fill of btree path just before free it
The btree path object is cleared just before it is freed.

This will remove the code doing the unnecessary clear operation.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:15 +09:00
Ryusuke Konishi 6d28f7ea43 nilfs2: remove unused btree argument from btree functions
Even though many btree functions take a btree object as their first
argument, most of them are not used in their functions.

This sticky use of the btree argument is hurting code readability and
giving the possibility of inefficient code generation.

So, this removes the unnecessary btree arguments.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:15 +09:00
Ryusuke Konishi 9ead986373 nilfs2: remove nilfs_dat_abort_start and nilfs_dat_abort_free
These functions are not called from any functions.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:15 +09:00
Jiro SEKIBA 1cf58fa840 nilfs2: shorten freeze period due to GC in write operation v3
This is a re-revised patch to shorten freeze period.
This version include a fix of the bug Konishi-san mentioned last time.

When GC is runnning, GC moves live block to difference segments.
Copying live blocks into memory is done in a transaction,
however it is not necessarily to be in the transaction.
This patch will get the nilfs_ioctl_move_blocks() out from
transaction lock and put it before the transaction.

I ran sysbench fileio test against nilfs partition.
I copied some DVD/CD images and created snapshot to create live blocks
before starting the benchmark.

Followings are summary of rc8 and rc8 w/ the patch of per-request
statistics, which is min/max and avg.  I ran each test three times and
bellow is average of those numers.

According to this benchmark result, average time is slightly degrated.
However, worstcase (max) result is significantly improved.
This can address a few seconds write freeze.

- random write per-request performance of rc8
 min   0.843ms
 max 680.406ms
 avg   3.050ms
- random write per-request performance of rc8 w/ this patch
 min   0.843ms -> 100.00%
 max 380.490ms ->  55.90%
 avg   3.233ms -> 106.00%

- sequential write per-request performance of rc8
 min   0.736ms
 max 774.343ms
 avg   2.883ms
- sequential write per-request performance of rc8 w/ this patch
 min   0.720ms ->  97.80%
 max  644.280ms->  83.20%
 avg   3.130ms -> 108.50%

-----8<-----8<-----nilfs_cleanerd.conf-----8<-----8<-----
protection_period       150
selection_policy        timestamp       # timestamp in ascend order
nsegments_per_clean     2
cleaning_interval       2
retry_interval          60
use_mmap
log_priority            info
-----8<-----8<-----nilfs_cleanerd.conf-----8<-----8<-----

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:15 +09:00
Zhu Yanhai 43be0ec038 nilfs2: add more check routines in mount process
nilfs2: Add more safeguard routines and protections in mount process,
which also makes nilfs2 report consistency error messages when
checkpoint number is invalid.

Signed-off-by: Zhu Yanhai <zhu.yanhai@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:14 +09:00
Zhang Qiang a4f0b9c5b4 nilfs2: An unassigned variable is assigned to a never used structure member
nilfs2: In procedure 'nilfs_get_sb()', when a nilfs filesysttem is
mounted for the first time, local variable 'nilfs->ns_last_cno' is
used before loading the latest checkpoint number from disk (in
'nilfs_fill_super'). 'nilfs->ns_last_cno' is assigned to 'sd.cno', but
'sd.cno' has never been used in the procedure.

Signed-off-by: Zhang Qiang <zhangqiang.buaa@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:14 +09:00
Ryusuke Konishi c1b353f04a nilfs2: use GFP_NOIO for bio_alloc instead of GFP_NOWAIT
Alberto Bertogli advised me about bio_alloc() use in nilfs:
On Sat, 13 Jun 2009 22:52:40 -0300, Alberto Bertogli wrote:
> By the way, those bio_alloc()s are using GFP_NOWAIT but it looks
> like they could use at least GFP_NOIO or GFP_NOFS, since the caller
> can (and sometimes do) sleep. The only caller is nilfs_submit_bh(),
> which calls nilfs_submit_seg_bio() which can sleep calling
> wait_for_completion().

This takes in the comment and replaces the use of GFP_NOWAIT flag with
GFP_NOIO.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:14 +09:00
Jiro SEKIBA 1dfa27105a nilfs2: stop using periodic write_super callback
This removes nilfs_write_super and commit super block in nilfs
internal thread, instead of periodic write_super callback.

VFS layer calls ->write_super callback periodically.  However,
it looks like that calling back is ommited when disk I/O is busy.
And when cleanerd (nilfs GC) is runnig, disk I/O tend to be busy thus
nilfs superblock is not synchronized as nilfs designed.

To avoid it, syncing superblock by nilfs thread instead of pdflush.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:14 +09:00
Jiro SEKIBA 79efdd9411 nilfs2: clean up nilfs_write_super
Separate conditions that check if syncing super block and alternative
super block are required as inline functions to reuse the conditions.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:14 +09:00
Jiro SEKIBA 6233caa9d5 nilfs2: fix disorder of nilfs_write_super in nilfs_sync_fs
This fixes disorder of nilfs_write_super in nilfs_sync_fs.  Commiting
super block must be the end of the function so that every changes are
reflected.

->sync_fs() is not called frequently so this makes nilfs_sync_fs call
nilfs_commit_super instead of nilfs_write_super.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:14 +09:00
Jiro SEKIBA ec5d66abdb nilfs2: remove redundant super block commit
This removes redundant super block commit.

nilfs_write_super will call nilfs_commit_super to store super block
into block device.  However, nilfs_put_super will call
nilfs_commit_super right after calling nilfs_write_super.  So calling
nilfs_write_super in nilfs_put_super would be redundant.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:13 +09:00
Jiro SEKIBA b58a285ba4 nilfs2: implement nilfs_show_options to display mount options in /proc/mounts
This is a patch to display mount options in procfs.
Mount options will show up in the /proc/mounts as other fs does.

...
/dev/sda6 /mnt nilfs2 ro,relatime,barrier=off,cp=3,order=strict 0 0
...

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:13 +09:00
Ryusuke Konishi 1435110467 nilfs2: always lookup disk block address before reading metadata block
The current metadata file code skips disk address lookup for its data
block if the buffer has a mapped flag.

This has a potential risk to cause read request to be performed
against the stale block address that GC moved, and it may lead to meta
data corruption.  The mapped flag is safe if the buffer has an
uptodate flag, otherwise it may prevent necessary update of disk
address in the next read.

This will avoid the potential problem by ensuring disk address lookup
before reading metadata block even for buffers with the mapped flag.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:13 +09:00
Ryusuke Konishi 027d6404eb nilfs2: use semaphore to protect pointer to a writable FS-instance
will get rid of nilfs_get_writer() and nilfs_put_writer() pair used to
retain a writable FS-instance for a period.

The pair functions were making up some kind of recursive lock with a
mutex, but they became overkill since the commit
201913ed74.  Furthermore, they caused
the following lockdep warning because the mutex can be released by a
task which didn't lock it:

 =====================================
 [ BUG: bad unlock balance detected! ]
 -------------------------------------
 kswapd0/422 is trying to release lock (&nilfs->ns_writer_mutex) at:
 [<c1359ff5>] mutex_unlock+0x8/0xa
 but there are no more locks to release!

 other info that might help us debug this:
 no locks held by kswapd0/422.

 stack backtrace:
 Pid: 422, comm: kswapd0 Not tainted 2.6.31-rc4-nilfs #51
 Call Trace:
  [<c1358f97>] ? printk+0xf/0x18
  [<c104fea7>] print_unlock_inbalance_bug+0xcc/0xd7
  [<c11578de>] ? prop_put_global+0x3/0x35
  [<c1050195>] lock_release+0xed/0x1dc
  [<c1359ff5>] ? mutex_unlock+0x8/0xa
  [<c1359f83>] __mutex_unlock_slowpath+0xaf/0x119
  [<c1359ff5>] mutex_unlock+0x8/0xa
  [<d1284add>] nilfs_mdt_write_page+0xd8/0xe1 [nilfs2]
  [<c1092653>] shrink_page_list+0x379/0x68d
  [<c109171b>] ? isolate_pages_global+0xb4/0x18c
  [<c1092bd2>] shrink_list+0x26b/0x54b
  [<c10930be>] shrink_zone+0x20c/0x2a2
  [<c10936b7>] kswapd+0x407/0x591
  [<c1091667>] ? isolate_pages_global+0x0/0x18c
  [<c1040603>] ? autoremove_wake_function+0x0/0x33
  [<c10932b0>] ? kswapd+0x0/0x591
  [<c104033b>] kthread+0x69/0x6e
  [<c10402d2>] ? kthread+0x0/0x6e
  [<c1003e33>] kernel_thread_helper+0x7/0x1a

This patch uses a reader/writer semaphore instead of the own lock and
kills this warning.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:13 +09:00
Heiko Carstens b5696e5e0d nilfs2: fix format string compile warning (ino_t)
Unlike on most other architectures ino_t is an unsigned int on s390.
So add an explicit cast to avoid this compile warning:

fs/nilfs2/recovery.c: In function 'recover_dsync_blocks':
fs/nilfs2/recovery.c:555: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'ino_t'

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:13 +09:00
Ryusuke Konishi 1b2f5a641b nilfs2: fix ignored error code in __nilfs_read_inode()
The __nilfs_read_inode function is ignoring the error code returned
from nilfs_read_inode_common(), and wrongly delivers a success code
(zero) when it escapes from the function in erroneous cases.

This adds the missing error handling.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:12 +09:00
Ryusuke Konishi b1f1b8ce0a nilfs2: fix preempt count underflow in nilfs_btnode_prepare_change_key
This will fix the following preempt count underflow reported from
users with the title "[NILFS users] segctord problem" (Message-ID:
<949415.6494.qm@web58808.mail.re1.yahoo.com> and Message-ID:
<debc30fc0908270825v747c1734xa59126623cfd5b05@mail.gmail.com>):

 WARNING: at kernel/sched.c:4890 sub_preempt_count+0x95/0xa0()
 Hardware name: HP Compaq 6530b (KR980UT#ABC)
 Modules linked in: bridge stp llc bnep rfcomm l2cap xfs exportfs nilfs2 cowloop loop vboxnetadp vboxnetflt vboxdrv btusb bluetooth uvcvideo videodev v4l1_compat v4l2_compat_ioctl32 arc4 snd_hda_codec_analog ecb iwlagn iwlcore rfkill lib80211 mac80211 snd_hda_intel snd_hda_codec ehci_hcd uhci_hcd usbcore snd_hwdep snd_pcm tg3 cfg80211 psmouse snd_timer joydev libphy ohci1394 snd_page_alloc hp_accel lis3lv02d ieee1394 led_class i915 drm i2c_algo_bit video backlight output i2c_core dm_crypt dm_mod
 Pid: 4197, comm: segctord Not tainted 2.6.30-gentoo-r4-64 #7
 Call Trace:
  [<ffffffff8023fa05>] ? sub_preempt_count+0x95/0xa0
  [<ffffffff802470f8>] warn_slowpath_common+0x78/0xd0
  [<ffffffff8024715f>] warn_slowpath_null+0xf/0x20
  [<ffffffff8023fa05>] sub_preempt_count+0x95/0xa0
  [<ffffffffa04ce4db>] nilfs_btnode_prepare_change_key+0x11b/0x190 [nilfs2]
  [<ffffffffa04d01ad>] nilfs_btree_assign_p+0x19d/0x1e0 [nilfs2]
  [<ffffffffa04d10ad>] nilfs_btree_assign+0xbd/0x130 [nilfs2]
  [<ffffffffa04cead7>] nilfs_bmap_assign+0x47/0x70 [nilfs2]
  [<ffffffffa04d9bc6>] nilfs_segctor_do_construct+0x956/0x20f0 [nilfs2]
  [<ffffffff805ac8e2>] ? _spin_unlock_irqrestore+0x12/0x40
  [<ffffffff803c06e0>] ? __up_write+0xe0/0x150
  [<ffffffff80262959>] ? up_write+0x9/0x10
  [<ffffffffa04ce9f3>] ? nilfs_bmap_test_and_clear_dirty+0x43/0x60 [nilfs2]
  [<ffffffffa04cd627>] ? nilfs_mdt_fetch_dirty+0x27/0x60 [nilfs2]
  [<ffffffffa04db5fc>] nilfs_segctor_construct+0x8c/0xd0 [nilfs2]
  [<ffffffffa04dc3dc>] nilfs_segctor_thread+0x15c/0x3a0 [nilfs2]
  [<ffffffffa04dbe20>] ? nilfs_construction_timeout+0x0/0x10 [nilfs2]
  [<ffffffff80252633>] ? add_timer+0x13/0x20
  [<ffffffff802370da>] ? __wake_up_common+0x5a/0x90
  [<ffffffff8025e960>] ? autoremove_wake_function+0x0/0x40
  [<ffffffffa04dc280>] ? nilfs_segctor_thread+0x0/0x3a0 [nilfs2]
  [<ffffffffa04dc280>] ? nilfs_segctor_thread+0x0/0x3a0 [nilfs2]
  [<ffffffff8025e556>] kthread+0x56/0x90
  [<ffffffff8020cdea>] child_rip+0xa/0x20
  [<ffffffff8025e500>] ? kthread+0x0/0x90
  [<ffffffff8020cde0>] ? child_rip+0x0/0x20

This problem was caused due to a missing radix_tree_preload() call in
the retry path of nilfs_btnode_prepare_change_key() function.

Reported-by: Eric A <eric225125@yahoo.com>
Reported-by: Jerome Poulin <jeromepoulin@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Jerome Poulin <jeromepoulin@gmail.com>
Cc: stable@kernel.org
2009-08-31 12:03:06 +09:00
Ryusuke Konishi a924586036 nilfs2: fix oopses with doubly mounted snapshots
will fix kernel oopses like the following:

 # mount -t nilfs2 -r -o cp=20 /dev/sdb1 /test1
 # mount -t nilfs2 -r -o cp=20 /dev/sdb1 /test2
 # umount /test1
 # umount /test2

BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1069
in_atomic(): 0, irqs_disabled(): 1, pid: 3886, name: umount.nilfs2
1 lock held by umount.nilfs2/3886:
 #0:  (&type->s_umount_key#31){+.+...}, at: [<c10b398a>] deactivate_super+0x52/0x6c
irq event stamp: 1219
hardirqs last  enabled at (1219): [<c135c774>] __mutex_unlock_slowpath+0xf8/0x119
hardirqs last disabled at (1218): [<c135c6d5>] __mutex_unlock_slowpath+0x59/0x119
softirqs last  enabled at (1214): [<c1033316>] __do_softirq+0x1a5/0x1ad
softirqs last disabled at (1205): [<c1033354>] do_softirq+0x36/0x5a
Pid: 3886, comm: umount.nilfs2 Not tainted 2.6.31-rc6 #55
Call Trace:
 [<c1023549>] __might_sleep+0x107/0x10e
 [<c13603c0>] do_page_fault+0x246/0x397
 [<c136017a>] ? do_page_fault+0x0/0x397
 [<c135e753>] error_code+0x6b/0x70
 [<c136017a>] ? do_page_fault+0x0/0x397
 [<c104f805>] ? __lock_acquire+0x91/0x12fd
 [<c1050a62>] ? __lock_acquire+0x12ee/0x12fd
 [<c1050a62>] ? __lock_acquire+0x12ee/0x12fd
 [<c1050b2b>] lock_acquire+0xba/0xdd
 [<d0d17d3f>] ? nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2]
 [<c135d4fe>] down_write+0x2a/0x46
 [<d0d17d3f>] ? nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2]
 [<d0d17d3f>] nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2]
 [<c104ea2c>] ? mark_held_locks+0x43/0x5b
 [<c104ecb1>] ? trace_hardirqs_on_caller+0x10b/0x133
 [<c104ece4>] ? trace_hardirqs_on+0xb/0xd
 [<d0d09ac1>] nilfs_put_super+0x2f/0xca [nilfs2]
 [<c10b3352>] generic_shutdown_super+0x49/0xb8
 [<c10b33de>] kill_block_super+0x1d/0x31
 [<c10e6599>] ? vfs_quota_off+0x0/0x12
 [<c10b398f>] deactivate_super+0x57/0x6c
 [<c10c4bc3>] mntput_no_expire+0x8c/0xb4
 [<c10c5094>] sys_umount+0x27f/0x2a4
 [<c10c50c6>] sys_oldumount+0xd/0xf
 [<c10031a4>] sysenter_do_call+0x12/0x38
 ...

This turns out to be a bug brought by an -rc1 patch ("nilfs2: simplify
remaining sget() use").

In the patch, a new "put resource" function, nilfs_put_sbinfo()
was introduced to delay freeing nilfs_sb_info struct.

But the nilfs_put_sbinfo() mistakenly used atomic_dec_and_test()
function to check the reference count, and it caused the nilfs_sb_info
was freed when user mounted a snapshot twice.

This bug also suggests there was unseen memory leak in usual mount
/umount operations for nilfs.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-08-19 02:10:13 +09:00
Zhang Qiang 1154ecbd2f nilfs2: missing a read lock for segment writer in nilfs_attach_checkpoint()
'ns_cno' of structure 'the_nilfs' must be protected from segment
writer, in other words, the caller of nilfs_get_checkpoint should hold
read lock for nilfs->ns_segctor_sem.  This patch adds the lock/unlock
operations in nilfs_attach_checkpoint() when calling
nilfs_cpfile_get_checkpoint().

Signed-off-by: Zhang Qiang <zhangqiang.buaa@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-08-18 17:32:27 +09:00