Commit Graph

168 Commits

Author SHA1 Message Date
Al Viro 6b520e0565 vfs: fix the stupidity with i_dentry in inode destructors
Seeing that just about every destructor got that INIT_LIST_HEAD() copied into
it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once();
the cost of taking it into inode_init_always() will be negligible for pipes
and sockets and negative for everything else.  Not to mention the removal of
boilerplate code from ->destroy_inode() instances...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:52:40 -05:00
Ryusuke Konishi aa405b1f42 nilfs2: always set back pointer to host inode in mapping->host
In the current nilfs, page cache for btree nodes and meta data files
do not set a valid back pointer to the host inode in mapping->host.

This will change it so that every address space in nilfs uses
mapping->host to hold its host inode.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-05-10 22:21:57 +09:00
Ryusuke Konishi 4e33f9eab0 nilfs2: implement resize ioctl
This adds resize ioctl which makes online resize possible.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-05-10 22:21:46 +09:00
Ryusuke Konishi cfb0a4bfd8 nilfs2: add routine to move secondary super block
After resizing the filesystem, the secondary super block must be moved
to a new location.  This adds a helper function for this.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-05-10 22:21:45 +09:00
Ryusuke Konishi e3154e9748 nilfs2: get rid of nilfs_sb_info structure
This directly uses sb->s_fs_info to keep a nilfs filesystem object and
fully removes the intermediate nilfs_sb_info structure.  With this
change, the hierarchy of on-memory structures of nilfs will be
simplified as follows:

Before:
  super_block
       -> nilfs_sb_info
             -> the_nilfs
                   -> cptree --+-> nilfs_root (current file system)
                               +-> nilfs_root (snapshot A)
                               +-> nilfs_root (snapshot B)
                               :
             -> nilfs_sc_info (log writer structure)
After:
  super_block
       -> the_nilfs
             -> cptree --+-> nilfs_root (current file system)
                         +-> nilfs_root (snapshot A)
                         +-> nilfs_root (snapshot B)
                         :
             -> nilfs_sc_info (log writer structure)

The reason why we didn't design so from the beginning is because the
initial shape also differed from the above.  The early hierachy was
composed of "per-mount-point" super_block -> nilfs_sb_info pairs and a
shared nilfs object.  On the kernel 2.6.37, it was changed to the
current shape in order to unify super block instances into one per
device, and this cleanup became applicable as the result.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-03-09 11:54:26 +09:00
Ryusuke Konishi f7545144c2 nilfs2: use sb instance instead of nilfs_sb_info struct
This replaces sbi uses with direct reference to sb instance.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-03-09 11:05:08 +09:00
Ryusuke Konishi 9b1fc4e497 nilfs2: move next generation counter into nilfs object
Moves s_next_generation counter and a spinlock protecting it to nilfs
object from nilfs_sb_info structure.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-03-09 11:05:08 +09:00
Ryusuke Konishi 693dd32122 nilfs2: move s_inode_lock and s_dirty_files into nilfs object
Moves s_inode_lock spinlock and s_dirty_files list to nilfs object
from nilfs_sb_info structure.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-03-09 11:05:07 +09:00
Ryusuke Konishi 574e6c3145 nilfs2: move parameters on nilfs_sb_info into nilfs object
This moves four parameter variables on nilfs_sb_info s_resuid,
s_resgid, s_interval and s_watermark to the nilfs object.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-03-09 11:05:07 +09:00
Ryusuke Konishi 3b2ce58b0f nilfs2: move mount options to nilfs object
This moves mount_opt local variable to nilfs object from nilfs_sb_info
struct.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-03-09 11:05:07 +09:00
Miklos Szeredi 2aa15890f3 mm: prevent concurrent unmap_mapping_range() on the same inode
Michael Leun reported that running parallel opens on a fuse filesystem
can trigger a "kernel BUG at mm/truncate.c:475"

Gurudas Pai reported the same bug on NFS.

The reason is, unmap_mapping_range() is not prepared for more than
one concurrent invocation per inode.  For example:

  thread1: going through a big range, stops in the middle of a vma and
     stores the restart address in vm_truncate_count.

  thread2: comes in with a small (e.g. single page) unmap request on
     the same vma, somewhere before restart_address, finds that the
     vma was already unmapped up to the restart address and happily
     returns without doing anything.

Another scenario would be two big unmap requests, both having to
restart the unmapping and each one setting vm_truncate_count to its
own value.  This could go on forever without any of them being able to
finish.

Truncate and hole punching already serialize with i_mutex.  Other
callers of unmap_mapping_range() do not, and it's difficult to get
i_mutex protection for all callers.  In particular ->d_revalidate(),
which calls invalidate_inode_pages2_range() in fuse, may be called
with or without i_mutex.

This patch adds a new mutex to 'struct address_space' to prevent
running multiple concurrent unmap_mapping_range() on the same mapping.

[ We'll hopefully get rid of all this with the upcoming mm
  preemptibility series by Peter Zijlstra, the "mm: Remove i_mmap_mutex
  lockbreak" patch in particular.  But that is for 2.6.39 ]

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reported-by: Michael Leun <lkml20101129@newton.leun.net>
Reported-by: Gurudas Pai <gurudas.pai@oracle.com>
Tested-by: Gurudas Pai <gurudas.pai@oracle.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-23 19:52:52 -08:00
Ryusuke Konishi 0ca7a5b9ac nilfs2: fix crash after one superblock became unavailable
Fixes the following kernel oops in nilfs_setup_super() which could
arise if one of two super-blocks is unavailable.

> BUG: unable to handle kernel NULL pointer dereference at   (null)
> Pid: 3529, comm: mount.nilfs2 Not tainted 2.6.37 #1 /
> EIP: 0060:[<c03196bc>] EFLAGS: 00010202 CPU: 3
> EIP is at memcpy+0xc/0x1b
> Call Trace:
>  [<f953720e>] ? nilfs_setup_super+0x6c/0xa5 [nilfs2]
>  [<f95369e9>] ? nilfs_get_root_dentry+0x81/0xcb [nilfs2]
>  [<f9537a08>] ? nilfs_mount+0x4f9/0x62c [nilfs2]
>  [<c02745cf>] ? kstrdup+0x36/0x3f
>  [<f953750f>] ? nilfs_mount+0x0/0x62c [nilfs2]
>  [<c0293940>] ? vfs_kern_mount+0x4d/0x12c
>  [<c02a5100>] ? get_fs_type+0x76/0x8f
>  [<c0293a68>] ? do_kern_mount+0x33/0xbf
>  [<c02a784a>] ? do_mount+0x2ed/0x714
>  [<c02a6171>] ? copy_mount_options+0x28/0xfc
>  [<c02a7ce3>] ? sys_mount+0x72/0xaf
>  [<c0473085>] ? syscall_call+0x7/0xb

Reported-by: Wakko Warner <wakko@animx.eu.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Wakko Warner <wakko@animx.eu.org>
Cc: stable <stable@kernel.org> [2.6.37, 2.6.36]
LKML-Reference: <20110121024918.GA29598@animx.eu.org>
2011-01-22 15:22:36 +09:00
Linus Torvalds 275220f0fc Merge branch 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits)
  block: ensure that completion error gets properly traced
  blktrace: add missing probe argument to block_bio_complete
  block cfq: don't use atomic_t for cfq_group
  block cfq: don't use atomic_t for cfq_queue
  block: trace event block fix unassigned field
  block: add internal hd part table references
  block: fix accounting bug on cross partition merges
  kref: add kref_test_and_get
  bio-integrity: mark kintegrityd_wq highpri and CPU intensive
  block: make kblockd_workqueue smarter
  Revert "sd: implement sd_check_events()"
  block: Clean up exit_io_context() source code.
  Fix compile warnings due to missing removal of a 'ret' variable
  fs/block: type signature of major_to_index(int) to major_to_index(unsigned)
  block: convert !IS_ERR(p) && p to !IS_ERR_NOR_NULL(p)
  cfq-iosched: don't check cfqg in choose_service_tree()
  fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors
  cdrom: export cdrom_check_events()
  sd: implement sd_check_events()
  sr: implement sr_check_events()
  ...
2011-01-13 10:45:01 -08:00
Alexey Dobriyan 57cc7215b7 headers: kobject.h redux
Remove kobject.h from files which don't need it, notably,
sched.h and fs.h.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-10 08:51:44 -08:00
Ryusuke Konishi 06df0f9992 nilfs2: get rid of nilfs_mount_options structure
Only mount_opt member is used in the nilfs_mount_options structure,
and we can simplify it.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-01-10 14:05:46 +09:00
Joe Perches b004a5eb0b fs/nilfs2/super.c: Use printf extension %pV
Using %pV reduces the number of printk calls and
eliminates any possible message interleaving from
other printk calls.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-01-10 14:05:45 +09:00
Nick Piggin fa0d7e3de6 fs: icache RCU free inodes
RCU free the struct inode. This will allow:

- Subsequent store-free path walking patch. The inode must be consulted for
  permissions when walking, so an RCU inode reference is a must.
- sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
  to take i_lock no longer need to take sb_inode_list_lock to walk the list in
  the first place. This will simplify and optimize locking.
- Could remove some nested trylock loops in dcache code
- Could potentially simplify things a bit in VM land. Do not need to take the
  page lock to follow page->mapping.

The downsides of this is the performance cost of using RCU. In a simple
creat/unlink microbenchmark, performance drops by about 10% due to inability to
reuse cache-hot slab objects. As iterations increase and RCU freeing starts
kicking over, this increases to about 20%.

In cases where inode lifetimes are longer (ie. many inodes may be allocated
during the average life span of a single inode), a lot of this cache reuse is
not applicable, so the regression caused by this patch is smaller.

The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
however this adds some complexity to list walking and store-free path walking,
so I prefer to implement this at a later date, if it is shown to be a win in
real situations. I haven't found a regression in any non-micro benchmark so I
doubt it will be a problem.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:26 +11:00
Nick Piggin b7ab39f631 fs: dcache scale dentry refcount
Make d_count non-atomic and protect it with d_lock. This allows us to ensure a
0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when
we start protecting many other dentry members with d_lock.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:21 +11:00
Tejun Heo d4d7762995 block: clean up blkdev_get() wrappers and their users
After recent blkdev_get() modifications, open_by_devnum() and
open_bdev_exclusive() are simple wrappers around blkdev_get().
Replace them with blkdev_get_by_dev() and blkdev_get_by_path().

blkdev_get_by_dev() is identical to open_by_devnum().
blkdev_get_by_path() is slightly different in that it doesn't
automatically add %FMODE_EXCL to @mode.

All users are converted.  Most conversions are mechanical and don't
introduce any behavior difference.  There are several exceptions.

* btrfs now sets FMODE_EXCL in btrfs_device->mode, so there's no
  reason to OR it explicitly on blkdev_put().

* gfs2, nilfs2 and the generic mount_bdev() now set FMODE_EXCL in
  sb->s_mode.

* With the above changes, sb->s_mode now always should contain
  FMODE_EXCL.  WARN_ON_ONCE() added to kill_block_super() to detect
  errors.

The new blkdev_get_*() functions are with proper docbook comments.
While at it, add function description to blkdev_get() too.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Joern Engel <joern@lazybastard.org>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Jan Kara <jack@suse.cz>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
Cc: reiserfs-devel@vger.kernel.org
Cc: xfs-masters@oss.sgi.com
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
2010-11-13 11:55:18 +01:00
Tejun Heo e525fd89d3 block: make blkdev_get/put() handle exclusive access
Over time, block layer has accumulated a set of APIs dealing with bdev
open, close, claim and release.

* blkdev_get/put() are the primary open and close functions.

* bd_claim/release() deal with exclusive open.

* open/close_bdev_exclusive() are combination of open and claim and
  the other way around, respectively.

* bd_link/unlink_disk_holder() to create and remove holder/slave
  symlinks.

* open_by_devnum() wraps bdget() + blkdev_get().

The interface is a bit confusing and the decoupling of open and claim
makes it impossible to properly guarantee exclusive access as
in-kernel open + claim sequence can disturb the existing exclusive
open even before the block layer knows the current open if for another
exclusive access.  Reorganize the interface such that,

* blkdev_get() is extended to include exclusive access management.
  @holder argument is added and, if is @FMODE_EXCL specified, it will
  gain exclusive access atomically w.r.t. other exclusive accesses.

* blkdev_put() is similarly extended.  It now takes @mode argument and
  if @FMODE_EXCL is set, it releases an exclusive access.  Also, when
  the last exclusive claim is released, the holder/slave symlinks are
  removed automatically.

* bd_claim/release() and close_bdev_exclusive() are no longer
  necessary and either made static or removed.

* bd_link_disk_holder() remains the same but bd_unlink_disk_holder()
  is no longer necessary and removed.

* open_bdev_exclusive() becomes a simple wrapper around lookup_bdev()
  and blkdev_get().  It also has an unexpected extra bdev_read_only()
  test which probably should be moved into blkdev_get().

* open_by_devnum() is modified to take @holder argument and pass it to
  blkdev_get().

Most of bdev open/close operations are unified into blkdev_get/put()
and most exclusive accesses are tested atomically at the open time (as
it should).  This cleans up code and removes some, both valid and
invalid, but unnecessary all the same, corner cases.

open_bdev_exclusive() and open_by_devnum() can use further cleanup -
rename to blkdev_get_by_path() and blkdev_get_by_devt() and drop
special features.  Well, let's leave them for another day.

Most conversions are straight-forward.  drbd conversion is a bit more
involved as there was some reordering, but the logic should stay the
same.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Neil Brown <neilb@suse.de>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Peter Osterlund <petero2@telia.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <joel.becker@oracle.com>
Cc: Alex Elder <aelder@sgi.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: dm-devel@redhat.com
Cc: drbd-dev@lists.linbit.com
Cc: Leo Chen <leochen@broadcom.com>
Cc: Scott Branden <sbranden@broadcom.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: Joern Engel <joern@logfs.org>
Cc: reiserfs-devel@vger.kernel.org
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
2010-11-13 11:55:17 +01:00
Al Viro e4c59d61e8 convert nilfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-29 04:16:53 -04:00
Linus Torvalds ab34c02afe Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: (36 commits)
  nilfs2: eliminate sparse warning - "context imbalance"
  nilfs2: eliminate sparse warnings - "symbol not declared"
  nilfs2: get rid of bdi from nilfs object
  nilfs2: change license of exported header file
  nilfs2: add bdev freeze/thaw support
  nilfs2: accept 64-bit checkpoint numbers in cp mount option
  nilfs2: remove own inode allocator and destructor for metadata files
  nilfs2: get rid of back pointer to writable sb instance
  nilfs2: get rid of mi_nilfs back pointer to nilfs object
  nilfs2: see state of root dentry for mount check of snapshots
  nilfs2: use iget for all metadata files
  nilfs2: get rid of GCDAT inode
  nilfs2: add routines to redirect access to buffers of DAT file
  nilfs2: add routines to roll back state of DAT file
  nilfs2: add routines to save and restore bmap state
  nilfs2: do not allocate nilfs_mdt_info structure to gc-inodes
  nilfs2: allow nilfs_clear_inode to clear metadata file inodes
  nilfs2: get rid of snapshot mount flag
  nilfs2: simplify life cycle management of nilfs object
  nilfs2: do not allocate multiple super block instances for a device
  ...
2010-10-23 01:26:47 -07:00
Jiro SEKIBA abc0b50b6b nilfs2: eliminate sparse warnings - "symbol not declared"
change nilfs_dat_commit_free and nilfs_inode_cachep static
to fix following warnings

fs/nilfs2/super.c:72:19: warning: symbol 'nilfs_inode_cachep' was not declared. Should it be static?
fs/nilfs2/dat.c:106:6: warning: symbol 'nilfs_dat_commit_free' was not declared. Should it be static?

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:40 +09:00
Ryusuke Konishi 026a7d63d5 nilfs2: get rid of bdi from nilfs object
Nilfs now can use sb->s_bdi to get backing_dev_info, so we use it
instead of ns_bdi on the nilfs object and remove ns_bdi.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:39 +09:00
Ryusuke Konishi 5beb6e0b20 nilfs2: add bdev freeze/thaw support
Nilfs hasn't supported the freeze/thaw feature because it didn't work
due to the peculiar design that multiple super block instances could
be allocated for a device.  This limitation was removed by the patch
"nilfs2: do not allocate multiple super block instances for a device".

So now this adds the freeze/thaw support to nilfs.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:39 +09:00
Ryusuke Konishi c05dbfc260 nilfs2: accept 64-bit checkpoint numbers in cp mount option
The current implementation doesn't mount snapshots with checkpoint
numbers larger than INT_MAX since it uses match_int() for parsing
"cp=" mount option.

This uses simple_strtoull() for the conversion to resolve the issue.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:39 +09:00
Ryusuke Konishi 2879ed66e4 nilfs2: remove own inode allocator and destructor for metadata files
This finally removes own inode allocator and destructor functions for
metadata files.  Several routines, nilfs_mdt_new(),
nilfs_mdt_new_common(), nilfs_mdt_clear(), nilfs_mdt_destroy(), and
nilfs_alloc_inode_common() will be gone.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:39 +09:00
Ryusuke Konishi 032dbb3b50 nilfs2: see state of root dentry for mount check of snapshots
After applied the patch that unified sb instances, root dentry of
snapshots can be left in dcache even after their trees are unmounted.

The orphan root dentry/inode keeps a root object, and this causes
false positive of nilfs_checkpoint_is_mounted function.

This resolves the issue by having nilfs_checkpoint_is_mounted test
whether the root dentry is busy or not.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:38 +09:00
Ryusuke Konishi f1e89c86fd nilfs2: use iget for all metadata files
This makes use of iget5_locked to allocate or get inode for metadata
files to stop using own inode allocator.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:38 +09:00
Ryusuke Konishi 348fe8da13 nilfs2: simplify life cycle management of nilfs object
This stops pre-allocating nilfs object in nilfs_get_sb routine, and
stops managing its life cycle by reference counting.

nilfs_find_or_create_nilfs() function, nilfs->ns_mount_mutex,
nilfs_objects list, and the reference counter will be removed through
the simplification.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:36 +09:00
Ryusuke Konishi f11459ad7d nilfs2: do not allocate multiple super block instances for a device
This stops allocating multiple super block instances for a device.

All snapshots and a current mode mount (i.e. latest tree) will be
controlled with nilfs_root objects that are kept within an sb
instance.

nilfs_get_sb() is rewritten so that it always has a root object for
the latest tree and snapshots make additional root objects.

The root dentry of the latest tree is binded to sb->s_root even if it
isn't attached on a directory.  Root dentries of snapshots or the
latest tree are binded to mnt->mnt_root on which they are mounted.

With this patch, nilfs_find_sbinfo() function, nilfs->ns_supers list,
and nilfs->ns_current back pointer, are deleted.  In addition,
init_nilfs() and load_nilfs() are simplified since they will be called
once for a device, not repeatedly called for mount points.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:36 +09:00
Ryusuke Konishi ab4d8f7ebf nilfs2: split out nilfs_attach_snapshot
This splits the code to attach snapshots into a separate routine for
convenience sake.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:36 +09:00
Ryusuke Konishi 367ea33486 nilfs2: split out nilfs_get_root_dentry
This splits the code to allocate root dentry into a separate routine
for convenience in successive changes.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:35 +09:00
Ryusuke Konishi b7c0634204 nilfs2: move inode count and block count into root object
This moves sbi->s_inodes_count and sbi->s_blocks_count into nilfs_root
object.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:35 +09:00
Ryusuke Konishi e912a5b668 nilfs2: use root object to get ifile
This rewrites functions using ifile so that they get ifile from
nilfs_root object, and will remove sbi->s_ifile.  Some functions that
don't know the root object are extended to receive it from caller.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:35 +09:00
Ryusuke Konishi 8e656fd518 nilfs2: make snapshots in checkpoint tree exportable
The previous export operations cannot handle multiple versions of
a filesystem if they belong to the same sb instance.

This adds a new type of file handle and extends export operations so
that they can get the inode specified by a checkpoint number as well
as an inode number and a generation number.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:34 +09:00
Ryusuke Konishi 4d8d9293dc nilfs2: set pointer to root object in inodes
This puts a pointer to nilfs_root object in the private part of
on-memory inode, and makes nilfs_iget function pick up the inode with
the same root object.

Non-root inodes inherit its nilfs_root object from parent inode.  That
of the root inode is allocated through nilfs_attach_checkpoint()
function.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:34 +09:00
Ryusuke Konishi 0e14a3595b nilfs2: use iget5_locked to get inode
This uses iget5_locked instead of iget_locked so that gc cache can
look up inodes with an inode number and an optional checkpoint number.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:33 +09:00
Ryusuke Konishi b91c9a97c9 nilfs2: allow nilfs_destroy_inode to destroy metadata file inodes
The current nilfs_destroy_inode() doesn't handle metadata file inodes
including gc inodes (dummy inodes used for garbage collection).

This allows nilfs_destroy_inode() to destroy inodes of metadata files.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:33 +09:00
Linus Torvalds a2887097f2 Merge branch 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block: (46 commits)
  xen-blkfront: disable barrier/flush write support
  Added blk-lib.c and blk-barrier.c was renamed to blk-flush.c
  block: remove BLKDEV_IFL_WAIT
  aic7xxx_old: removed unused 'req' variable
  block: remove the BH_Eopnotsupp flag
  block: remove the BLKDEV_IFL_BARRIER flag
  block: remove the WRITE_BARRIER flag
  swap: do not send discards as barriers
  fat: do not send discards as barriers
  ext4: do not send discards as barriers
  jbd2: replace barriers with explicit flush / FUA usage
  jbd2: Modify ASYNC_COMMIT code to not rely on queue draining on barrier
  jbd: replace barriers with explicit flush / FUA usage
  nilfs2: replace barriers with explicit flush / FUA usage
  reiserfs: replace barriers with explicit flush / FUA usage
  gfs2: replace barriers with explicit flush / FUA usage
  btrfs: replace barriers with explicit flush / FUA usage
  xfs: replace barriers with explicit flush / FUA usage
  block: pass gfp_mask and flags to sb_issue_discard
  dm: convey that all flushes are processed as empty
  ...
2010-10-22 17:07:18 -07:00
Jan Blunck d6d4c19c5f BKL: Remove BKL from NILFS2
The BKL is only used in put_super, fill_super and remount_fs that are all
three protected by the superblocks s_umount rw_semaphore. Therefore it is
safe to remove the BKL entirely.

Signed-off-by: Jan Blunck <jblunck@infradead.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2010-10-04 21:10:41 +02:00
Jan Blunck db71922217 BKL: Explicitly add BKL around get_sb/fill_super
This patch is a preparation necessary to remove the BKL from do_new_mount().
It explicitly adds calls to lock_kernel()/unlock_kernel() around
get_sb/fill_super operations for filesystems that still uses the BKL.

I've read through all the code formerly covered by the BKL inside
do_kern_mount() and have satisfied myself that it doesn't need the BKL
any more.

do_kern_mount() is already called without the BKL when mounting the rootfs
and in nfsctl. do_kern_mount() calls vfs_kern_mount(), which is called
from various places without BKL: simple_pin_fs(), nfs_do_clone_mount()
through nfs_follow_mountpoint(), afs_mntpt_do_automount() through
afs_mntpt_follow_link(). Both later functions are actually the filesystems
follow_link inode operation. vfs_kern_mount() is calling the specified
get_sb function and lets the filesystem do its job by calling the given
fill_super function.

Therefore I think it is safe to push down the BKL from the VFS to the
low-level filesystems get_sb/fill_super operation.

[arnd: do not add the BKL to those file systems that already
       don't use it elsewhere]

Signed-off-by: Jan Blunck <jblunck@infradead.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Christoph Hellwig <hch@infradead.org>
2010-10-04 21:10:10 +02:00
Christoph Hellwig f8c131f5b6 nilfs2: replace barriers with explicit flush / FUA usage
Switch to the WRITE_FLUSH_FUA flag for log writes, remove the EOPNOTSUPP
detection for barriers and stop setting the barrier flag for discards.

tj: nilfs is now fixed to wait for discard completion.  Updated this
    patch accordingly and dropped warning about it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10 12:35:39 +02:00
Linus Torvalds 145c3ae46b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  fs: brlock vfsmount_lock
  fs: scale files_lock
  lglock: introduce special lglock and brlock spin locks
  tty: fix fu_list abuse
  fs: cleanup files_lock locking
  fs: remove extra lookup in __lookup_hash
  fs: fs_struct rwlock to spinlock
  apparmor: use task path helpers
  fs: dentry allocation consolidation
  fs: fix do_lookup false negative
  mbcache: Limit the maximum number of cache entries
  hostfs ->follow_link() braino
  hostfs: dumb (and usually harmless) tpyo - strncpy instead of strlcpy
  remove SWRITE* I/O types
  kill BH_Ordered flag
  vfs: update ctime when changing the file's permission by setfacl
  cramfs: only unlock new inodes
  fix reiserfs_evict_inode end_writeback second call
2010-08-18 09:35:08 -07:00
Christoph Hellwig 87e99511ea kill BH_Ordered flag
Instead of abusing a buffer_head flag just add a variant of
sync_dirty_buffer which allows passing the exact type of write
flag required.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-18 01:09:00 -04:00
Ryusuke Konishi af4e36318e nilfs2: fix list corruption after ifile creation failure
If nilfs_attach_checkpoint() gets a memory allocation failure during
creation of ifile, it will return without removing nilfs_sb_info
struct from ns_supers list.  When a concurrently mounted snapshot is
unmounted or another new snapshot is mounted after that, this causes
kernel oops as below:

> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: [<f83662ff>] nilfs_find_sbinfo+0x74/0xa4 [nilfs2]
> *pde = 00000000
> Oops: 0000 [#1] SMP
<snip>
> Call Trace:
>  [<f835dc29>] ? nilfs_get_sb+0x165/0x532 [nilfs2]
>  [<c1173c87>] ? ida_get_new_above+0x16d/0x187
>  [<c109a7f8>] ? alloc_vfsmnt+0x7e/0x10a
>  [<c1070790>] ? kstrdup+0x2c/0x40
>  [<c1089041>] ? vfs_kern_mount+0x96/0x14e
>  [<c108913d>] ? do_kern_mount+0x32/0xbd
>  [<c109b331>] ? do_mount+0x642/0x6a1
>  [<c101a415>] ? do_page_fault+0x0/0x2d1
>  [<c1099c00>] ? copy_mount_options+0x80/0xe2
>  [<c10705d8>] ? strndup_user+0x48/0x67
>  [<c109b3f1>] ? sys_mount+0x61/0x90
>  [<c10027cc>] ? sysenter_do_call+0x12/0x22

This fixes the problem.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: stable@kernel.org
2010-08-16 11:08:36 +09:00
Linus Torvalds 5f248c9c25 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (96 commits)
  no need for list_for_each_entry_safe()/resetting with superblock list
  Fix sget() race with failing mount
  vfs: don't hold s_umount over close_bdev_exclusive() call
  sysv: do not mark superblock dirty on remount
  sysv: do not mark superblock dirty on mount
  btrfs: remove junk sb_dirt change
  BFS: clean up the superblock usage
  AFFS: wait for sb synchronization when needed
  AFFS: clean up dirty flag usage
  cifs: truncate fallout
  mbcache: fix shrinker function return value
  mbcache: Remove unused features
  add f_flags to struct statfs(64)
  pass a struct path to vfs_statfs
  update VFS documentation for method changes.
  All filesystems that need invalidate_inode_buffers() are doing that explicitly
  convert remaining ->clear_inode() to ->evict_inode()
  Make ->drop_inode() just return whether inode needs to be dropped
  fs/inode.c:clear_inode() is gone
  fs/inode.c:evict() doesn't care about delete vs. non-delete paths now
  ...

Fix up trivial conflicts in fs/nilfs2/super.c
2010-08-10 11:26:52 -07:00
Al Viro 6fd1e5c994 convert nilfs2 to ->evict_inode()
[folded build fix from sfr]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:48:25 -04:00
Ryusuke Konishi c5ca48aabe nilfs2: reject incompatible filesystem
This forces nilfs to check compatibility of feature flags so as to
reject a filesystem with unknown features when it mounts or remounts
the filesystem.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:16 +09:00
Ryusuke Konishi 05d0e94b66 nilfs2: get rid of nilfs_bmap_union
This removes nilfs_bmap_union and finally unifies three structures and
the union in bmap/btree code into one.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:14 +09:00
Ryusuke Konishi 7c01745781 nilfs2: pass remount flag to parse_options
This adds is_remount argument to the parse_options() function that
obtains mount options from strings.

Previously, parse_options did not distinguish context whether it's
called for a new mount or remount, so the caller needed additional
verifications outside the function.

This allows parse_options to verify options and print messages
depending on the context.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:12 +09:00
Ryusuke Konishi c6b4d57ddf nilfs2: use seq_puts to print mount options without argument
This replaces seq_printf() with seq_puts() in nilfs_show_options for
mount options which have no argument.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:12 +09:00
Ryusuke Konishi 802d317754 nilfs2: add nodiscard mount option
Nilfs has "discard" mount option which issues discard/TRIM commands to
underlying block device, but it lacks a complementary option and has
no way to disable the feature through remount.

This adds "nodiscard" option to resolve this imbalance.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:12 +09:00
Ryusuke Konishi 773bc4f3b6 nilfs2: add barrier mount option
Nilfs enables write barriers by default and has "nobarrier" mount
option to disable this feature.  But it lacks the complementary option
and has no way to re-enable the feature on remount.

This adds "barrier" option to resolve this imbalance.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:12 +09:00
Jiro SEKIBA b2ac86e1a8 nilfs2: sync super blocks in turns
This will sync super blocks in turns instead of syncing duplicate
super blocks at the time.  This will help searching valid super root
when super block is written into disk before log is written, which is
happen when barrier-less block devices are unmounted uncleanly.  In
the situation, old super block likely points to valid log.

This patch introduces ns_sbwcount member to the nilfs object and adds
nilfs_sb_will_flip() function; ns_sbwcount counts how many times super
blocks write back to the disk.  And, nilfs_sb_will_flip() decides
whether flipping required or not based on the count of ns_sbwcount to
sync super blocks asymmetrically.

The following functions are also changed:

 - nilfs_prepare_super(): flips super blocks according to the
   argument.  The argument is calculated by nilfs_sb_will_flip()
   function.

 - nilfs_cleanup_super(): sets "clean" flag to both super blocks if
   they point to the same checkpoint.

To update both of super block information, caller of
nilfs_commit_super must set the information on both super blocks.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:11 +09:00
Jiro SEKIBA d26493b6f0 nilfs2: introduce nilfs_prepare_super
This function checks validity of super block pointers.
If first super block is invalid, it will swap the super blocks.
The function should be called before any super block information updates.
Caller must obtain nilfs->ns_sem.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:10 +09:00
Ryusuke Konishi 60f46b7efc nilfs2: separate function that updates log position
This moves out section that updates information of the recent log
position stored in super blocks from nilfs_commit_super to a new
routine named nilfs_set_log_cursor.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:10 +09:00
Ryusuke Konishi c8a11c8a14 nilfs2: add nilfs_set_error
This function marks error state and write it on super blocks.  This is
a preparation for making super block writeback alternately.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:10 +09:00
Ryusuke Konishi 7ecaa46cfe nilfs2: add nilfs_cleanup_super
This function write out filesystem state to super blocks in order to
share the same cleanup work.  This is a preparation for making super
block writeback alternately.

Cc: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:10 +09:00
Ryusuke Konishi bde4e696e4 nilfs2: do not update mount time on rw->ro remount
Mount time field in super block is wrongly updated when nilfs remounts
the partition from read-write to read-only.  This fixes the issue.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:10 +09:00
Ryusuke Konishi 84cb099985 nilfs2: fix style issue in nilfs_destroy_cachep
This gets rid of unwanted space chars in front of conditional
sentences of nilfs_destroy_cachep().

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-31 20:50:29 +09:00
Ryusuke Konishi d240e06713 nilfs2: disallow remount of snapshot from/to a regular mount
Snapshots and regular ro/rw mounts are essentially-different within
the meaning whether the checkpoint is static or not and is marked with
a snapshot flag or not.

The current implemenation, however, allows to remount a snapshot to a
regular rw-mount if the checkpoint number equals the latest one.

This transition is actually impossible since changing a checkpoint to
a snapshot makes another checkpoint, thus the condition is never
satisfied.

This fixes the weird state of affairs, and specifically separates
snapshots and regular rw/ro-mounts.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 11:32:34 +09:00
Ryusuke Konishi b87ca91948 nilfs2: update comment on deactivate_super at nilfs_get_sb
deactivate_super was replaced with deactivate_locked_super, but the
comment of nilfs_get_sb remain unchanged.  This renews the comment.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 11:32:34 +09:00
Ryusuke Konishi e2d1591a13 nilfs2: replace MS_VERBOSE with MS_SILENT
MS_VERBOSE is deprecated.  This replaces it with MS_SILENT in
reference to get_sb_bdev function.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 11:32:34 +09:00
Ryusuke Konishi 4571b82cdc nilfs2: add missing initialization of s_mode
An fmode_t argument is passed to kill_block_super() through s_mode
member of the super_block structure.  This is used to release the
block device with the same mode, however, nilfs does not set s_mode
anywhere.

This modifies nilfs_get_sb function to properly initialize the s_mode
member.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 11:32:33 +09:00
Ryusuke Konishi 13e905592b nilfs2: fix misuse of open_bdev_exclusive/close_bdev_exclusive
The second argument of open_bdev_exclusive/close_bdev_exclusive takes
fmode_t flags instead of mount flags.  This fixes the misuse.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 11:32:33 +09:00
Ryusuke Konishi 34cb9b5c97 nilfs2: add missing endian conversion on super block magic number
This adds missing endian conversions in comparision of the magic
number of super blocks.  It was coincidence that prior versions didn't
incur problems; the upper byte of the magic number happened to be
equal to the lower byte.  But, semantically it's wrong to depend on
this.

This won't change anything else nor suffer any compatibility issues.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 11:32:32 +09:00
Li Hong 9f130263f3 nilfs2: add a print message after loading nilfs2
Printing a message after loading a file system is a practice. Add this to
provide a better user-friendly experience.

Signed-off-by: Li Hong <lihong.hi@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 11:32:30 +09:00
Li Hong 41c88bd74d nilfs2: cleanup multi kmem_cache_{create,destroy} code
This cleanup patch gives several improvements:

 - Moving all kmem_cache_{create_destroy} calls into one place, which removes
 some small function calls, cleans up error check code and clarify the logic.

 - Mark all initial code in __init section.

 - Remove some very obvious comments.

 - Adjust some declarations.

 - Fix some space-tab issues.

Signed-off-by: Li Hong <lihong.hi@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 11:32:30 +09:00
Ryusuke Konishi 277a6a3417 nilfs2: change default of 'errors' mount option to 'remount-ro' mode
Like ext3, nilfs has 'errors' mount option to allow specifying desired
behavior on severe errors.

Currently, the default action is 'errors=continue' and has potential
to advance filesystem corruption for severe errors.

This will change the action to 'errors=remount-ro' to avoid the issue.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 11:32:30 +09:00
Ryusuke Konishi 973bec34bf nilfs2: fix sync silent failure
As of 32a88aa1, __sync_filesystem() will return 0 if s_bdi is not set.
And nilfs does not set s_bdi anywhere.  I noticed this problem by the
warning introduced by the recent commit 5129a469 ("Catch filesystem
lacking s_bdi").

 WARNING: at fs/super.c:959 vfs_kern_mount+0xc5/0x14e()
 Hardware name: PowerEdge 2850
 Modules linked in: nilfs2 loop tpm_tis tpm tpm_bios video shpchp pci_hotplug output dcdbas
 Pid: 3773, comm: mount.nilfs2 Not tainted 2.6.34-rc6-debug #38
 Call Trace:
  [<c1028422>] warn_slowpath_common+0x60/0x90
  [<c102845f>] warn_slowpath_null+0xd/0x10
  [<c1095936>] vfs_kern_mount+0xc5/0x14e
  [<c1095a03>] do_kern_mount+0x32/0xbd
  [<c10a811e>] do_mount+0x671/0x6d0
  [<c1073794>] ? __get_free_pages+0x1f/0x21
  [<c10a684f>] ? copy_mount_options+0x2b/0xe2
  [<c107b634>] ? strndup_user+0x48/0x67
  [<c10a81de>] sys_mount+0x61/0x8f
  [<c100280c>] sysenter_do_call+0x12/0x32

This ensures to set s_bdi for nilfs and fixes the sync silent failure.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-03 07:36:01 -07:00
Ryusuke Konishi c91cea11df nilfs2: remove whitespaces before quoted newlines
This kills the following checkpatch warnings:

 WARNING: unnecessary whitespace before a quoted newline
 #869: FILE: super.c:869:
 +     	           "remount to a different snapshot. \n",

 WARNING: unnecessary whitespace before a quoted newline
 #389: FILE: the_nilfs.c:389:
 +     	    printk(KERN_ERR "NILFS: too short segment. \n");

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-03-14 10:29:51 +09:00
Ryusuke Konishi 7a65004bba nilfs2: fix various typos in comments
This fixes various typos I found in comments of nilfs2.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-03-14 10:29:51 +09:00
Ryusuke Konishi e605f0a724 nilfs2: get rid of s_dirt flag use
This replaces s_dirt flag use in nilfs with a new flag added on the
nilfs object.  The s_dirt flag was used to indicate if
sop->write_super() should be called, however the current version of
nilfs does not use the callback.  Thus, it can be replaced with the
own flag.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Jiro SEKIBA <jir@unicus.jp>
2010-02-13 12:26:03 +09:00
Ryusuke Konishi fe5f171bb2 nilfs2: fix potential hang in nilfs_error on errors=remount-ro
nilfs_error() calls nilfs_detach_segment_constructor() if
errors=remount-ro option is specified, and this may lead to a hang due
to recursive locking of, for instance, nilfs->ns_segctor_sem and
others.

In this case, detaching segment constructor is not necessary because
read-only flag is set to the filesystem and further writes are
blocked.

This fixes the potential hang issue by removing the
nilfs_detach_segment_constructor() call from nilfs_error.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-02-13 12:26:03 +09:00
Jiro SEKIBA e902ec9906 nilfs2: issue discard request after cleaning segments
This adds a function to send discard requests for given array of
segment numbers, and calls the function when garbage collection
succeeded.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-02-13 12:26:02 +09:00
Al Viro a95161aaa8 switch nilfs2 to deactivate_locked_super()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-16 12:16:42 -05:00
Ryusuke Konishi 0234576d04 nilfs2: add norecovery mount option
This adds "norecovery" mount option which disables temporal write
access to read-only mounts or snapshots during mount/recovery.
Without this option, write access will be even performed for those
types of mounts; the temporal write access is needed to mount root
file system read-only after an unclean shutdown.

This option will be helpful when user wants to prevent any write
access to the device.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Eric Sandeen <sandeen@redhat.com>
2009-11-20 10:05:52 +09:00
Ryusuke Konishi f50a4c8149 nilfs2: move recovery completion into load_nilfs function
Although mount recovery of nilfs is integrated in load_nilfs()
procedure, the completion of recovery was isolated from the procedure
and performed at the end of the fill_super routine.

This was somewhat confusing since the recovery is needed for the nilfs
object, not for a super block instance.

To resolve the inconsistency, this will integrate the recovery
completion into load_nilfs().

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:52 +09:00
Ryusuke Konishi fd66c0d5c3 nilfs2: hide nilfs_mdt_clear calls in nilfs_mdt_destroy
This will hide a function call of nilfs_mdt_clear() in
nilfs_mdt_destroy().

This ensures nilfs_mdt_destroy() to do cleanup jobs included in
nilfs_mdt_clear().

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:48 +09:00
Ryusuke Konishi 79739565e1 nilfs2: separate constructor of metadata files
This will displace nilfs_mdt_new() constructor with individual
metadata file constructors like nilfs_dat_new(), new_sufile_new(),
nilfs_cpfile_new(), and nilfs_ifile_new().

This makes it possible for each metadata file to have own
intialization code.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:48 +09:00
Ryusuke Konishi 5731e191f2 nilfs2: add size option of private object to metadata file allocator
This adds an optional "object size" argument to nilfs_mdt_new_common()
function; the argument specifies the size of private object attached
to a newly allocated metadata file inode.

This will afford space to keep local variables for meta data files.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:48 +09:00
Jiro SEKIBA e2073e7857 nilfs2: cleanup unused match_bool function
match_bool function is not used anymore.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:47 +09:00
Jiro SEKIBA 91f1953bf3 nilfs2: Using nobarrier option instead of barrier=off
Since most of fs using nofoobar style option,
modified barrier=off option as nobarrier.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:47 +09:00
Alexey Dobriyan ac4cfdd6d1 const: mark remaining export_operations const
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-22 07:17:24 -07:00
Alexey Dobriyan b87221de6a const: mark remaining super_operations const
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-22 07:17:24 -07:00
Ryusuke Konishi 7a102b0923 nilfs2: remove individual gfp constants for each metadata file
This gets rid of NILFS_CPFILE_GFP, NILFS_SUFILE_GFP, NILFS_DAT_GFP,
and NILFS_IFILE_GFP.  All of these constants refer to NILFS_MDT_GFP,
and can be removed.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:15 +09:00
Zhu Yanhai 43be0ec038 nilfs2: add more check routines in mount process
nilfs2: Add more safeguard routines and protections in mount process,
which also makes nilfs2 report consistency error messages when
checkpoint number is invalid.

Signed-off-by: Zhu Yanhai <zhu.yanhai@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:14 +09:00
Zhang Qiang a4f0b9c5b4 nilfs2: An unassigned variable is assigned to a never used structure member
nilfs2: In procedure 'nilfs_get_sb()', when a nilfs filesysttem is
mounted for the first time, local variable 'nilfs->ns_last_cno' is
used before loading the latest checkpoint number from disk (in
'nilfs_fill_super'). 'nilfs->ns_last_cno' is assigned to 'sd.cno', but
'sd.cno' has never been used in the procedure.

Signed-off-by: Zhang Qiang <zhangqiang.buaa@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:14 +09:00
Jiro SEKIBA 1dfa27105a nilfs2: stop using periodic write_super callback
This removes nilfs_write_super and commit super block in nilfs
internal thread, instead of periodic write_super callback.

VFS layer calls ->write_super callback periodically.  However,
it looks like that calling back is ommited when disk I/O is busy.
And when cleanerd (nilfs GC) is runnig, disk I/O tend to be busy thus
nilfs superblock is not synchronized as nilfs designed.

To avoid it, syncing superblock by nilfs thread instead of pdflush.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:14 +09:00
Jiro SEKIBA 79efdd9411 nilfs2: clean up nilfs_write_super
Separate conditions that check if syncing super block and alternative
super block are required as inline functions to reuse the conditions.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:14 +09:00
Jiro SEKIBA 6233caa9d5 nilfs2: fix disorder of nilfs_write_super in nilfs_sync_fs
This fixes disorder of nilfs_write_super in nilfs_sync_fs.  Commiting
super block must be the end of the function so that every changes are
reflected.

->sync_fs() is not called frequently so this makes nilfs_sync_fs call
nilfs_commit_super instead of nilfs_write_super.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:14 +09:00
Jiro SEKIBA ec5d66abdb nilfs2: remove redundant super block commit
This removes redundant super block commit.

nilfs_write_super will call nilfs_commit_super to store super block
into block device.  However, nilfs_put_super will call
nilfs_commit_super right after calling nilfs_write_super.  So calling
nilfs_write_super in nilfs_put_super would be redundant.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:13 +09:00
Jiro SEKIBA b58a285ba4 nilfs2: implement nilfs_show_options to display mount options in /proc/mounts
This is a patch to display mount options in procfs.
Mount options will show up in the /proc/mounts as other fs does.

...
/dev/sda6 /mnt nilfs2 ro,relatime,barrier=off,cp=3,order=strict 0 0
...

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14 18:27:13 +09:00
Zhang Qiang 1154ecbd2f nilfs2: missing a read lock for segment writer in nilfs_attach_checkpoint()
'ns_cno' of structure 'the_nilfs' must be protected from segment
writer, in other words, the caller of nilfs_get_checkpoint should hold
read lock for nilfs->ns_segctor_sem.  This patch adds the lock/unlock
operations in nilfs_attach_checkpoint() when calling
nilfs_cpfile_get_checkpoint().

Signed-off-by: Zhang Qiang <zhangqiang.buaa@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-08-18 17:32:27 +09:00
Al Viro d441b1c293 switch nilfs2 to inode->i_acl
Actually, get rid of private analog, since nothing in there is
using ACLs at all so far.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-24 08:17:05 -04:00
Linus Torvalds 9c7cb99a82 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: (22 commits)
  nilfs2: support contiguous lookup of blocks
  nilfs2: add sync_page method to page caches of meta data
  nilfs2: use device's backing_dev_info for btree node caches
  nilfs2: return EBUSY against delete request on snapshot
  nilfs2: modify list of unsupported features in caveats
  nilfs2: enable sync_page method
  nilfs2: set bio unplug flag for the last bio in segment
  nilfs2: allow future expansion of metadata read out via get info ioctl
  NILFS2: Pagecache usage optimization on NILFS2
  nilfs2: remove nilfs_btree_operations from btree mapping
  nilfs2: remove nilfs_direct_operations from direct mapping
  nilfs2: remove bmap pointer operations
  nilfs2: remove useless b_low and b_high fields from nilfs_bmap struct
  nilfs2: remove pointless NULL check of bpop_commit_alloc_ptr function
  nilfs2: move get block functions in bmap.c into btree codes
  nilfs2: remove nilfs_bmap_delete_block
  nilfs2: remove nilfs_bmap_put_block
  nilfs2: remove header file for segment list operations
  nilfs2: eliminate removal list of segments
  nilfs2: add sufile function that can modify multiple segment usages
  ...
2009-06-15 09:13:49 -07:00
Ryusuke Konishi aa7dfb8954 nilfs2: get rid of bd_mount_sem use from nilfs
This will remove every bd_mount_sem use in nilfs.

The intended exclusion control was replaced by the previous patch
("nilfs2: correct exclusion control in nilfs_remount function") for
nilfs_remount(), and this patch will replace remains with a new mutex
that this inserts in nilfs object.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-11 21:36:18 -04:00
Ryusuke Konishi e59399d010 nilfs2: correct exclusion control in nilfs_remount function
nilfs_remount() changes mount state of a superblock instance.  Even
though nilfs accesses other superblock instances during mount or
remount, the mount state was not properly protected in
nilfs_remount().

Moreover, nilfs_remount() has a lock order reversal problem;
nilfs_get_sb() holds:

  1. bdev->bd_mount_sem
  2. sb->s_umount  (sget acquires)

and nilfs_remount() holds:

  1. sb->s_umount  (locked by the caller in vfs)
  2. bdev->bd_mount_sem

To avoid these problems, this patch divides a semaphore protecting
super block instances from nilfs->ns_sem, and applies it to the mount
state protection in nilfs_remount().

With this change, bd_mount_sem use is removed from nilfs_remount() and
the lock order reversal will be resolved.  And the new rw-semaphore,
nilfs->ns_super_sem will properly protect the mount state except the
modification from nilfs_error function.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-11 21:36:18 -04:00
Ryusuke Konishi 6dd4740662 nilfs2: simplify remaining sget() use
This simplifies the test function passed on the remaining sget()
callsite in nilfs.

Instead of checking mount type (i.e. ro-mount/rw-mount/snapshot mount)
in the test function passed to sget(), this patch first looks up the
nilfs_sb_info struct which the given mount type matches, and then
acquires the super block instance holding the nilfs_sb_info.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-11 21:36:18 -04:00