The rw parameter to direct_IO is redundant with iov_iter->type, and
treated slightly differently just about everywhere it's used: some users
do rw & WRITE, and others do rw == WRITE where they should be doing a
bitwise check. Simplify this with the new iov_iter_rw() helper, which
always returns either READ or WRITE.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Most filesystems call through to these at some point, so we'll start
here.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
struct kiocb now is a generic I/O container, so move it to fs.h.
Also do a #include diet for aio.h while we're at it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reclaim will be leaving shadow entries in the page cache radix tree upon
evicting the real page. As those pages are found from the LRU, an
iput() can lead to the inode being freed concurrently. At this point,
reclaim must no longer install shadow pages because the inode freeing
code needs to ensure the page tree is really empty.
Add an address_space flag, AS_EXITING, that the inode freeing code sets
under the tree lock before doing the final truncate. Reclaim will check
for this flag before installing shadow pages.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Metin Doslu <metin@citusdata.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ozgun Erdogan <ozgun@citusdata.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <klamm@yandex-team.ru>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
truncate_pagecache() doesn't care about old size since commit
cedabed49b ("vfs: Fix vmtruncate() regression"). Let's drop it.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes races uncovered by xfstests testcase 068.
One race is the result of jfs_sync() trying to write a sync point to the
journal after it has been frozen (or possibly in the process). Since
freezing sync's the journal, there is no need to write a sync point so
we simply want to return.
The second involves jfs_write_inode() being called on a deleted inode.
It calls jfs_flush_journal which is held up by the jfs_commit thread
doing the final iput on the same deleted inode, which itself is
waiting for the I_SYNC flag to be cleared. jfs_write_inode need not
do anything when i_nlink is zero, which is the easy fix.
Reported-by: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
After we moved inode_sync_wait() from end_writeback() it doesn't make sense
to call the function end_writeback() anymore. Rename it to clear_inode()
which well says what the function really does - set I_CLEAR flag.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Simple filesystems always pass inode->i_sb_bdev as the block device
argument, and never need a end_io handler. Let's simply things for
them and for my grepping activity by dropping these arguments. The
only thing not falling into that scheme is ext4, which passes and
end_io handler without needing special flags (yet), but given how
messy the direct I/O code there is use of __blockdev_direct_IO
in one instead of two out of three cases isn't going to make a large
difference anyway.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Tell the filesystem if we just updated timestamp (I_DIRTY_SYNC) or
anything else, so that the filesystem can track internally if it
needs to push out a transaction for fdatasync or not.
This is just the prototype change with no user for it yet. I plan
to push large XFS changes for the next merge window, and getting
this trivial infrastructure in this window would help a lot to avoid
tree interdependencies.
Also remove incorrect comments that ->dirty_inode can't block. That
has been changed a long time ago, and many implementations rely on it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Code has been converted over to the new explicit on-stack plugging,
and delay users have been converted to use the new API for that.
So lets kill off the old plugging along with aops->sync_page().
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Move the call to vmtruncate to get rid of accessive blocks to the only
remaining caller and rename the non-truncating version to nobh_write_begin.
Get rid of the superflous file argument to it while we're at it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Move the call to vmtruncate to get rid of accessive blocks to the callers
in prepearation of the new truncate calling sequence. This was only done
for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant
was not needed anyway. Get rid of blockdev_direct_IO_no_locking and
its _newtrunc variant while at it as just opencoding the two additional
paramters is shorted than the name suffix.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
generic setattr not longer responsible for quota transfer.
use jfs_setattr for all jfs's inodes.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (33 commits)
quota: stop using QUOTA_OK / NO_QUOTA
dquot: cleanup dquot initialize routine
dquot: move dquot initialization responsibility into the filesystem
dquot: cleanup dquot drop routine
dquot: move dquot drop responsibility into the filesystem
dquot: cleanup dquot transfer routine
dquot: move dquot transfer responsibility into the filesystem
dquot: cleanup inode allocation / freeing routines
dquot: cleanup space allocation / freeing routines
ext3: add writepage sanity checks
ext3: Truncate allocated blocks if direct IO write fails to update i_size
quota: Properly invalidate caches even for filesystems with blocksize < pagesize
quota: generalize quota transfer interface
quota: sb_quota state flags cleanup
jbd: Delay discarding buffers in journal_unmap_buffer
ext3: quota_write cross block boundary behaviour
quota: drop permission checks from xfs_fs_set_xstate/xfs_fs_set_xquota
quota: split out compat_sys_quotactl support from quota.c
quota: split out netlink notification support from quota.c
quota: remove invalid optimization from quota_sync_all
...
Fixed trivial conflicts in fs/namei.c and fs/ufs/inode.c
This gives the filesystem more information about the writeback that
is happening. Trond requested this for the NFS unstable write handling,
and other filesystems might benefit from this too by beeing able to
distinguish between the different callers in more detail.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Get rid of the initialize dquot operation - it is now always called from
the filesystem and if a filesystem really needs it's own (which none
currently does) it can just call into it's own routine directly.
Rename the now static low-level dquot_initialize helper to __dquot_initialize
and vfs_dq_init to dquot_initialize to have a consistent namespace.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Currently various places in the VFS call vfs_dq_init directly. This means
we tie the quota code into the VFS. Get rid of that and make the
filesystem responsible for the initialization. For most metadata operations
this is a straight forward move into the methods, but for truncate and
open it's a bit more complicated.
For truncate we currently only call vfs_dq_init for the sys_truncate case
because open already takes care of it for ftruncate and open(O_TRUNC) - the
new code causes an additional vfs_dq_init for those which is harmless.
For open the initialization is moved from do_filp_open into the open method,
which means it happens slightly earlier now, and only for regular files.
The latter is fine because we don't need to initialize it for operations
on special files, and we already do it as part of the namespace operations
for directories.
Add a dquot_file_open helper that filesystems that support generic quotas
can use to fill in ->open.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Get rid of the drop dquot operation - it is now always called from
the filesystem and if a filesystem really needs it's own (which none
currently does) it can just call into it's own routine directly.
Rename the now static low-level dquot_drop helper to __dquot_drop
and vfs_dq_drop to dquot_drop to have a consistent namespace.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Get rid of the alloc_inode and free_inode dquot operations - they are
always called from the filesystem and if a filesystem really needs
their own (which none currently does) it can just call into it's
own routine directly.
Also get rid of the vfs_dq_alloc/vfs_dq_free wrappers and always
call the lowlevel dquot_alloc_inode / dqout_free_inode routines
directly, which now lose the number argument which is always 1.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Use lowercase names of quota functions instead of old uppercase ones.
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Kleikamp <shaggy@austin.ibm.com>
This is an alternate fix for a bug reported and fixed by Duane Griffin.
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Reported-by: Duane Griffin <duaneg@dghda.com>
Stop the JFS filesystem from using iget() and read_inode(). Replace
jfs_read_inode() with jfs_iget(), and call that instead of iget(). jfs_iget()
then uses iget_locked() directly and returns a proper error code instead of an
inode in the event of an error.
jfs_fill_super() returns any error incurred when getting the root inode
instead of EINVAL.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Implement nobh in new aops. This is a bit tricky. FWIW, nobh_truncate is
now implemented in a way that does not create blocks in sparse regions,
which is a silly thing for it to have been doing (isn't it?)
ext2 survives fsx and fsstress. jfs is converted as well... ext3
should be easy to do (but not done yet).
[akpm@linux-foundation.org: coding-style fixes]
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This mirrors Jan Kara's patches for ext3. This patch makes sure that
changes made to inode->i_flags are reflected on disk for jfs. It also
moves a call of jfs_set_inode_flags() to be more consistent with where
jfs_get_inode_flags() is called.
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Removed trailing spaces & tabs, and spaces preceding tabs.
Also a couple very minor comment cleanups.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
(cherry picked from f74156539964d7b3d5164fdf8848e6a682f75b97 commit)
Conversion of booleans to: generic-boolean.patch (2006-08-23)
Signed-off-by: Richard Knutsson <ricknu-0@student.ltu.se>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
jfs_quota_read/write are very near duplicates of ext2_quota_read/write.
Cleaned up jfs_get_block as long as I had to change it to be non-static.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Same as with already do with the file operations: keep them in .rodata and
prevents people from doing runtime patching.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Now that get_block() can handle mapping multiple disk blocks, no need to have
->get_blocks(). This patch removes fs specific ->get_blocks() added for DIO
and makes it users use get_block() instead.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch changes mpage_readpages() and get_block() to get the disk mapping
information for multiple blocks at the same time.
b_size represents the amount of disk mapping that needs to mapped. On the
successful get_block() b_size indicates the amount of disk mapping thats
actually mapped. Only the filesystems who care to use this information and
provide multiple disk blocks at a time can choose to do so.
No changes are needed for the filesystems who wants to ignore this.
[akpm@osdl.org: cleanups]
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ext2 inode attributes with relevance for jfs:
'a' EXT2_APPEND_FL -> append only
'i' EXT2_IMMUTABLE_FL -> immutable file
's' EXT2_SECRM_FL -> zero file
'u' EXT2_UNRM_FL -> allow for unrm
'A' EXT2_NOATIME_FL -> no access time
'D' EXT2_DIRSYNC_FL -> dirsync
'S' EXT2_SYNC_FL -> sync
overview of jfs flags (partially for OS/2)
value (OS/2) Linux ext2 attrs
------------------------------------------------
0x00010000 IFJOURNAL -
0x00020000 ISPARSE used
0x00040000 INLINEEA used
0x00080000 - - JFS_NOATIME_FL
0x00100000 - - JFS_DIRSYNC_FL
0x00200000 - - JFS_SYNC_FL
0x00400000 - - JFS_SECRM_FL
0x00800000 ISWAPFILE - JFS_UNRM_FL
0x01000000 - - JFS_APPEND_FL
0x02000000 IREADONLY - JFS_IMMUTABLE_FL
0x04000000 IHIDDEN - -
0x08000000 ISYSTEM - -
0x10000000 - -
0x20000000 IDIRECTORY used
0x40000000 IARCHIVE -
0x80000000 INEWNAME -
the implementation is straight forward, except
for the fact that the attributes have to be mapped
to match with the ext2 ones to avoid a separate
tool for manipulating them (this could be avoided
when using a separate flag field in the on-disk
representation, but the overhead is minimal)
a special jfs_ioctl is added to allow for the new
JFS_IOC_GETFLAGS and JFS_IOC_SETFLAGS calls.
a helper function jfs_set_inode_flags() to transfer
the flags from the on-disk version to the inode
minor changes to allow flag inheritance on inode
creation, as well as a cleanup of the on-disk
flags (including the new ones)
beforementioned helper to map between ext2 and jfs
versions of the new flags ...
the JFS_SECRM_FL and JFS_UNRM_FL are not done yet
and I'm not 100% sure they are worth the effort,
the rest seems to work out of the box ...
Signed-off-by: Herbert Poetzl <herbert@13thfloor.at>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
the conversion was generated via scripts, and the result was validated
automatically via a script as well.
build and boot tested.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
The fix in inode.c is a real bug. It could result in undeleted, yet
unconnected files on big-endian hardware.
The others are trivial.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Clash due to new delete_inode behavior (the filesystem now needs to do
the truncate_inode_pages() call itself).
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Update the file systems in fs/ implementing a delete_inode() callback to
call truncate_inode_pages(). One implementation note: In developing this
patch I put the calls to truncate_inode_pages() at the very top of those
filesystems delete_inode() callbacks in order to retain the previous
behavior. I'm guessing that some of those could probably be optimized.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some error paths may iput an invalid inode with i_nlink=0. jfs should
not try to actually delete such an inode.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Modify xtSearch so that it returns the next allocated block when the
requested block is unmapped. This can be used to make sure we don't
create a new extent that overlaps the next one.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>