Fix compiler warning
fs/udf/balloc.c: In function 'udf_bitmap_new_block':
fs/udf/balloc.c:273: warning: passing argument 1 of '_find_next_bit_le' from incompatible pointer type
fs/udf/balloc.c:285: warning: passing argument 1 of '_find_next_bit_le' from incompatible pointer type
fs/udf/balloc.c:311: warning: passing argument 1 of '_find_next_bit_le' from incompatible pointer type
fs/udf/balloc.c:325: warning: passing argument 1 of '_find_next_bit_le' from incompatible pointer type
The main fix is to add a cast in ext2_find_next_bit().
As all other usage locations of udf_find_next_one_bit()
directly use bh->b_data (which is a char *), the useless
(char *) cast in line 311 can be removed, too.
Signed-off-by: Dirk Behme <dirk.behme@de.bosch.com>
Signed-off-by: George G. Davis <gdavis@mvista.com>
Signed-off-by: Jan Kara <jack@suse.cz>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6:
UDF: Close small mem leak in udf_find_entry()
udf: Fix directory corruption after extent merging
udf: Protect udf_file_aio_write from possible races
udf: Remove unnecessary bkl usages
udf: Use of s_alloc_mutex to serialize udf_relocate_blocks() execution
udf: Replace bkl with the UDF_I(inode)->i_data_sem for protect udf_inode_info struct
udf: Remove BKL from free space counting functions
udf: Call udf_add_free_space() for more blocks at once in udf_free_blocks()
udf: Remove BKL from udf_put_super() and udf_remount_fs()
udf: Protect default inode credentials by rwlock
udf: Protect all modifications of LVID with s_alloc_mutex
udf: Move handling of uniqueID into a helper function and protect it by a s_alloc_mutex
udf: Remove BKL from udf_update_inode
udf: Convert UDF_SB(sb)->s_flags to use bitops
fs/udf: Add printf format/argument verification
fs/udf: Use vzalloc
(Evil merge: this also removes the BKL dependency from the Kconfig file)
RCU free the struct inode. This will allow:
- Subsequent store-free path walking patch. The inode must be consulted for
permissions when walking, so an RCU inode reference is a must.
- sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
to take i_lock no longer need to take sb_inode_list_lock to walk the list in
the first place. This will simplify and optimize locking.
- Could remove some nested trylock loops in dcache code
- Could potentially simplify things a bit in VM land. Do not need to take the
page lock to follow page->mapping.
The downsides of this is the performance cost of using RCU. In a simple
creat/unlink microbenchmark, performance drops by about 10% due to inability to
reuse cache-hot slab objects. As iterations increase and RCU freeing starts
kicking over, this increases to about 20%.
In cases where inode lifetimes are longer (ie. many inodes may be allocated
during the average life span of a single inode), a lot of this cache reuse is
not applicable, so the regression caused by this patch is smaller.
The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
however this adds some complexity to list walking and store-free path walking,
so I prefer to implement this at a later date, if it is shown to be a win in
real situations. I haven't found a regression in any non-micro benchmark so I
doubt it will be a problem.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Hi,
There's a small memory leak in fs/udf/namei.c::udf_find_entry().
We dynamically allocate memory for 'fname' with kmalloc() and in most
situations we free it before we leave the function, but there is one
situation where we do not (but should). This patch closes the leak by
jumping to the 'out_ok' label which does the correct cleanup rather than
doing half the cleanup and returning directly.
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Jan Kara <jack@suse.cz>
If udf_bread() called from udf_add_entry() managed to merge created extent to
an already existing one (or if previous extents could be merged), the code
truncating the last extent to proper size would just overwrite the freshly
allocated extent with an extent that used to be in that place. This obviously
results in a directory corruption. Fix the problem by properly reloading the
last extent.
Signed-off-by: Jan Kara <jack@suse.cz>
Code doing conversion from INICB file to a normal file in udf_file_aio_write()
is not protected by any lock from other code modifying the inode. Use
i_alloc_sem for that.
Reported-by: Alessio Igor Bogani <abogani@texware.it>
Signed-off-by: Jan Kara <jack@suse.cz>
The udf_readdir(), udf_lookup(), udf_create(), udf_mknod(), udf_mkdir(),
udf_rmdir(), udf_link(), udf_get_parent() and udf_unlink() seems already
adequately protected by i_mutex held by VFS invoking calls. The udf_rename()
instead should be already protected by lock_rename again by VFS. The
udf_ioctl(), udf_fill_super() and udf_evict_inode() don't requires any further
protection.
This work was supported by a hardware donation from the CE Linux Forum.
Signed-off-by: Alessio Igor Bogani <abogani@texware.it>
Signed-off-by: Jan Kara <jack@suse.cz>
This work was supported by a hardware donation from the CE Linux Forum.
Signed-off-by: Alessio Igor Bogani <abogani@texware.it>
Signed-off-by: Jan Kara <jack@suse.cz>
Replace bkl with the UDF_I(inode)->i_data_sem rw semaphore in
udf_release_file(), udf_symlink(), udf_symlink_filler(), udf_get_block(),
udf_block_map(), and udf_setattr(). The rule now is that any operation
on regular file's or symlink's extents (or generally allocation information
including goal block) needs to hold i_data_sem.
This work was supported by a hardware donation from the CE Linux Forum.
Signed-off-by: Alessio Igor Bogani <abogani@texware.it>
Signed-off-by: Jan Kara <jack@suse.cz>
udf_count_free_bitmap() does not need BKL because bitmaps are in a fixed
place on disk and so we can count set bits without serialization.
udf_count_free_table() is now protected by s_alloc_mutex instead of BKL
to get a consistent view of free space extents.
Signed-off-by: Jan Kara <jack@suse.cz>
There's no need to call udf_add_free_space() for one block at a time. It saves
us noticeable amount of work and yields different result from the original
code only if the filesystem is corrupted and bitmap bit is already cleared.
In such case counter of free blocks is probably wrong anyways so the change
does not matter.
Signed-off-by: Jan Kara <jack@suse.cz>
udf_put_super() does not need BKL because the filesystem is shut down so
there's nothing to race with. The credential changes in udf_remount_fs()
and LVID changes are now protected by dedicated locks so we can remove BKL
from this function as well.
Signed-off-by: Jan Kara <jack@suse.cz>
Superblock carries credentials (uid, gid, etc.) which are used as default
values in __udf_read_inode() when media does not provide these. These
credentials can change during remount so we protect them by a rwlock so that
each inode gets a consistent set of credentials.
Signed-off-by: Jan Kara <jack@suse.cz>
udf_open_lvid() and udf_close_lvid() were modifying LVID without
s_alloc_mutex. Since they can be called from remount, the modification
could race with other filesystem modifications of LVID so protect them
by s_alloc_mutex just to be sure.
Signed-off-by: Jan Kara <jack@suse.cz>
uniqueID handling has been duplicated in three places. Move it into a common
helper. Since we modify an LVID buffer with uniqueID update, we take
sbi->s_alloc_mutex to protect agaist other modifications of the structure.
Signed-off-by: Jan Kara <jack@suse.cz>
udf_update_inode() does not need BKL since on-disk inode modifications are
protected by the buffer lock and reading of values of in-memory inode is
safe without any lock. In some cases we can write inconsistent inode state
to disk but in that case inode will be marked dirty and overwritten later.
Also make unnecessarily global udf_sync_inode() static.
Signed-off-by: Jan Kara <jack@suse.cz>
Add __attribute__((format... to udf_warning.
All arguments matched formats, no other changes necessary.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jan Kara <jack@suse.cz>
* 'vfs' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl: (30 commits)
BKL: remove BKL from freevxfs
BKL: remove BKL from qnx4
autofs4: Only declare function when CONFIG_COMPAT is defined
autofs: Only declare function when CONFIG_COMPAT is defined
ncpfs: Lock socket in ncpfs while setting its callbacks
fs/locks.c: prepare for BKL removal
BKL: Remove BKL from ncpfs
BKL: Remove BKL from OCFS2
BKL: Remove BKL from squashfs
BKL: Remove BKL from jffs2
BKL: Remove BKL from ecryptfs
BKL: Remove BKL from afs
BKL: Remove BKL from USB gadgetfs
BKL: Remove BKL from autofs4
BKL: Remove BKL from isofs
BKL: Remove BKL from fat
BKL: Remove BKL from ext2 filesystem
BKL: Remove BKL from do_new_mount()
BKL: Remove BKL from cgroup
BKL: Remove BKL from NTFS
...
With all the patches we have queued in the BKL removal tree, only a
few dozen modules are left that actually rely on the BKL, and even
there are lots of low-hanging fruit. We need to decide what to do
about them, this patch illustrates one of the options:
Every user of the BKL is marked as 'depends on BKL' in Kconfig,
and the CONFIG_BKL becomes a user-visible option. If it gets
disabled, no BKL using module can be built any more and the BKL
code itself is compiled out.
The one exception is file locking, which is practically always
enabled and does a 'select BKL' instead. This effectively forces
CONFIG_BKL to be enabled until we have solved the fs/lockd
mess and can apply the patch that removes the BKL from fs/locks.c.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
This patch is a preparation necessary to remove the BKL from do_new_mount().
It explicitly adds calls to lock_kernel()/unlock_kernel() around
get_sb/fill_super operations for filesystems that still uses the BKL.
I've read through all the code formerly covered by the BKL inside
do_kern_mount() and have satisfied myself that it doesn't need the BKL
any more.
do_kern_mount() is already called without the BKL when mounting the rootfs
and in nfsctl. do_kern_mount() calls vfs_kern_mount(), which is called
from various places without BKL: simple_pin_fs(), nfs_do_clone_mount()
through nfs_follow_mountpoint(), afs_mntpt_do_automount() through
afs_mntpt_follow_link(). Both later functions are actually the filesystems
follow_link inode operation. vfs_kern_mount() is calling the specified
get_sb function and lets the filesystem do its job by calling the given
fill_super function.
Therefore I think it is safe to push down the BKL from the VFS to the
low-level filesystems get_sb/fill_super operation.
[arnd: do not add the BKL to those file systems that already
don't use it elsewhere]
Signed-off-by: Jan Blunck <jblunck@infradead.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Christoph Hellwig <hch@infradead.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (96 commits)
no need for list_for_each_entry_safe()/resetting with superblock list
Fix sget() race with failing mount
vfs: don't hold s_umount over close_bdev_exclusive() call
sysv: do not mark superblock dirty on remount
sysv: do not mark superblock dirty on mount
btrfs: remove junk sb_dirt change
BFS: clean up the superblock usage
AFFS: wait for sb synchronization when needed
AFFS: clean up dirty flag usage
cifs: truncate fallout
mbcache: fix shrinker function return value
mbcache: Remove unused features
add f_flags to struct statfs(64)
pass a struct path to vfs_statfs
update VFS documentation for method changes.
All filesystems that need invalidate_inode_buffers() are doing that explicitly
convert remaining ->clear_inode() to ->evict_inode()
Make ->drop_inode() just return whether inode needs to be dropped
fs/inode.c:clear_inode() is gone
fs/inode.c:evict() doesn't care about delete vs. non-delete paths now
...
Fix up trivial conflicts in fs/nilfs2/super.c
Replace inode_setattr with opencoded variants of it in all callers. This
moves the remaining call to vmtruncate into the filesystem methods where it
can be replaced with the proper truncate sequence.
In a few cases it was obvious that we would never end up calling vmtruncate
so it was left out in the opencoded variant:
spufs: explicitly checks for ATTR_SIZE earlier
btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier
ufs: contains an opencoded simple_seattr + truncate that sets the filesize just above
In addition to that ncpfs called inode_setattr with handcrafted iattrs,
which allowed to trim down the opencoded variant.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
For the new truncate sequence every filesystem that wants to truncate on-disk
state needs a seattr method. Convert the remaining filesystems that implement
the truncate inode operation to have its own setattr method.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Move the call to vmtruncate to get rid of accessive blocks to the callers
in preparation of the new truncate sequence and rename the non-truncating
version to block_write_begin.
While we're at it also remove several unused arguments to block_write_begin.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This fixes this warning when building the kernel:
CC fs/udf/super.o
fs/udf/super.c: In function 'udf_load_sequence':
fs/udf/super.c:1582:22: warning: variable 'sbi' set but not used
Please have a look, when you have time and let me know.
Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
quota: Convert quota statistics to generic percpu_counter
ext3 uses rb_node = NULL; to zero rb_root.
quota: Fixup dquot_transfer
reiserfs: Fix resuming of quotas on remount read-write
pohmelfs: Remove dead quota code
ufs: Remove dead quota code
udf: Remove dead quota code
quota: rename default quotactl methods to dquot_
quota: explicitly set ->dq_op and ->s_qcop
quota: drop remount argument to ->quota_on and ->quota_off
quota: move unmount handling into the filesystem
quota: kill the vfs_dq_off and vfs_dq_quota_on_remount wrappers
quota: move remount handling into the filesystem
ocfs2: Fix use after free on remount read-only
Fix up conflicts in fs/ext4/super.c and fs/ufs/file.c
We don't name our generic fsync implementations very well currently.
The no-op implementation for in-memory filesystems currently is called
simple_sync_file which doesn't make too much sense to start with,
the the generic one for simple filesystems is called simple_fsync
which can lead to some confusion.
This patch renames the generic file fsync method to generic_file_fsync
to match the other generic_file_* routines it is supposed to be used
with, and the no-op implementation to noop_fsync to make it obvious
what to expect. In addition add some documentation for both methods.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Do not use the fallback default_llseek() if the readdir operation of the
filesystem still uses the big kernel lock.
Since llseek() modifies
file->f_pos of the directory directly it may need locking to not confuse
readdir which usually uses file->f_pos directly as well
Since the special characteristics of the BKL (unlocked on schedule) are
not necessary in this case, the inode mutex can be used for locking as
provided by generic_file_llseek(). This is only possible since all
filesystems, except reiserfs, either use a directory as a flat file or
with disk address offsets. Reiserfs on the other hand uses a 32bit hash
off the filename as the offset so generic_file_llseek() can get used as
well since the hash is always smaller than sb->s_maxbytes (= (512 << 32) -
blocksize).
Signed-off-by: Jan Blunck <jblunck@suse.de>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Anders Larsen <al@alarsen.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Quota on UDF is non-functional at least since 2.6.16 (I'm too lazy to
do more archeology) because it does not provide .quota_write and .quota_read
functions and thus quotaon(8) just returns EINVAL. Since nobody complained
for all those years and quota support is not even in UDF standard just nuke
it.
Signed-off-by: Jan Kara <jack@suse.cz>
Follow the dquot_* style used elsewhere in dquot.c.
[Jan Kara: Fixed up missing conversion of ext2]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Only set the quota operation vectors if the filesystem actually supports
quota instead of doing it for all filesystems in alloc_super().
[Jan Kara: Export dquot_operations and vfs_quotactl_ops]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Currently the VFS calls into the quotactl interface for unmounting
filesystems. This means filesystems with their own quota handling
can't easily distinguish between user-space originating quotaoff
and an unount. Instead move the responsibily of the unmount handling
into the filesystem to be consistent with all other dquot handling.
Note that we do call dquot_disable a lot later now, e.g. after
a sync_filesystem. But this is fine as the quota code does all its
writes via blockdev's mapping and that is synced even later.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Instead of having wrappers in the VFS namespace export the dquot_suspend
and dquot_resume helpers directly. Also rename vfs_quota_disable to
dquot_disable while we're at it.
[Jan Kara: Moved dquot_suspend to quotaops.h and made it inline]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Currently do_remount_sb calls into the dquot code to tell it about going
from rw to ro and ro to rw. Move this code into the filesystem to
not depend on the dquot code in the VFS - note ocfs2 already ignores
these calls and handles remount by itself. This gets rid of overloading
the quotactl calls and allows to unify the VFS and XFS codepaths in
that area later.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Quota must being initialized if size or uid/git changes requested.
But initialization performed in two different places:
in case of i_size file system is responsible for dquot init
, but in case of uid/gid init will be called internally in
dquot_transfer().
This ambiguity makes code harder to understand.
Let's move this logic to one common helper function.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Convert udf_ioctl to an unlocked_ioctl and push the BKL down into it.
Signed-off-by: John Kacur <jkacur@redhat.com
Signed-off-by: Jan Kara <jack@suse.cz>
generic setattr not longer responsible for quota transfer.
use udf_setattr for all udf's inodes.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
bloc->logicalBlockNum is unsigned so it's never less than zero.
When I saw that, it made me worry that "bloc->logicalBlockNum + count"
could overflow. That's why I changed the check for less than zero
to an overflow check. (The test works because "count" is also
unsigned.)
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6:
udf: use ext2_find_next_bit
udf: Do not read inode before writing it
udf: Fix unalloc space handling in udf_update_inode
Use ext2_find_next_bit (generic_find_next_le_bit) to find the set bit
in little endian bitmap region.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Writing of inode holding unallocated space info was broken because we first
cleared the buffer and after that checked whether it contains a tag meaning the
block holds unallocated space information. Fix the problem by checking
appropriate in memory flag instead.
Also cleanup the function a bit along the way - most importantly lock buffer
when modifying its contents, check for buffer_write_io_error instead of
!buffer_uptodate, etc..
Signed-off-by: Jan Kara <jack@suse.cz>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (33 commits)
quota: stop using QUOTA_OK / NO_QUOTA
dquot: cleanup dquot initialize routine
dquot: move dquot initialization responsibility into the filesystem
dquot: cleanup dquot drop routine
dquot: move dquot drop responsibility into the filesystem
dquot: cleanup dquot transfer routine
dquot: move dquot transfer responsibility into the filesystem
dquot: cleanup inode allocation / freeing routines
dquot: cleanup space allocation / freeing routines
ext3: add writepage sanity checks
ext3: Truncate allocated blocks if direct IO write fails to update i_size
quota: Properly invalidate caches even for filesystems with blocksize < pagesize
quota: generalize quota transfer interface
quota: sb_quota state flags cleanup
jbd: Delay discarding buffers in journal_unmap_buffer
ext3: quota_write cross block boundary behaviour
quota: drop permission checks from xfs_fs_set_xstate/xfs_fs_set_xquota
quota: split out compat_sys_quotactl support from quota.c
quota: split out netlink notification support from quota.c
quota: remove invalid optimization from quota_sync_all
...
Fixed trivial conflicts in fs/namei.c and fs/ufs/inode.c
This gives the filesystem more information about the writeback that
is happening. Trond requested this for the NFS unstable write handling,
and other filesystems might benefit from this too by beeing able to
distinguish between the different callers in more detail.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Get rid of the initialize dquot operation - it is now always called from
the filesystem and if a filesystem really needs it's own (which none
currently does) it can just call into it's own routine directly.
Rename the now static low-level dquot_initialize helper to __dquot_initialize
and vfs_dq_init to dquot_initialize to have a consistent namespace.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Currently various places in the VFS call vfs_dq_init directly. This means
we tie the quota code into the VFS. Get rid of that and make the
filesystem responsible for the initialization. For most metadata operations
this is a straight forward move into the methods, but for truncate and
open it's a bit more complicated.
For truncate we currently only call vfs_dq_init for the sys_truncate case
because open already takes care of it for ftruncate and open(O_TRUNC) - the
new code causes an additional vfs_dq_init for those which is harmless.
For open the initialization is moved from do_filp_open into the open method,
which means it happens slightly earlier now, and only for regular files.
The latter is fine because we don't need to initialize it for operations
on special files, and we already do it as part of the namespace operations
for directories.
Add a dquot_file_open helper that filesystems that support generic quotas
can use to fill in ->open.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Get rid of the drop dquot operation - it is now always called from
the filesystem and if a filesystem really needs it's own (which none
currently does) it can just call into it's own routine directly.
Rename the now static low-level dquot_drop helper to __dquot_drop
and vfs_dq_drop to dquot_drop to have a consistent namespace.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Currently clear_inode calls vfs_dq_drop directly. This means
we tie the quota code into the VFS. Get rid of that and make the
filesystem responsible for the drop inside the ->clear_inode
superblock operation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Get rid of the transfer dquot operation - it is now always called from
the filesystem and if a filesystem really needs it's own (which none
currently does) it can just call into it's own routine directly.
Rename the now static low-level dquot_transfer helper to __dquot_transfer
and vfs_dq_transfer to dquot_transfer to have a consistent namespace,
and make the new dquot_transfer return a normal negative errno value
which all callers expect.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Currently notify_change calls vfs_dq_transfer directly. This means
we tie the quota code into the VFS. Get rid of that and make the
filesystem responsible for the transfer. Most filesystems already
do this, only ufs and udf need the code added, and for jfs it needs to
be enabled unconditionally instead of only when ACLs are enabled.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Get rid of the alloc_inode and free_inode dquot operations - they are
always called from the filesystem and if a filesystem really needs
their own (which none currently does) it can just call into it's
own routine directly.
Also get rid of the vfs_dq_alloc/vfs_dq_free wrappers and always
call the lowlevel dquot_alloc_inode / dqout_free_inode routines
directly, which now lose the number argument which is always 1.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Get rid of the alloc_space, free_space, reserve_space, claim_space and
release_rsv dquot operations - they are always called from the filesystem
and if a filesystem really needs their own (which none currently does)
it can just call into it's own routine directly.
Move shared logic into the common __dquot_alloc_space,
dquot_claim_space_nodirty and __dquot_free_space low-level methods,
and rationalize the wrappers around it to move as much as possible
code into the common block for CONFIG_QUOTA vs not. Also rename
all these helpers to be named dquot_* instead of vfs_dq_*.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Some misspelled occurences of 'octet' and some comments were also fixed
as I was on it.
Signed-off-by: Daniel Mack <daniel@caiaq.de>
Cc: Jiri Kosina <trivial@kernel.org>
Cc: Joe Perches <joe@perches.com>
Cc: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
It is not very good to do IO in udf_clear_inode. First, VFS does not really
expect inode to become dirty there and thus we have to write it ourselves,
second, memory reclaim gets blocked waiting for IO when it does not really
expect it, third, the IO pattern (e.g. on umount) resulting from writes in
udf_clear_inode is bad and it slows down writing a lot.
The reason why UDF needed to do IO in udf_clear_inode is that UDF standard
mandates extent length to exactly match inode size. But when we allocate
extents to a file or directory, we don't really know what exactly the final
file size will be and thus temporarily set it to block boundary and later
truncate it to exact length in udf_clear_inode. Now, this is changed to
truncate to final file size in udf_release_file for regular files. For
directories and symlinks, we do the truncation at the moment when learn
what the final file size will be.
Signed-off-by: Jan Kara <jack@suse.cz>
Some disks do not contain VAT inode in the last recorded block as required
by the standard but a few blocks earlier (or the number of recorded blocks
is wrong). So look for the VAT inode a bit before the end of the media.
Signed-off-by: Jan Kara <jack@suse.cz>
When we close a file, we remove preallocated blocks from it. But this
truncation was not protected by i_mutex and thus it could have raced with a
write through a different fd and cause crashes or even filesystem corruption.
Signed-off-by: Jan Kara <jack@suse.cz>
So far we preallocated blocks also for directories but that brings a
problem, when to get rid of preallocated blocks we don't need. So far
we removed them in udf_clear_inode() which has a disadvantage that
1) blocks are unavailable long after writing to a directory finished
and thus one can get out of space unnecessarily early
2) releasing blocks from udf_clear_inode is problematic because VFS
does not expect us to redirty inode there and it also slows down
memory reclaim.
So preallocate blocks only for regular files where we can drop preallocation
in udf_release_file.
Signed-off-by: Jan Kara <jack@suse.cz>
Recomputation of the pointer was wrong (it should have been just increment).
Luckily, we never use the computed value. Remove it.
Signed-off-by: Jan Kara <jack@suse.cz>
VAT inode is located in the last block recorded block of the medium. When the
drive errorneously reports number of recorded blocks, we failed to load the VAT
inode and thus mount the medium. This patch makes kernel try to read VAT inode
from the last block of the device if it is different from the last recorded
block.
Signed-off-by: Jan Kara <jack@suse.cz>
first_block and goal are unsigned. When negative they are wrapped and caught by
the other test.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Some drives report 0 as the number of written blocks when there are some blocks
recorded. Use device size in such case so that we can automagically mount such
media.
Signed-off-by: Jan Kara <jack@suse.cz>
Move BKL into ->put_super from the only caller. A couple of
filesystems had trivial enough ->put_super (only kfree and NULLing of
s_fs_info + stuff in there) to not get any locking: coda, cramfs, efs,
hugetlbfs, omfs, qnx4, shmem, all others got the full treatment. Most
of them probably don't need it, but I'd rather sort that out individually.
Preferably after all the other BKL pushdowns in that area.
[AV: original used to move lock_super() down as well; these changes are
removed since we don't do lock_super() at all in generic_shutdown_super()
now]
[AV: fuse, btrfs and xfs are known to need no damn BKL, exempt]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Until now we have had a 1:1 mapping between storage device physical
block size and the logical block sized used when addressing the device.
With SATA 4KB drives coming out that will no longer be the case. The
sector size will be 4KB but the logical block size will remain
512-bytes. Hence we need to distinguish between the physical block size
and the logical ditto.
This patch renames hardsect_size to logical_block_size.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
We update information in logical volume integrity descriptor after each
allocation (as LVID contains free space, number of directories and files on
disk etc.). If the filesystem is on some phase change media, this leads to its
quick degradation as such media is able to handle only 10000 overwrites or so.
We solve the problem by writing new information into LVID only on umount,
remount-ro and sync. This solves the problem at the price of longer media
inconsistency (previously media became consistent after pdflush flushed dirty
LVID buffer) but that should be acceptable.
Report by and patch written in cooperation with
Rich Coe <Richard.Coe@med.ge.com>.
Signed-off-by: Jan Kara <jack@suse.cz>
Anchor block can be located at several places on the medium. Two of the
locations are relative to media end which is problematic to detect. Also
some drives report some block as last but are not able to read it or any
block nearby before it. So let's first try block 256 and if it is all fine,
don't look at other possible locations of anchor blocks to avoid IO errors.
This change required a larger reorganization of code but the new code is
hopefully more readable and definitely shorter.
Signed-off-by: Jan Kara <jack@suse.cz>
Make udf_check_valid() return 1 if the validity check passed and 0 otherwise.
So far it was the other way around which was a bit confusing. Also make
udf_vrs() return loff_t which is really the type it should return (not int).
Signed-off-by: Jan Kara <jack@suse.cz>
This patch makes the UDF FS driver use the hardware sector size as the
default logical block size, which is required by the UDF specifications.
While the previous default of 2048 bytes was correct for optical disks,
it was not for hard disks or USB storage devices, and made it impossible
to use such a device with the default mount options. (The Linux mkudffs
tool uses a default block size of 2048 bytes even on devices with
smaller hardware sectors, so this bug is unlikely to be noticed unless
UDF-formatted USB storage devices are exchanged with other OSs.)
To avoid regressions for people who use loopback optical disk images or
who used the (sometimes wrong) defaults of mkudffs, we also try with
a block size of 2048 bytes if no anchor was found with the hardware
sector size.
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Functions udf_CS0toNLS() and udf_NLStoCS0() didn't count with the fact that
NLS can return negative length when invalid character is given to it for
conversion. Thus interesting things could happen (such as overwriting random
memory with the rest of filename). Add appropriate checks.
Signed-off-by: Jan Kara <jack@suse.cz>
This patch makes udf return f_fsid info for statfs(2).
Signed-off-by: Coly Li <coly.li@suse.de>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Jan Kara <jack@suse.cz>
On x86 (and several other archs) mode_t is defined as "unsigned short"
and comparing unsigned shorts to negative ints is broken (because short
is promoted to int and then compared). Fix it.
Reported-and-tested-by: Laurent Riffard <laurent.riffard@free.fr>
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
"dmode" allows overriding permissions of directories and
"mode" allows overriding permissions of files.
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Jan Kara <jack@suse.cz>
Allocate strings with kmalloc.
Checkstack output:
Before: udf_get_filename: 600
After: udf_get_filename: 136
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Jan Kara <jack@suse.cz>
Allocate strings with kmalloc.
Checkstack output:
Before: udf_process_sequence: 712
After: udf_process_sequence: 200
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Conflicts:
fs/nfsd/nfs4recover.c
Manually fixed above to use new creds API functions, e.g.
nfs4_save_creds().
Signed-off-by: James Morris <jmorris@namei.org>
udf_clear_inode() can leave behind buffers on mapping's i_private list (when
we truncated preallocation). Call invalidate_inode_buffers() so that the list
is properly cleaned-up before we return from udf_clear_inode(). This is ugly
and suggest that we should cleanup preallocation earlier than in clear_inode()
but currently there's no such call available since drop_inode() is called under
inode lock and thus is unusable for disk operations.
Signed-off-by: Jan Kara <jack@suse.cz>
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.
Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().
Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: James Morris <jmorris@namei.org>