linux-sg2042

Commit Graph

Author	SHA1	Message	Date
Al Viro	326be7b484	Allow passing O_PATH descriptors via SCM_RIGHTS datagrams Just need to make sure that AF_UNIX garbage collector won't confuse O_PATHed socket on filesystem for real AF_UNIX opened socket. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:45 -04:00
Al Viro	65cfc67223	readlinkat(), fchownat() and fstatat() with empty relative pathnames For readlinkat() we simply allow empty pathname; it will fail unless we have dfd equal to O_PATH-opened symlink, so we are outside of POSIX scope here. For fchownat() and fstatat() we allow AT_EMPTY_PATH; let the caller explicitly ask for such behaviour. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:45 -04:00
Al Viro	bcda76524c	Allow O_PATH for symlinks At that point we can't do almost nothing with them. They can be opened with O_PATH, we can manipulate such descriptors with dup(), etc. and we can see them in /proc//{fd,fdinfo}/. We can't (and won't be able to) follow /proc//fd/ symlinks for those; there's simply not enough information for pathname resolution to go on from such point - to resolve a symlink we need to know which directory does it live in. We will be able to do useful things with them after the next commit, though - readlinkat() and fchownat() will be possible to use with dfd being an O_PATH-opened symlink and empty relative pathname. Combined with open_by_handle() it'll give us a way to do realink-by-handle and lchown-by-handle without messing with more redundant syscalls. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:45 -04:00
Al Viro	1abf0c718f	New kind of open files - "location only". New flag for open(2) - O_PATH. Semantics: * pathname is resolved, but the file itself is _NOT_ opened as far as filesystem is concerned. * almost all operations on the resulting descriptors shall fail with -EBADF. Exceptions are: 1) operations on descriptors themselves (i.e. close(), dup(), dup2(), dup3(), fcntl(fd, F_DUPFD), fcntl(fd, F_DUPFD_CLOEXEC, ...), fcntl(fd, F_GETFD), fcntl(fd, F_SETFD, ...)) 2) fcntl(fd, F_GETFL), for a common non-destructive way to check if descriptor is open 3) "dfd" arguments of ...at(2) syscalls, i.e. the starting points of pathname resolution * closing such descriptor does NOT affect dnotify or posix locks. * permissions are checked as usual along the way to file; no permission checks are applied to the file itself. Of course, giving such thing to syscall will result in permission checks (at the moment it means checking that starting point of ....at() is a directory and caller has exec permissions on it). fget() and fget_light() return NULL on such descriptors; use of fget_raw() and fget_raw_light() is needed to get them. That protects existing code from dealing with those things. There are two things still missing (they come in the next commits): one is handling of symlinks (right now we refuse to open them that way; see the next commit for semantics related to those) and another is descriptor passing via SCM_RIGHTS datagrams. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:45 -04:00
Aneesh Kumar K.V	f2fa2ffc20	ext4: Copy fs UUID to superblock File system UUID is made available to application via /proc/<pid>/mountinfo Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:45 -04:00
Aneesh Kumar K.V	03cb5f03dc	ext3: Copy fs UUID to superblock. File system UUID is made available to application via /proc/<pid>/mountinfo Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:45 -04:00
Aneesh Kumar K.V	93f1c20bc8	vfs: Export file system uuid via /proc/<pid>/mountinfo We add a per superblock uuid field. File systems should update the uuid in the fill_super callback Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:45 -04:00
Aneesh Kumar K.V	f17b604207	fs: Remove i_nlink check from file system link callback Now that VFS check for inode->i_nlink == 0 and returns proper error, remove similar check from file system Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:44 -04:00
Aneesh Kumar K.V	aae8a97d3e	fs: Don't allow to create hardlink for deleted file Add inode->i_nlink == 0 check in VFS. Some of the file systems do this internally. A followup patch will remove those instance. This is needed to ensure that with link by handle we don't allow to create hardlink of an unlinked file. The check also prevent a race between unlink and link Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:44 -04:00
Aneesh Kumar K.V	becfd1f375	vfs: Add open by file handle support [AV: duplicate of open() guts removed; file_open_root() used instead] Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:44 -04:00
Aneesh Kumar K.V	990d6c2d7a	vfs: Add name to file handle conversion support The syscall also return mount id which can be used to lookup file system specific information such as uuid in /proc/<pid>/mountinfo Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:37 -04:00
Al Viro	f52e0c1130	New AT_... flag: AT_EMPTY_PATH For name_to_handle_at(2) we'll want both ...at()-style syscall that would be usable for non-directory descriptors (with empty relative pathname). Introduce new flag (AT_EMPTY_PATH) to deal with that and corresponding LOOKUP_EMPTY; teach user_path_at() and path_init() to deal with the latter. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 19:12:20 -04:00
Linus Torvalds	5f40d42094	Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFS: NFSROOT should default to "proto=udp" nfs4: remove duplicated #include NFSv4: nfs4_state_mark_reclaim_nograce() should be static NFSv4: Fix the setlk error handler NFSv4.1: Fix the handling of the SEQUENCE status bits NFSv4/4.1: Fix nfs4_schedule_state_recovery abuses NFSv4.1 reclaim complete must wait for completion NFSv4: remove duplicate clientid in struct nfs_client NFSv4.1: Retry CREATE_SESSION on NFS4ERR_DELAY sunrpc: Propagate errors from xs_bind() through xs_create_sock() (try3-resend) Fix nfs_compat_user_ino64 so it doesn't cause problems if bit 31 or 63 are set in fileid nfs: fix compilation warning nfs: add kmalloc return value check in decode_and_add_ds SUNRPC: Remove resource leak in svc_rdma_send_error() nfs: close NFSv4 COMMIT vs. CLOSE race SUNRPC: Close a race in __rpc_wait_for_completion_task()	2011-03-14 11:19:50 -07:00
Timo Warns	1eafbfeb7b	Fix corrupted OSF partition table parsing The kernel automatically evaluates partition tables of storage devices. The code for evaluating OSF partitions contains a bug that leaks data from kernel heap memory to userspace for certain corrupted OSF partitions. In more detail: for (i = 0 ; i < le16_to_cpu(label->d_npartitions); i++, partition++) { iterates from 0 to d_npartitions - 1, where d_npartitions is read from the partition table without validation and partition is a pointer to an array of at most 8 d_partitions. Add the proper and obvious validation. Signed-off-by: Timo Warns <warns@pre-sense.de> Cc: stable@kernel.org [ Changed the patch trivially to not repeat the whole le16_to_cpu() thing, and to use an explicit constant for the magic value '8' ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-14 10:14:28 -07:00
Maxim	6c474f7bc1	GFS2: Adding missing unlock_page() gfs2_write_begin() calls grab_cache_page_write_begin() that returns locked page. Correspondent error-handling path lacks for unlock_page() call: > out: > if (error == 0) > return 0; > > page_cache_release(page); The whole system hangs if gfs2_unstuff_dinode() called from gfs2_write_begin() failed for some reason. Reported-by: Maxim <maxim.patlasov@gmail.com> Signed-off-by: Maxim <maxim.patlasov@gmail.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-03-14 13:19:21 +00:00
Aneesh Kumar K.V	5fe0c23788	exportfs: Return the minimum required handle size The exportfs encode handle function should return the minimum required handle size. This helps user to find out the handle size by passing 0 handle size in the first step and then redoing to the call again with the returned handle size value. Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:28 -04:00
Al Viro	c8b91accfa	clean statfs-like syscalls up New helpers: user_statfs() and fd_statfs(), taking userland pathname and descriptor resp. and filling struct kstatfs. Syscalls of statfs family (native, compat and foreign - osf and hpux on alpha and parisc resp.) switched to those. Removes some boilerplate code, simplifies cleanup on errors... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:28 -04:00
Al Viro	73d049a40f	open-style analog of vfs_path_lookup() new function: file_open_root(dentry, mnt, name, flags) opens the file vfs_path_lookup would arrive to. Note that name can be empty; in that case the usual requirement that dentry should be a directory is lifted. open-coded equivalents switched to it, may_open() got down exactly one caller and became static. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:28 -04:00
Al Viro	5b6ca027d8	reduce vfs_path_lookup() to do_path_lookup() New lookup flag: LOOKUP_ROOT. nd->root is set (and held) by caller, path_init() starts walking from that place and all pathname resolution machinery never drops nd->root if that flag is set. That turns vfs_path_lookup() into a special case of do_path_lookup() and gets us down to 3 callers of link_path_walk(), making it finally feasible to rip the handling of trailing symlink out of link_path_walk(). That will not only simply the living hell out of it, but make life much simpler for unionfs merge. Trailing symlink handling will become iterative, which is a good thing for stack footprint in a lot of situations as well. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:27 -04:00
Al Viro	5a18fff209	untangle do_lookup() That thing has devolved into rats nest of gotos; sane use of unlikely() gets rid of that horror and gives much more readable structure: * make a fast attempt to find a dentry; false negatives are OK. In RCU mode if everything went fine, we are done, otherwise just drop out of RCU. If we'd done (RCU) ->d_revalidate() and it had not refused outright (i.e. didn't give us -ECHILD), remember its result. * now we are not in RCU mode and hopefully have a dentry. If we do not, lock parent, do full d_lookup() and if that has not found anything, allocate and call ->lookup(). If we'd done that ->lookup(), remember that dentry is good and we don't need to revalidate it. * now we have a dentry. If it has ->d_revalidate() and we can't skip it, call it. * hopefully dentry is good; if not, either fail (in case of error) or try to invalidate it. If d_invalidate() has succeeded, drop it and retry everything as if original attempt had not found a dentry. * now we can finish it up - deal with mountpoint crossing and automount. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:27 -04:00
Al Viro	40b39136f0	path_openat: clean ELOOP handling a bit Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:27 -04:00
Al Viro	f374ed5fa8	do_last: kill a rudiment of old ->d_revalidate() workaround There used to be time when ->d_revalidate() couldn't return an error. So intents code had lookup_instantiate_filp() stash ERR_PTR(error) in nd->intent.open.filp and had it checked after lookup_hash(), to catch the otherwise silent failures. That had been introduced by commit `4af4c52f34`. These days ->d_revalidate() can and does propagate errors back to callers explicitly, so this check isn't needed anymore. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:27 -04:00
Al Viro	6c0d46c493	fold __open_namei_create() and open_will_truncate() into do_last() ... and clean up a bit more Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:27 -04:00
Al Viro	ca344a894b	do_last: unify may_open() call and everyting after it We have a bunch of diverging codepaths in do_last(); some of them converge, but the case of having to create a new file duplicates large part of common tail of the rest and exits separately. Massage them so that they could be merged. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:27 -04:00
Al Viro	9b44f1b392	move may_open() from __open_name_create() to do_last() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:26 -04:00
Al Viro	0f9d1a10c3	expand finish_open() in its only caller Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:26 -04:00
Al Viro	5a202bcd75	sanitize pathname component hash calculation Lift it to lookup_one_len() and link_path_walk() resp. into the same place where we calculated default hash function of the same name. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:26 -04:00
Al Viro	6a96ba5441	kill __lookup_one_len() only one caller left Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:26 -04:00
Al Viro	fe2d35ff0d	switch non-create side of open() to use of do_last() Instead of path_lookupat() doing trailing symlink resolution, use the same scheme as on the O_CREAT side. Walk with LOOKUP_PARENT, then (in do_last()) look the final component up, then either open it or return error or, if it's a symlink, give the symlink back to path_openat() to be resolved there. The really messy complication here is RCU. We don't want to drop out of RCU mode before the final lookup, since we don't want to bounce parent directory ->d_count without a good reason. Result is _not_ pretty; later in the series we'll clean it up. For now we are roughly back where we'd been before the revert done by Nick's series - top-level logics of path_openat() is cleaned up, do_last() does actual opening, symlink resolution is done uniformly. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:26 -04:00
Al Viro	70e9b35711	get rid of nd->file Don't stash the struct file * used as starting point of walk in nameidata; pass file ** to path_init() instead. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:26 -04:00
Al Viro	951361f954	get rid of the last LOOKUP_RCU dependencies in link_path_walk() New helper: terminate_walk(). An error has happened during pathname resolution and we either drop nd->path or terminate RCU, depending the mode we had been in. After that, nd is essentially empty. Switch link_path_walk() to using that for cleanup. Now the top-level logics in link_path_walk() is back to sanity. RCU dependencies are in the lower-level functions. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:26 -04:00
Al Viro	a7472baba2	make nameidata_dentry_drop_rcu_maybe() always leave RCU mode Now we have do_follow_link() guaranteed to leave without dangling RCU and the next step will get LOOKUP_RCU logics completely out of link_path_walk(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:25 -04:00
Al Viro	ef7562d528	make handle_dots() leave RCU mode on error Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:25 -04:00
Al Viro	4455ca6223	clear RCU on all failure exits from link_path_walk() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:25 -04:00
Al Viro	9856fa1b28	pull handling of . and .. into inlined helper getting LOOKUP_RCU checks out of link_path_walk()... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:25 -04:00
Al Viro	7bc055d1d5	kill out_dput: in link_path_walk() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:25 -04:00
Al Viro	13aab428a7	separate -ESTALE/-ECHILD retries in do_filp_open() from real work new helper: path_openat(). Does what do_filp_open() does, except that it tries only the walk mode (RCU/normal/force revalidation) it had been told to. Both create and non-create branches are using path_lookupat() now. Fixed the double audit_inode() in non-create branch. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:25 -04:00
Al Viro	47c805dc2d	switch do_filp_open() to struct open_flags take calculation of open_flags by open(2) arguments into new helper in fs/open.c, move filp_open() over there, have it and do_sys_open() use that helper, switch exec.c callers of do_filp_open() to explicit (and constant) struct open_flags. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:25 -04:00
Al Viro	c3e380b0b3	Collect "operation mode" arguments of do_last() into a structure No point messing with passing shitloads of "operation mode" arguments to do_open() one by one, especially since they are not going to change during do_filp_open(). Collect them into a struct, fill it and pass to do_last() by reference. Make sure that lookup intent flags are correctly set and removed - we want them for do_last(), but they make no sense for __do_follow_link(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:25 -04:00
Al Viro	f1afe9efc8	clean up the failure exits after __do_follow_link() in do_filp_open() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:24 -04:00
Al Viro	36f3b4f690	pull security_inode_follow_link() into __do_follow_link() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:24 -04:00
Al Viro	086e183a64	pull dropping RCU on success of link_path_walk() into path_lookupat() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:24 -04:00
Al Viro	16c2cd7179	untangle the "need_reval_dot" mess instead of ad-hackery around need_reval_dot(), do the following: set a flag (LOOKUP_JUMPED) in the beginning of path, on absolute symlink traversal, on ".." and on procfs-style symlinks. Clear on normal components, leave unchanged on ".". Non-nested callers of link_path_walk() call handle_reval_path(), which checks that flag is set and that fs does want the final revalidate thing, then does ->d_revalidate(). In link_path_walk() all the return_reval stuff is gone. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:24 -04:00
Al Viro	fe479a580d	merge component type recognition no need to do it in three places... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:24 -04:00
Al Viro	e41f7d4ee5	merge path_init and path_init_rcu Actual dependency on whether we want RCU or not is in 3 small areas (as it ought to be) and everything around those is the same in both versions. Since each function has only one caller and those callers are on two sides of if (flags & LOOKUP_RCU), it's easier and cleaner to merge them and pull the checks inside. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:24 -04:00
Al Viro	ee0827cd6b	sanitize path_walk() mess New helper: path_lookupat(). Basically, what do_path_lookup() boils to modulo -ECHILD/-ESTALE handler. path_walk* family is gone; vfs_path_lookup() is using link_path_walk() directly, do_path_lookup() and do_filp_open() are using path_lookupat(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:24 -04:00
Al Viro	52094c8a06	take RCU-dependent stuff around exec_permission() into a new helper Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:23 -04:00
Al Viro	c9c6cac0c2	kill path_lookup() all remaining callers pass LOOKUP_PARENT to it, so flags argument can die; renamed to kern_path_parent() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:23 -04:00
Steven Whitehouse	c618e87a5f	GFS2: Update to AIL list locking The previous patch missed a couple of places where the AIL list needed locking, so this fixes up those places, plus a comment is corrected too. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Dave Chinner <dchinner@redhat.com>	2011-03-14 12:40:29 +00:00
Al Viro	c44ed965be	compat breakage in preadv() and pwritev() Fix for a dumb preadv()/pwritev() compat bug - unlike the native variants, the compat_... ones forget to check FMODE_P{READ,WRITE}, so e.g. on pipe the native preadv() will fail with -ESPIPE and compat one will act as readv() and succeed. Not critical, but it's a clear bug with trivial fix, so IMO it's OK for -final. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-13 16:29:07 -07:00
Al Viro	586ce098a2	compat breakage in preadv() and pwritev() Fix for a dumb preadv()/pwritev() compat bug - unlike the native variants, compat_... ones forget to check FMODE_P{READ,WRITE}, so e.g. on pipe the native preadv() will fail with -ESPIPE and compat one will act as readv() and succeed. Not critical, but it's a clear bug with trivial fix. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-13 19:21:26 -04:00
Linus Torvalds	0e5b88cd99	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: break out of shrink_delalloc earlier btrfs: fix not enough reserved space btrfs: fix dip leak Btrfs: make sure not to return overlapping extents to fiemap Btrfs: deal with short returns from copy_from_user Btrfs: fix regressions in copy_from_user handling	2011-03-13 16:00:49 -07:00
Chris Mason	36e39c40b3	Btrfs: break out of shrink_delalloc earlier Josef had changed shrink_delalloc to exit after three shrink attempts, which wasn't quite enough because new writers could race in and steal free space. But it also fixed deadlocks and stalls as we tried to recover delalloc reservations. The code was tweaked to loop 1024 times, and would reset the counter any time a small amount of progress was made. This was too drastic, and with a lot of writers we can end up stuck in shrink_delalloc forever. The shrink_delalloc loop is fairly complex because the caller is looping too, and the caller will go ahead and force a transaction commit to make sure we reclaim space. This reworks things to exit shrink_delalloc when we've forced some writeback and the delalloc reservations have gone down. This means the writeback has not just started but has also finished at least some of the metadata changes required to reclaim delalloc space. If we've got this wrong, we're returning ENOSPC too early, which is a big improvement over the current behavior of hanging the machine. Test 224 in xfstests hammers on this nicely, and with 1000 writers trying to fill a 1GB drive we get our first ENOSPC at 93% full. The other writers are able to continue until we get 100%. This is a worst case test for btrfs because the 1000 writers are doing small IO, and the small FS size means we don't have a lot of room for metadata chunks. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-12 07:08:42 -05:00
Alex Elder	0c9ba97318	xfs: don't name variables "panic" The new xfs_alert_tag() used a variable named "panic", and that is to be avoided. Rename it. Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-03-11 16:34:51 -06:00
Rob Landley	c5cb09b6f8	Cleanup: Factor out some cut-and-paste code. Factor out some cut-and-paste code in options parsing. Saves about 800 bytes on x86-64. Signed-off-by: Rob Landley <rlandley@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:39:28 -05:00
Rob Landley	c12bacec45	cleanup: save 60 lines/100 bytes by combining two mostly duplicate functions. Eliminate two mostly duplicate functions (nfs_parse_simple_hostname() and nfs_parse_protected_hostname()) and instead just make the calling function (nfs_parse_devname()) do everything. Signed-off-by: Rob Landley <rlandley@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:39:28 -05:00
Konstantin Khlebnikov	7ec10f26e1	NFS: account direct-io into task io accounting Account NFS direct-io reads and writes into Task I/O Accounting. Do it before complition to handle aio. NFS have unusual direct-io implementation, thus accounting in generic code does not work. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:39:27 -05:00
Trond Myklebust	b064eca2cf	NFSv4: Send unmapped uid/gids to the server when using auth_sys The new behaviour is enabled using the new module parameter 'nfs4_disable_idmapping'. Note that if the server rejects an unmapped uid or gid, then the client will automatically switch back to using the idmapper. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:39:27 -05:00
Trond Myklebust	3ddeb7c5c6	NFSv4: Propagate the error NFS4ERR_BADOWNER to nfs4_do_setattr This will be required in order to switch uid/gid mapping back on if the admin has tried to disable it. Note that we also propagate NFS4ERR_BADNAME at the same time, in order to work around a Linux server bug. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:39:27 -05:00
Trond Myklebust	e4fd72a17d	NFSv4: cleanup idmapper functions to take an nfs_server argument ...instead of the nfs_client. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:39:26 -05:00
Trond Myklebust	f0b851689a	NFSv4: Send unmapped uid/gids to the server if the idmapper fails Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:39:26 -05:00
Trond Myklebust	5cf36cfdc8	NFSv4: If the server sends us a numeric uid/gid then accept it Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:39:26 -05:00
Benny Halevy	75247affd7	NFSv4.1: reject zero layout with zeroed stripe unit Allowing stripe_unit==0 causes the client to crash later on when dividing by zero. Reported-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:45 -05:00
Fred Isaman	36fe432d33	NFSv4.1: Clear lseg pointer in ->doio function Now that we have access to the pointer, clear it immediately after the put, instead of in caller. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:45 -05:00
Fred Isaman	c76069bda0	NFSv4.1: rearrange ->doio args This will make it possible to clear the lseg pointer in the same function as it is put, instead of in the caller nfs_pageio_doio(). Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:44 -05:00
Fred Isaman	a69aef1496	NFSv4.1: pnfs filelayout driver write Allows the pnfs filelayout driver to write to the data servers. Note that COMMIT to data servers will be implemented in a future patch. To avoid improper behavior, for the moment any WRITE to a data server that would also require a COMMIT to the data server is sent NFS_FILE_SYNC. Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Mingyang Guo <guomingyang@nrchpc.ac.cn> Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:44 -05:00
Fred Isaman	7ffd10640d	NFSv4.1: remove GETATTR from ds writes Any WRITE compound directed to a data server needs to have the GETATTR calls suppressed. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:44 -05:00
Andy Adamson	0382b74409	NFSv4.1: implement generic pnfs layer write switch Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Mike Sager <sager@netapp.com> Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn> Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:44 -05:00
Fred Isaman	44b83799a9	NFSv4.1: trigger LAYOUTGET for writes Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:44 -05:00
Fred Isaman	5053aa568d	NFSv4.1: Send lseg down into nfs_write_rpcsetup We grab the lseg sent in from the doio function and attach it to each struct nfs_write_data created. This is how the lseg will be sent to the layout driver. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:44 -05:00
Fred Isaman	b029bc9b08	NFSv4.1: add callback to nfs4_write_done Add callback that pnfs layout driver can use to do its own handling of data server WRITE response. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:43 -05:00
Andy Adamson	d138d5d17b	NFSv4.1: rearrange nfs_write_rpcsetup Reorder nfs_write_rpcsetup, preparing for a pnfs entry point. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:43 -05:00
Andy Adamson	568e8c494d	NFSv4.1: turn off pNFS on ds connection failure If a data server is unavailable, go through MDS. Mark the deviceid containing the data server as a negative cache entry. Do not try to connect to any data server on a deviceid marked as a negative cache entry. Mark any layout that tries to use the marked deviceid as failed. Inodes with a layout marked as fails will not use the layout for I/O, and will not perform any more layoutgets. Inodes without a layout will still do layoutget, but the layout will get marked immediately. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:43 -05:00
Christoph Hellwig	ea8eecdd11	NFSv4.1 move deviceid cache to filelayout driver No need for generic cache with only one user. Keep a simple hash of deviceids in the filelayout driver. Signed-off-by: Christoph Hellwig <hch@infradead.org> Acked-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:43 -05:00
Andy Adamson	cbdabc7f8b	NFSv4.1: filelayout async error handler Use our own async error handler. Mark the layout as failed and retry i/o through the MDS on specified errors. Update the mds_offset in nfs_readpage_retry so that a failed short-read retry to a DS gets correctly resent through the MDS. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:43 -05:00
Andy Adamson	dc70d7b318	NFSv4.1: filelayout read Attempt a pNFS file layout read by setting up the nfs_read_data struct and calling nfs_initiate_read with the data server rpc client and the filelayout rpc call ops. Error handling is implemented in a subsequent patch. Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Mingyang Guo <guomingyang@nrchpc.ac.cn> Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Tested-by: Guo Mingyang <guomingyang@nrchpc.ac.cn> Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:43 -05:00
Fred Isaman	cfe7f4120f	NFSv4.1: filelayout i/o helpers Prepare for filelayout_read_pagelist with helper functions that find the correct data server, filehandle, and offset. Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: Mike Sager <sager@netapp.com> Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn> Signed-off-by: Tigran Mkrtchyan <tigran@anahit.desy.de> Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de> Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:42 -05:00
Andy Adamson	d83217c135	NFSv4.1: data server connection Introduce a data server set_client and init session following the nfs4_set_client and nfs4_init_session convention. Once a new nfs_client is on the nfs_client_list, the nfs_client cl_cons_state serializes access to creating an nfs_client struct with matching properties. Use the new nfs_get_client() that initializes new clients. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:42 -05:00
Andy Adamson	64419a9b20	NFSv4.1: generic read Separate the rpc run portion of nfs_read_rpcsetup into a new function nfs_initiate_read that is called for normal NFS I/O. Add a pNFS read_pagelist function that is called instead of nfs_intitate_read for pNFS reads. Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Mike Sager <sager@netapp.com> Signed-off-by: Mingyang Guo <guomingyang@nrchpc.ac.cn> Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn> Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:42 -05:00
Fred Isaman	bae724ef95	NFSv4.1: shift pnfs_update_layout locations Move the pnfs_update_layout call location to nfs_pageio_do_add_request(). Grab the lseg sent in the doio function to nfs_read_rpcsetup and attach it to each nfs_read_data so it can be sent to the layout driver. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:42 -05:00
Fred Isaman	94ad1c80e2	NFSv4.1: coelesce across layout stripes Add a pg_test layout driver hook which is used to avoid coelescing I/O across layout stripes. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:42 -05:00
Fred Isaman	d684d2ae10	NFSv4.1: lseg refcounting Prepare put_lseg and get_lseg to be called from the pNFS I/O code. Pull common code from pnfs_lseg_locked to call from pnfs_lseg. Inline pnfs_lseg_locked into it's only caller. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:42 -05:00
Andy Adamson	94de8b27d0	NFSv4.1: add MDS mount DS only check The DS only role cannot be used to mount. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:41 -05:00
Andy Adamson	d6fb79d433	NFSv4.1: new flag for lease time check Data servers cannot send nfs4_proc_get_lease_time. but still need to setup state renewal. Add the NFS_CS_CHECK_LEASE_TIME bit to indicate if the lease time can be checked. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:41 -05:00
Andy Adamson	d3b4c9d767	NFSv4.1: new flag for state renewal check Data servers not sharing a session with the mount MDS always have an empty cl_superblocks list. Replace the cl_superblocks empty list check to see if it is time to shut down renewd with the NFS_CS_STOP_RENEW bit which is not set by such a data server. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:41 -05:00
Andy Adamson	89d1ea6579	NFSv4.1: send zero stateid seqid on v4.1 i/o Data servers require a zero stateid seqid, and there is no advantage to not doing the same for all NFSv4.1 Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:41 -05:00
Andy Adamson	45a52a0207	NFS move nfs_client initialization into nfs_get_client Now nfs_get_client returns an nfs_client ready to be used no matter if it was found or created. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:41 -05:00
Andy Adamson	bf9c1387ca	NFSv4.1: put_layout_hdr can remove nfsi->layout Prevents an Oops triggered by CB_LAYOUTRECALL and LAYOUTGET race on a pnfs_layout_hdr first pnfs_layout_segment. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:41 -05:00
Fred Isaman	136028967a	NFS: change nfs_writeback_done to return void The return values are not used by any callers. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:40 -05:00
Fred Isaman	83762c56c1	NFS: remove pointless if statement in nfs_direct_write_result The code was doing nothing more in either branch of the if. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:40 -05:00
Fred Isaman	f49f9baac8	pnfs: fix pnfs lock inversion of i_lock and cl_lock The pnfs code was using throughout the lock order i_lock, cl_lock. This conflicts with the nfs delegation code. Rework the pnfs code to avoid taking both locks simultaneously. Currently the code takes the double lock to add/remove the layout to a nfs_client list, while atomically checking that the list of lsegs is empty. To avoid this, we rely on existing serializations. When a layout is initialized with lseg count equal zero, LAYOUTGET's openstateid serialization is in effect, making it safe to assume it stays zero unless we change it. And once a layout's lseg count drops to zero, it is set as DESTROYED and so will stay at zero. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:40 -05:00
Fred Isaman	9f52c2525e	pnfs: do not need to clear NFS_LAYOUT_BULK_RECALL flag We do not need to clear the NFS_LAYOUT_BULK_RECALL, as setting it guarantees that NFS_LAYOUT_DESTROYED will be set once any outstanding io is finished. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:40 -05:00
Fred Isaman	3851172244	pnfs: avoid incorrect use of layout stateid The code could violate the following from RFC5661, section 12.5.3: "Once a client has no more layouts on a file, the layout stateid is no longer valid and MUST NOT be used." This can occur when a layout already has a lseg, starts another non-everlapping LAYOUTGET, and a CB_LAYOUTRECALL for the existing lseg is processed before we hit pnfs_layout_process(). Solve by setting, each time the client has no more lsegs for a file, a flag which blocks further use of the layout and triggers its removal. This also fixes a second bug which occurs in the same instance as above. If we actually use pnfs_layout_process, we add the new lseg to the layout, but the layout has been removed from the nfs_client list by the intervening CB_LAYOUTRECALL and will not be added back. Thus the newly acquired lseg will not be properly returned in the event of a subsequent CB_LAYOUTRECALL. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:39 -05:00
Chuck Lever	53d4737580	NFS: NFSROOT should default to "proto=udp" There have been a number of recent reports that NFSROOT is no longer working with default mount options, but fails only with certain NICs. Brian Downing <bdowning@lavos.net> bisected to commit `56463e50` "NFS: Use super.c for NFSROOT mount option parsing". Among other things, this commit changes the default mount options for NFSROOT to use TCP instead of UDP as the underlying transport. TCP seems less able to deal with NICs that are slow to initialize. The system logs that have accompanied reports of problems all show that NFSROOT attempts to establish a TCP connection before the NIC is fully initialized, and thus the TCP connection attempt fails. When a TCP connection attempt fails during a mount operation, the NFS stack needs to fail the operation. Usually user space knows how and when to retry it. The network layer does not report a distinct error code for this particular failure mode. Thus, there isn't a clean way for the RPC client to see that it needs to retry in this case, but not in others. Because NFSROOT is used in some environments where it is not possible to update the kernel command line to specify "udp", the proper thing to do is change NFSROOT to use UDP by default, as it did before commit `56463e50`. To make it easier to see how to change default mount options for NFSROOT and to distinguish default settings from mandatory settings, I've adjusted a couple of areas to document the specifics. root_nfs_cat() is also modified to deal with commas properly when concatenating strings containing mount option lists. This keeps root_nfs_cat() call sites simpler, now that we may be concatenating multiple mount option strings. Tested-by: Brian Downing <bdowning@lavos.net> Tested-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: <stable@kernel.org> # 2.6.37 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:07 -05:00
Huang Weiyi	57df216bd8	nfs4: remove duplicated #include Remove duplicated #include('s) in fs/nfs/nfs4proc.c Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:18:37 -05:00
Trond Myklebust	f9feab1e18	NFSv4: nfs4_state_mark_reclaim_nograce() should be static There are no more external users of nfs4_state_mark_reclaim_nograce() or nfs4_state_mark_reclaim_reboot(), so mark them as static. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:18:36 -05:00
Trond Myklebust	ecac799a5e	NFSv4: Fix the setlk error handler Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:18:36 -05:00
Trond Myklebust	b4410c2f7f	NFSv4.1: Fix the handling of the SEQUENCE status bits We want SEQUENCE status bits to be handled by the state manager in order to avoid threading issues. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:18:35 -05:00
Trond Myklebust	0400a6b0cb	NFSv4/4.1: Fix nfs4_schedule_state_recovery abuses nfs4_schedule_state_recovery() should only be used when we need to force the state manager to check the lease. If we just want to start the state manager in order to handle a state recovery situation, we should be using nfs4_schedule_state_manager(). This patch fixes the abuses of nfs4_schedule_state_recovery() by replacing its use with a set of helper functions that do the right thing. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:18:22 -05:00
Dave Chinner	d6a079e82e	GFS2: introduce AIL lock The log lock is currently used to protect the AIL lists and the movements of buffers into and out of them. The lists are self contained and no log specific items outside the lists are accessed when starting or emptying the AIL lists. Hence the operation of the AIL does not require the protection of the log lock so split them out into a new AIL specific lock to reduce the amount of traffic on the log lock. This will also reduce the amount of serialisation that occurs when the gfs2_logd pushes on the AIL to move it forward. This reduces the impact of log pushing on sequential write throughput. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-03-11 11:52:25 +00:00
Benjamin Marzinski	e4a7b7b0c9	GFS2: fix block allocation check for fallocate GFS2 fallocate wasn't properly checking if a blocks were already allocated. In write_empty_blocks(), if a page didn't have buffer_heads attached, GFS2 was always treating it as if there were no blocks allocated for that page. GFS2 now calls gfs2_block_map() to check if the blocks are allocated before writing them out. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-03-11 09:26:48 +00:00
Bob Peterson	fa1bbdea30	GFS2: Optimize glock multiple-dequeue code This is a small patch that optimizes multiple glock dequeue operations. It changes the unlock order to be more efficient and makes it easier for lock debugging tools to unravel. It also eliminates the need for the temp variable x, although that would likely be optimized out. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-03-11 09:24:54 +00:00
Artem Bityutskiy	2bcf002159	UBIFS: do not check data crc by default Change the default UBIFS behavior WRT data CRC checking. Currently, UBIFS checks data CRC when reading, which slows it down quite a bit, and this is the default option. However, it looks like in average user does not need this feature and would prefer faster read speed over extra reliability. And this seems to be de-facto standard that file-systems do not check data CRC every time they read from the media. Thus, make UBIFS default behavior so that it does not check data CRC. This corresponds to the no_chk_data_crc mount option. Those users who need extra protection can always enable it using the chk_data_crc option. Please, read more information about this feature here: http://www.linux-mtd.infradead.org/doc/ubifs.html#L_checksumming Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-03-11 10:52:07 +02:00
Artem Bityutskiy	cce3f612fe	UBIFS: simplify UBIFS Kconfig menu Remove debug message level and debug checks Kconfig options as they proved to be useless anyway. We have sysfs interface which we can use for fine-grained debugging messages and checks selection, see Documentation/filesystems/ubifs.txt for mode details. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-03-11 10:52:07 +02:00
Artem Bityutskiy	6342aaebda	UBIFS: print max. index node size Improve debugging messages by printing the maximum index node size on mount. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-03-11 10:52:07 +02:00
Matthew L. Creech	d882962f6a	UBIFS: handle allocation failures in UBIFS write path Running kernel 2.6.37, my PPC-based device occasionally gets an order-2 allocation failure in UBIFS, which causes the root FS to become unwritable: kswapd0: page allocation failure. order:2, mode:0x4050 Call Trace: [c787dc30] [c00085b8] show_stack+0x7c/0x194 (unreliable) [c787dc70] [c0061aec] __alloc_pages_nodemask+0x4f0/0x57c [c787dd00] [c0061b98] __get_free_pages+0x20/0x50 [c787dd10] [c00e4f88] ubifs_jnl_write_data+0x54/0x200 [c787dd50] [c00e82d4] do_writepage+0x94/0x198 [c787dd90] [c00675e4] shrink_page_list+0x40c/0x77c [c787de40] [c0067de0] shrink_inactive_list+0x1e0/0x370 [c787de90] [c0068224] shrink_zone+0x2b4/0x2b8 [c787df00] [c0068854] kswapd+0x408/0x5d4 [c787dfb0] [c0037bcc] kthread+0x80/0x84 [c787dff0] [c000ef44] kernel_thread+0x4c/0x68 Similar problems were encountered last April by Tomasz Stanislawski: http://patchwork.ozlabs.org/patch/50965/ This patch implements Artem's suggested fix: fall back to a mutex-protected static buffer, allocated at mount time. I tested it by forcing execution down the failure path, and didn't see any ill effects. Artem: massaged the patch a little, improved it so that we'd not allocate the write reserve buffer when we are in R/O mode. Signed-off-by: Matthew L. Creech <mlcreech@gmail.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-03-11 10:52:07 +02:00
Andy Adamson	c34c32ea97	NFSv4.1 reclaim complete must wait for completion Signed-off-by: Andy Adamson <andros@netapp.com> [Trond: fix whitespace errors] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:05:01 -05:00
Andy Adamson	114f64b5f2	NFSv4: remove duplicate clientid in struct nfs_client Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:05:00 -05:00
Ricardo Labiaga	7d6d63d642	NFSv4.1: Retry CREATE_SESSION on NFS4ERR_DELAY Fix bug where we currently retry the EXCHANGEID call again, eventhough we already have a valid clientid. Instead, delay and retry the CREATE_SESSION call. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:04:59 -05:00
Frank Filz	3fa0b4e201	(try3-resend) Fix nfs_compat_user_ino64 so it doesn't cause problems if bit 31 or 63 are set in fileid The problem was use of an int32, which when converted to a uint64 is sign extended resulting in a fileid that doesn't fit in 32 bits even though the intent of the function is to fit the fileid into 32 bits. Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> [Trond: Added an include for compat.h] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:04:58 -05:00
Jovi Zhang	43b7c3f051	nfs: fix compilation warning this commit fix compilation warning as following: linux-2.6/fs/nfs/nfs4proc.c:3265: warning: comparison of distinct pointer types lacks a cast Signed-off-by: Jovi Zhang <bookjovi@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:04:56 -05:00
Stanislav Fomichev	b9f810570d	nfs: add kmalloc return value check in decode_and_add_ds add kmalloc return value check in decode_and_add_ds Signed-off-by: Stanislav Fomichev <kernel@fomichev.me> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:04:55 -05:00
Jeff Layton	d2224e7afb	nfs: close NFSv4 COMMIT vs. CLOSE race I've been adding in more artificial delays in the NFSv4 commit and close codepaths to uncover races. The kernel I'm testing has the patch to close the race in __rpc_wait_for_completion_task that's in Trond's cthon2011 branch. The reproducer I've been using does this in a loop: mkdir("DIR"); fd = open("DIR/FILE", O_WRONLY\|O_CREAT\|O_EXCL, 0644); write(fd, "abcdefg", 7); close(fd); unlink("DIR/FILE"); rmdir("DIR"); The above reproducer shouldn't result in any silly-renaming. However, when I add a "msleep(100)" just after the nfs_commit_clear_lock call in nfs_commit_release, I can almost always force one to occur. If I can force it to occur with that, then it can happen without that delay given the right timing. nfs_commit_inode waits for the NFS_INO_COMMIT bit to clear when called with FLUSH_SYNC set. nfs_commit_rpcsetup on the other hand does not wait for the task to complete before putting its reference to it, so the last reference get put in rpc_release task and gets queued to a workqueue. In this situation, the last open context reference may be put by the COMMIT release instead of the close() syscall. The close() syscall returns too quickly and the unlink runs while the d_count is still high since the COMMIT release hasn't put its dentry reference yet. Fix this by having rpc_commit_rpcsetup wait for the RPC call to complete before putting the task reference when FLUSH_SYNC is set. With this, the last reference is put by the process that's initiating the FLUSH_SYNC commit and the race is closed. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:04:53 -05:00
Trond Myklebust	bf294b41ce	SUNRPC: Close a race in __rpc_wait_for_completion_task() Although they run as rpciod background tasks, under normal operation (i.e. no SIGKILL), functions like nfs_sillyrename(), nfs4_proc_unlck() and nfs4_do_close() want to be fully synchronous. This means that when we exit, we want all references to the rpc_task to be gone, and we want any dentry references etc. held by that task to be released. For this reason these functions call __rpc_wait_for_completion_task(), followed by rpc_put_task() in the expectation that the latter will be releasing the last reference to the rpc_task, and thus ensuring that the callback_ops->rpc_release() has been called synchronously. This patch fixes a race which exists due to the fact that rpciod calls rpc_complete_task() (in order to wake up the callers of __rpc_wait_for_completion_task()) and then subsequently calls rpc_put_task() without ensuring that these two steps are done atomically. In order to avoid adding new spin locks, the patch uses the existing waitqueue spin lock to order the rpc_task reference count releases between the waiting process and rpciod. The common case where nobody is waiting for completion is optimised for by checking if the RPC_TASK_ASYNC flag is cleared and/or if the rpc_task reference count is 1: in those cases we drop trying to grab the spin lock, and immediately free up the rpc_task. Those few processes that need to put the rpc_task from inside an asynchronous context and that do not care about ordering are given a new helper: rpc_put_task_async(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:04:52 -05:00
David Teigland	e43f055a95	dlm: use alloc_workqueue function Replaces deprecated create_singlethread_workqueue(). Signed-off-by: David Teigland <teigland@redhat.com>	2011-03-10 13:22:34 -06:00
David Teigland	e3853a90e2	dlm: increase default hash table sizes Make all three hash tables a consistent size of 1024 rather than 1024, 512, 256. All three tables, for resources, locks, and lock dir entries, will generally be filled to the same order of magnitude. Signed-off-by: David Teigland <teigland@redhat.com>	2011-03-10 13:08:22 -06:00
David Teigland	8304d6f24c	dlm: record full callback state Change how callbacks are recorded for locks. Previously, information about multiple callbacks was combined into a couple of variables that indicated what the end result should be. In some situations, we could not tell from this combined state what the exact sequence of callbacks were, and would end up either delivering the callbacks in the wrong order, or suppress redundant callbacks incorrectly. This new approach records all the data for each callback, leaving no uncertainty about what needs to be delivered. Signed-off-by: David Teigland <teigland@redhat.com>	2011-03-10 10:40:00 -06:00
Miao Xie	7e6b6465e6	btrfs: fix not enough reserved space btrfs_link() will insert 3 items(inode ref, dir name item and dir index item) into the b+ tree and update 2 items(its inode, and parent's inode) in the b+ tree. So we should reserve space for these 5 items, not 3 items. Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-10 11:21:49 -05:00
Daniel J Blueman	b4966b7770	btrfs: fix dip leak The btrfs DIO code leaks dip structs when dip->csums allocation fails; bio->bi_end_io isn't set at the point where the free_ordered branch is consequently taken, thus bio_endio doesn't call the function which would free it in the normal case. Fix. Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com> Acked-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-10 11:21:49 -05:00
J. Bruce Fields	d891eedbc3	fs/dcache: allow d_obtain_alias() to return unhashed dentries Without this patch, inodes are not promptly freed on last close of an unlinked file by an nfs client: client$ mount -tnfs4 server:/export/ /mnt/ client$ tail -f /mnt/FOO ... server$ df -i /export server$ rm /export/FOO (^C the tail -f) server$ df -i /export server$ echo 2 >/proc/sys/vm/drop_caches server$ df -i /export the df's will show that the inode is not freed on the filesystem until the last step, when it could have been freed after killing the client's tail -f. On-disk data won't be deallocated either, leading to possible spurious ENOSPC. This occurs because when the client does the close, it arrives in a compound with a putfh and a close, processed like: - putfh: look up the filehandle. The only alias found for the inode will be DCACHE_UNHASHED alias referenced by the filp this, so it creates a new DCACHE_DISCONECTED dentry and returns that instead. - close: closes the existing filp, which is destroyed immediately by dput() since it's DCACHE_UNHASHED. - end of the compound: release the reference to the current filehandle, and dput() the new DCACHE_DISCONECTED dentry, which gets put on the unused list instead of being destroyed immediately. Nick Piggin suggested fixing this by allowing d_obtain_alias to return the unhashed dentry that is referenced by the filp, instead of making it create a new dentry. Leave __d_find_alias() alone to avoid changing behavior of other callers. Also nfsd doesn't need all the checks of __d_find_alias(); any dentry, hashed or unhashed, disconnected or not, should work. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-10 05:18:54 -05:00
Marco Stornelli	1ca551c6ca	Check for immutable/append flag in fallocate path In the fallocate path the kernel doesn't check for the immutable/append flag. It's possible to have a race condition in this scenario: an application open a file in read/write and it does something, meanwhile root set the immutable flag on the file, the application at that point can call fallocate with success. In addition, we don't allow to do any unreserve operation on an append only file but only the reserve one. Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-10 04:22:15 -05:00
Al Viro	9177ada99d	fat: fix d_revalidate oopsen on NFS exports can't blindly check nd->flags in ->d_revalidate() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-10 03:45:49 -05:00
Al Viro	8ce84eeb5b	jfs: fix d_revalidate oopsen on NFS exports can't blindly check nd->flags in ->d_revalidate() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-10 03:45:28 -05:00
Al Viro	4714e63731	ocfs2: fix d_revalidate oopsen on NFS exports can't blindly check nd->flags in ->d_revalidate() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-10 03:45:07 -05:00
Al Viro	53fe924161	gfs2: fix d_revalidate oopsen on NFS exports can't blindly check nd->flags in ->d_revalidate() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-10 03:44:48 -05:00
Al Viro	529c5f958f	fuse: fix d_revalidate oopsen on NFS exports can't blindly check nd->flags in ->d_revalidate() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-10 03:44:31 -05:00
Al Viro	0eb980e317	ceph: fix d_revalidate oopsen on NFS exports can't blindly check nd->flags in ->d_revalidate() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-10 03:44:05 -05:00
Al Viro	c78f4cc5e7	reiserfs xattr ->d_revalidate() shouldn't care about RCU ... it returns an error unconditionally Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-10 03:42:01 -05:00
Al Viro	ae50adcb0a	/proc/self is never going to be invalidated... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-10 03:41:53 -05:00
Linus Torvalds	3979491701	Merge branch 'for-2.6.38' of git://linux-nfs.org/~bfields/linux * 'for-2.6.38' of git://linux-nfs.org/~bfields/linux: nfsd: wrong index used in inner loop nfsd4: fix bad pointer on failure to find delegation NFSD: fix decode_cb_sequence4resok	2011-03-09 14:52:09 -08:00
Linus Torvalds	78833dd706	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: nd->inode is not set on the second attempt in path_walk() unfuck proc_sysctl ->d_compare() minimal fix for do_filp_open() race	2011-03-09 13:55:51 -08:00
Christoph Hellwig	ecb6928fcf	xfs: factor agf counter updates into a helper Updating the AGF and transactions counters is duplicated between allocating and freeing extents. Factor the code into a common helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-03-09 08:23:47 -06:00
Christoph Hellwig	86fa8af69d	xfs: clean up the xfs_alloc_compute_aligned calling convention Pass a xfs_alloc_arg structure to xfs_alloc_compute_aligned and derive the alignment and minlen paramters from it. This cleans up the existing callers, and we'll need even more information from the xfs_alloc_arg in subsequent patches. Based on a patch from Dave Chinner. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-03-09 08:23:33 -06:00
Steven Whitehouse	0a33443b38	GFS2: Remove potential race in flock code This patch ensures that we always wait for glock demotion when dropping flocks on a file in order to prevent any race conditions associated with further flock calls or closing the file. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-03-09 11:14:32 +00:00
Steven Whitehouse	fc0e38dae6	GFS2: Fix glock deallocation race This patch fixes a race in deallocating glocks which was introduced in the RCU glock patch. We need to ensure that the glock count is kept correct even in the case that there is a race to add a new glock into the hash table. Also, to avoid having to wait for an RCU grace period, the glock counter can be decremented before call_rcu() is called. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-03-09 10:58:04 +00:00
Abhijith Das	662e3a551b	GFS2: quota allows exceeding hard limit Immediately after being synced to disk, cached quotas are zeroed out and a subsequent access of the cached quotas results in incorrect zero values. This meant that gfs2 assumed the actual usage to be the zero (or near-zero) usage values it found in the cached quotas and comparison against warn/limits never triggered a quota violation. This patch adds a new flag QDF_REFRESH that is set after a sync so that the cached quotas are forcefully refreshed from disk on a subsequent access on seeing this flag set. Resolves: rhbz#675944 Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-03-09 09:32:44 +00:00
Ryusuke Konishi	e3154e9748	nilfs2: get rid of nilfs_sb_info structure This directly uses sb->s_fs_info to keep a nilfs filesystem object and fully removes the intermediate nilfs_sb_info structure. With this change, the hierarchy of on-memory structures of nilfs will be simplified as follows: Before: super_block -> nilfs_sb_info -> the_nilfs -> cptree --+-> nilfs_root (current file system) +-> nilfs_root (snapshot A) +-> nilfs_root (snapshot B) : -> nilfs_sc_info (log writer structure) After: super_block -> the_nilfs -> cptree --+-> nilfs_root (current file system) +-> nilfs_root (snapshot A) +-> nilfs_root (snapshot B) : -> nilfs_sc_info (log writer structure) The reason why we didn't design so from the beginning is because the initial shape also differed from the above. The early hierachy was composed of "per-mount-point" super_block -> nilfs_sb_info pairs and a shared nilfs object. On the kernel 2.6.37, it was changed to the current shape in order to unify super block instances into one per device, and this cleanup became applicable as the result. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-03-09 11:54:26 +09:00
Al Viro	b306419ae0	nd->inode is not set on the second attempt in path_walk() We leave it at whatever it had been pointing to after the first link_path_walk() had failed with -ESTALE. Things do not work well after that... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-08 21:16:28 -05:00
Ryusuke Konishi	f7545144c2	nilfs2: use sb instance instead of nilfs_sb_info struct This replaces sbi uses with direct reference to sb instance. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-03-09 11:05:08 +09:00
Ryusuke Konishi	d96bbfa28a	nilfs2: get rid of sc_sbi back pointer Removes sci->sc_sbi which is a back pointer to nilfs_sb_info struct from log writer object (nilfs_sc_info). Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-03-09 11:05:08 +09:00
Ryusuke Konishi	3fd3fe5aea	nilfs2: move log writer onto nilfs object Log writer is held by the nilfs_sb_info structure. This moves it into nilfs object and replaces all uses of NILFS_SC() accessor. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-03-09 11:05:08 +09:00
Ryusuke Konishi	9b1fc4e497	nilfs2: move next generation counter into nilfs object Moves s_next_generation counter and a spinlock protecting it to nilfs object from nilfs_sb_info structure. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-03-09 11:05:08 +09:00
Ryusuke Konishi	693dd32122	nilfs2: move s_inode_lock and s_dirty_files into nilfs object Moves s_inode_lock spinlock and s_dirty_files list to nilfs object from nilfs_sb_info structure. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-03-09 11:05:07 +09:00
Ryusuke Konishi	574e6c3145	nilfs2: move parameters on nilfs_sb_info into nilfs object This moves four parameter variables on nilfs_sb_info s_resuid, s_resgid, s_interval and s_watermark to the nilfs object. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-03-09 11:05:07 +09:00
Ryusuke Konishi	3b2ce58b0f	nilfs2: move mount options to nilfs object This moves mount_opt local variable to nilfs object from nilfs_sb_info struct. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-03-09 11:05:07 +09:00
roel	3ec07aa952	nfsd: wrong index used in inner loop Index i was already used in the outer loop Cc: stable@kernel.org Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-03-08 19:46:10 -05:00
Chris Mason	ea8efc74bd	Btrfs: make sure not to return overlapping extents to fiemap The btrfs fiemap code was incorrectly returning duplicate or overlapping extents in some cases. cp was blindly trusting this result and we would end up with a destination file that was bigger than the original because some bytes were copied twice. The fix here adjusts our offsets to make sure we're always moving forward in the fiemap results. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-08 11:58:09 -05:00
Artem Bityutskiy	2765df7da5	UBIFS: use max_write_size during recovery When recovering from unclean reboots UBIFS scans the journal and checks nodes. If a corrupted node is found, UBIFS tries to check if this is the last node in the LEB or not. This is is done by checking if there only 0xFF bytes starting from the next min. I/O unit. However, since now we write in c->max_write_size, we should actually check for 0xFFs starting from the next max. write unit. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-03-08 10:12:49 +02:00
Artem Bityutskiy	6c7f74f703	UBIFS: use max_write_size for write-buffers Switch write-buffers from 'c->min_io_size' to 'c->max_write_size' which presumably has to be more write speed-efficient. However, when write-buffer is synchronized, write only the the min. I/O units which contain the data, do not write whole write-buffer. This is more space-efficient. Additionally, this patch takes into account that the LEB might not start from the max. write unit-aligned address. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-03-08 10:12:49 +02:00
Artem Bityutskiy	3c89f396dc	UBIFS: introduce write-buffer size field Currently we assume write-buffer size is always min_io_size. But this is about to change and write-buffers may be of variable size. Namely, they will be of max_write_size at the beginning, but will get smaller when we are approaching the end of LEB. This is a preparation patch which introduces 'size' field in the write-buffer structure which carries the current write-buffer size. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-03-08 10:12:49 +02:00
Artem Bityutskiy	ca2ec61d15	UBI: incorporate LEB offset information Incorporate the LEB offset information into UBIFS. We'll use this information in one of the next patches to figure out what are the max. write size offsets relative to the PEB. So this patch is just a preparation. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-03-08 10:12:48 +02:00
Artem Bityutskiy	3e8e2e0c8d	UBIFS: incorporate maximum write size Incorporate maximum write size into the UBIFS description data structure. This patch just introduces new 'c->max_write_size' and 'c->max_write_shift' fields as a preparation for the following patches. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-03-08 10:12:48 +02:00
Al Viro	dfef6dcd35	unfuck proc_sysctl ->d_compare() a) struct inode is not going to be freed under ->d_compare(); however, the thing PROC_I(inode)->sysctl points to just might. Fortunately, it's enough to make freeing that sucker delayed, provided that we don't step on its ->unregistering, clear the pointer to it in PROC_I(inode) before dropping the reference and check if it's NULL in ->d_compare(). b) I'm not sure that we can walk into NULL inode here (we recheck dentry->seq between verifying that it's still hashed / fetching dentry->d_inode and passing it to ->d_compare() and there's no negative hashed dentries in /proc/sys/*), but if we can walk into that, we really should not have ->d_compare() return 0 on it! Said that, I really suspect that this check can be simply killed. Nick? Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-08 02:22:27 -05:00
Ryusuke Konishi	be667377a8	nilfs2: record used amount of each checkpoint in checkpoint list This records the number of used blocks per checkpoint in each checkpoint entry of cpfile. Even though userland tools can get the block count via nilfs_get_cpinfo ioctl, it was not updated by the nilfs2 kernel code. This fixes the issue and makes it available for userland tools to calculate used amount per checkpoint. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Jiro SEKIBA <jir@unicus.jp>	2011-03-08 14:58:31 +09:00
Ryusuke Konishi	ae191838b0	nilfs2: optimize rec_len functions This is a similar change to those in ext2/ext3 codebase (commit `40a063f669` and `a4ae309486`, respectively). The addition of 64k block capability in the rec_len_from_disk and rec_len_to_disk functions added a bit of math overhead which slows down file create workloads needlessly when the architecture cannot even support 64k blocks. This will cut the corner. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-03-08 14:58:30 +09:00
Ryusuke Konishi	4138ec2382	nilfs2: append blocksize info to warnings during loading super blocks At present, the same warning message can be output twice when nilfs detected a problem on super blocks: NILFS warning: broken superblock. using spare superblock. NILFS warning: broken superblock. using spare superblock. ... This is because these super blocks are reloaded with the block size written in a super block if it differs from the first block size, but this repetition looks somewhat confusing. So, we hint at what is going on by appending block size information to those messages. Reported-by: Wakko Warner <wakko@animx.eu.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-03-08 14:58:30 +09:00
Ryusuke Konishi	828b1c50ae	nilfs2: add compat ioctl The current FS_IOC_GETFLAGS/SETFLAGS/GETVERSION will fail if application is 32 bit and kernel is 64 bit. This issue is avoidable by adding compat_ioctl method. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-03-08 14:58:30 +09:00
Ryusuke Konishi	cde98f0f84	nilfs2: implement FS_IOC_GETFLAGS/SETFLAGS/GETVERSION Add support for the standard attributes set via chattr and read via lsattr. These attributes are already in the flags value in the nilfs2 inode, but currently we don't have any ioctl commands that expose them to the userland. Collaterally, this adds the FS_IOC_GETVERSION ioctl for getting i_generation, which allows users to list the file's generation number with "lsattr -v". Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-03-08 14:58:30 +09:00
Ryusuke Konishi	b253a3e4f2	nilfs2: tighten restrictions on inode flags Nilfs has few rectrictions on which flags may be set on which inodes like ext2/3/4 filesystems used to be. Specifically DIRSYNC may only be set on directories and IMMUTABLE and APPEND may not be set on links. Tighten that to disallow TOPDIR being set on non-directories and only NODUMP and NOATIME to be set on non-regular file, non-directories. This introduces a flags masking function like those of extN and uses it during inode creation. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-03-08 14:58:29 +09:00
Ryusuke Konishi	32f4aeb315	nilfs2: mark S_NOATIME on inodes only if NOATIME attribute is set At present, nilfs marks S_NOATIME flag on all inodes. This restricts nilfs_set_inode_flags function so that it marks S_NOATIME only if a given inode has an FS_NOATIME_FL flag. Although nilfs does not support atime yet, touch_atime() still safely returns on IS_NOATIME check since MS_NOATIME is always set on sb. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-03-08 14:58:29 +09:00
Ryusuke Konishi	f0c9f242f9	nilfs2: use common file attribute macros Replaces uses of own inode flags (i.e. NILFS_SECRM_FL, NILFS_UNRM_FL, NILFS_COMPR_FL, and so forth) with common inode flags, and removes the own flag declarations. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-03-08 14:58:29 +09:00
Ryusuke Konishi	9954e7af14	nilfs2: add free entries count only if clear bit operation succeeded Three functions of the current persistent object allocator, nilfs_palloc_commit_free_entry, nilfs_palloc_abort_alloc_entry, and nilfs_palloc_freev functions unconditionally add a counter after doing clear bit operation on a bitmap block. If the clear bit operation overlapped, the counter will not add up. This fixes the issue by making the counter operations conditional. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-03-08 14:58:04 +09:00
Ryusuke Konishi	25b18d39cc	nilfs2: decrement inodes count only if raw inode was successfully deleted This fixes the issue that inodes count will not add up after removal of raw inodes fails. Hence, this prevents possible under flow of the inodes count. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-03-08 14:58:04 +09:00
James Morris	fe3fa43039	Merge branch 'master' of git://git.infradead.org/users/eparis/selinux into next	2011-03-08 11:38:10 +11:00
James Morris	1cc26bada9	Merge branch 'master'; commit 'v2.6.38-rc7' into next	2011-03-08 10:55:06 +11:00
J. Bruce Fields	32b007b4e1	nfsd4: fix bad pointer on failure to find delegation In case of a nonempty list, the return on error here is obviously bogus; it ends up being a pointer to the list head instead of to any valid delegation on the list. In particular, if nfsd4_delegreturn() hits this case, and you're quite unlucky, then renew_client may oops, and it may take an embarassingly long time to figure out why. Facepalm. BUG: unable to handle kernel NULL pointer dereference at 0000000000000090 IP: [<ffffffff81292965>] nfsd4_delegreturn+0x125/0x200 ... Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-03-07 11:44:53 -05:00
Eric Sandeen	d7433142b6	ext3: Always set dx_node's fake_dirent explicitly. (crossport of `1f7bebb9e9` by Andreas Schlick <schlick@lavabit.com>) When ext3_dx_add_entry() has to split an index node, it has to ensure that name_len of dx_node's fake_dirent is also zero, because otherwise e2fsck won't recognise it as an intermediate htree node and consider the htree to be corrupted. CC: stable@kernel.org Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>	2011-03-07 17:20:58 +01:00
Chris Mason	31339acd07	Btrfs: deal with short returns from copy_from_user When copy_from_user is only able to copy some of the bytes we requested, we may end up creating a partially up to date page. To avoid garbage in the page, we need to treat a partial copy as a zero length copy. This makes the rest of the file_write code drop the page and retry the whole copy instead of marking the partially up to date page as dirty. Signed-off-by: Chris Mason <chris.mason@oracle.com> cc: stable@kernel.org	2011-03-07 11:10:24 -05:00
Chris Mason	b1bf862e9d	Btrfs: fix regressions in copy_from_user handling Commit `914ee295af` fixed deadlocks in btrfs_file_write where we would catch page faults on pages we had locked. But, there were a few problems: 1) The x86-32 iov_iter_copy_from_user_atomic code always fails to copy data when the amount to copy is more than 4K and the offset to start copying from is not page aligned. The result was btrfs_file_write looping forever retrying the iov_iter_copy_from_user_atomic We deal with this by changing btrfs_file_write to drop down to single page copies when iov_iter_copy_from_user_atomic starts returning failure. 2) The btrfs_file_write code was leaking delalloc reservations when iov_iter_copy_from_user_atomic returned zero. The looping above would result in the entire filesystem running out of delalloc reservations and constantly trying to flush things to disk. 3) btrfs_file_write will lock down page cache pages, make sure any writeback is finished, do the copy_from_user and then release them. Before the loop runs we check the first and last pages in the write to see if they are only being partially modified. If the start or end of the write isn't aligned, we make sure the corresponding pages are up to date so that we don't introduce garbage into the file. With the copy_from_user changes, we're allowing the VM to reclaim the pages after a partial update from copy_from_user, but we're not making sure the page cache page is up to date when we loop around to resume the write. We deal with this by pushing the up to date checks down into the page prep code. This fits better with how the rest of file_write works. Signed-off-by: Chris Mason <chris.mason@oracle.com> Reported-by: Mitch Harder <mitch.harder@sabayonlinux.org> cc: stable@kernel.org	2011-03-07 10:42:27 -05:00
Dave Chinner	9130090b5f	xfs: kill support/debug.[ch] The remaining functionality in debug.[ch] is effectively just assert handling, conditional debug definitions and hex dumping. The hex dumping and assert function can be moved into the new printk module, while the rest can be moved into top-level header files. This allows fs/xfs/support/debug.[ch] to be completely removed from the codebase. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2011-03-07 10:09:35 +11:00
Dave Chinner	0b932cccbd	xfs: Convert remaining cmn_err() callers to new API Once converted, kill the remainder of the cmn_err() interface. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2011-03-07 10:08:35 +11:00
Dave Chinner	8221112b43	xfs: convert the quota debug prints to new API Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2011-03-07 10:07:35 +11:00
Dave Chinner	6d4a8ecb34	xfs: rename xfs_cmn_err_fsblock_zero() The "cmn_err" part of the function name is no longer relevant. Rename the function to xfs_alert_fsblock_zero() to match the new logging API. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2011-03-07 10:06:35 +11:00
Dave Chinner	5348778699	xfs: convert xfs_fs_cmn_err to new error logging API Continue to clean up the error logging code by converting all the callers of xfs_fs_cmn_err() to the new API. Once done, remove the unused old API function. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2011-03-07 10:05:35 +11:00
Dave Chinner	af34e09da4	xfs: kill xfs_fs_mount_cmn_err() macro The xfs_fs_mount_cmn_err() hides a simple check as to whether the mount path should output an error or not. Remove the macro and open code the check. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2011-03-07 10:04:35 +11:00
Dave Chinner	65333b4c3d	xfs: kill xfs_fs_repair_cmn_err() macro In certain cases of inode corruption, the xfs_fs_repair_cmn_err() macro is used to output an extra message in the corruption report. That extra message is "unmount and run xfs_repair", which really applies to any corruption report. Each case that this macro is called (except one) a following call to xfs_corruption_error() is made to optionally dump more information about the error. Hence, move the output of "run xfs_repair" to xfs_corruption_error() so that it is output on all corruption reports. Also, convert the callers of the repair macro that don't call xfs_corruption_error() to call it, hence provide consiѕtent error reporting for all cases where xfs_fs_repair_cmn_err() used to be called. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2011-03-07 10:03:35 +11:00
Dave Chinner	6a19d9393a	xfs: convert xfs_cmn_err to xfs_alert_tag Continue the conversion of the old cmn_err interface be converting all the conditional panic tag errors to xfs_alert_tag() and then removing xfs_cmn_err(). Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2011-03-07 10:02:35 +11:00
Dave Chinner	a0fa2b679e	xfs: Convert xlog_warn to new logging interface Convert the xfs log operations to use the new error logging interfaces. This removes the xlog_{warn,panic} wrappers and makes almost all errors emit the device they belong to instead of just refering to "XFS". Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2011-03-07 10:01:35 +11:00
Dave Chinner	4f10700a2e	xfs: Convert linux-2.6/ files to new logging interface Convert the files in fs/xfs/linux-2.6/ to use the new xfs_<level> logging format that replaces the old Irix inherited cmn_err() interfaces. While there, also convert naked printk calls to use the relevant xfs logging function to standardise output format. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2011-03-07 10:00:35 +11:00
Al Viro	31be83aeae	omfs: make readdir stop when filldir says so filldir returning an error does not mean "skip this entry, try the next one"... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Bob Copeland <me@bobcopeland.com>	2011-03-05 16:24:12 -05:00
Al Viro	d932805b3d	omfs: merge unlink() and rmdir(), close leak in rename() In case of directory-overwriting rename(), omfs forgot to mark the victim doomed, so omfs_evict_inode() didn't free it. We could fix that by calling omfs_rmdir() for directory victims instead of doing omfs_unlink(), but it's easier to merge omfs_unlink() and omfs_rmdir() instead. Note that we have no hardlinks here. It also makes the checks in omfs_rename() go away - they fold into what omfs_remove() does when it runs into a directory. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Bob Copeland <me@bobcopeland.com>	2011-03-05 16:24:01 -05:00
Al Viro	cdb26496db	omfs: stop playing silly buggers with omfs_unlink() in ->rename() Since omfs directories are hashes of inodes and name is part of inode, we have to remove inode from old directory before we can put it into new one / under new name. So instead of bump i_nlink call omfs_unlink, which does omfs_delete_entry() decrement i_nlink and mark parent dirty in case of success decrement i_nlink if omfs_unlink failed and hadn't done it itself let's just call omfs_delete_entry() and dirty the parent ourselves... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Bob Copeland <me@bobcopeland.com>	2011-03-05 16:23:39 -05:00
Al Viro	013e4f4a28	omfs: rename() needs to mark old_inode dirty after ctime update we do mark it dirty before, but it doesn't guarantee that we don't get preempted just before assignment to ->i_ctime, with inode getting written out before we get CPU back... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Bob Copeland <me@bobcopeland.com>	2011-03-05 16:20:30 -05:00
Linus Torvalds	fb62c00a6d	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: no .snap inside of snapped namespace libceph: fix msgr standby handling libceph: fix msgr keepalive flag libceph: fix msgr backoff libceph: retry after authorization failure libceph: fix handling of short returns from get_user_pages ceph: do not clear I_COMPLETE from d_release ceph: do not set I_COMPLETE Revert "ceph: keep reference to parent inode on ceph_dentry"	2011-03-05 10:43:22 -08:00
Matt Fleming	ae7eb8979c	fs/locks.c: Remove stale FIXME left over from BKL conversion The comment is no longer true as (now that the BKL conversion is finished) a spinlock _is_ now used to protect file_lock_list, blocked_list and inode->i_flock. Signed-off-by: Matt Fleming <matt.fleming@linux.intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2011-03-05 10:55:59 +01:00
Neil Horman	e9e3d724e2	nfs4: Ensure that ACL pages sent over NFS were not allocated from the slab (v3) The "bad_page()" page allocator sanity check was reported recently (call chain as follows): bad_page+0x69/0x91 free_hot_cold_page+0x81/0x144 skb_release_data+0x5f/0x98 __kfree_skb+0x11/0x1a tcp_ack+0x6a3/0x1868 tcp_rcv_established+0x7a6/0x8b9 tcp_v4_do_rcv+0x2a/0x2fa tcp_v4_rcv+0x9a2/0x9f6 do_timer+0x2df/0x52c ip_local_deliver+0x19d/0x263 ip_rcv+0x539/0x57c netif_receive_skb+0x470/0x49f :virtio_net:virtnet_poll+0x46b/0x5c5 net_rx_action+0xac/0x1b3 __do_softirq+0x89/0x133 call_softirq+0x1c/0x28 do_softirq+0x2c/0x7d do_IRQ+0xec/0xf5 default_idle+0x0/0x50 ret_from_intr+0x0/0xa default_idle+0x29/0x50 cpu_idle+0x95/0xb8 start_kernel+0x220/0x225 _sinittext+0x22f/0x236 It occurs because an skb with a fraglist was freed from the tcp retransmit queue when it was acked, but a page on that fraglist had PG_Slab set (indicating it was allocated from the Slab allocator (which means the free path above can't safely free it via put_page. We tracked this back to an nfsv4 setacl operation, in which the nfs code attempted to fill convert the passed in buffer to an array of pages in __nfs4_proc_set_acl, which gets used by the skb->frags list in xs_sendpages. __nfs4_proc_set_acl just converts each page in the buffer to a page struct via virt_to_page, but the vfs allocates the buffer via kmalloc, meaning the PG_slab bit is set. We can't create a buffer with kmalloc and free it later in the tcp ack path with put_page, so we need to either: 1) ensure that when we create the list of pages, no page struct has PG_Slab set or 2) not use a page list to send this data Given that these buffers can be multiple pages and arbitrarily sized, I think (1) is the right way to go. I've written the below patch to allocate a page from the buddy allocator directly and copy the data over to it. This ensures that we have a put_page free-able page for every entry that winds up on an skb frag list, so it can be safely freed when the frame is acked. We do a put page on each entry after the rpc_call_sync call so as to drop our own reference count to the page, leaving only the ref count taken by tcp_sendpages. This way the data will be properly freed when the ack comes in Successfully tested by myself to solve the above oops. Note, as this is the result of a setacl operation that exceeded a page of data, I think this amounts to a local DOS triggerable by an uprivlidged user, so I'm CCing security on this as well. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Trond Myklebust <Trond.Myklebust@netapp.com> CC: security@kernel.org CC: Jeff Layton <jlayton@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-04 17:28:52 -08:00
Sage Weil	455cec0abf	ceph: no .snap inside of snapped namespace Otherwise you can do things like # mkdir .snap/foo # cd .snap/foo/.snap # ls <badness> Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-04 12:25:09 -08:00
Al Viro	1858efd471	minimal fix for do_filp_open() race failure exits on the no-O_CREAT side of do_filp_open() merge with those of O_CREAT one; unfortunately, if do_path_lookup() returns -ESTALE, we'll get out_filp:, notice that we are about to return -ESTALE without having trying to create the sucker with LOOKUP_REVAL and jump right into the O_CREAT side of code. And proceed to try and create a file. Usually that'll fail with -ESTALE again, but we can race and get that attempt of pathname resolution to succeed. open() without O_CREAT really shouldn't end up creating files, races or not. The real fix is to rearchitect the whole do_filp_open(), but for now splitting the failure exits will do. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-04 13:14:21 -05:00
Linus Torvalds	8336026942	Merge branch 'i_nlink' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'i_nlink' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: hfs: fix rename() over non-empty directory udf: fix i_nlink limit fix reiserfs mkdir() breakage exofs: i_nlink races in rename() nilfs2: i_nlink races in rename() minix: i_nlink races in rename() ufs: i_nlink races in rename() sysv: i_nlink races in rename()	2011-03-03 15:37:59 -08:00
Tao Ma	425fa41072	ext3: Fix an overflow in ext3_trim_fs. In a bs=4096 volume, if we call FITRIM with the following parameter as fstrim_range(start = 102400, len = 134144000, minlen = 10240), with the following code: if (len >= EXT3_BLOCKS_PER_GROUP(sb)) len -= (EXT3_BLOCKS_PER_GROUP(sb) - first_block); else last_block = first_block + len; So if len < EXT3_BLOCKS_PER_GROUP while first_block + len > EXT3_BLOCKS_PER_GROUP, last_block will be set to an overflow value which exceeds EXT3_BLOCKS_PER_GROUP. This patch fixes it and adjusts len and last_block accordingly. Cc: Lukas Czerner <lczerner@redhat.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz>	2011-03-04 00:34:15 +01:00
Eric Paris	ff36fe2c84	LSM: Pass -o remount options to the LSM The VFS mount code passes the mount options to the LSM. The LSM will remove options it understands from the data and the VFS will then pass the remaining options onto the underlying filesystem. This is how options like the SELinux context= work. The problem comes in that -o remount never calls into LSM code. So if you include an LSM specific option it will get passed to the filesystem and will cause the remount to fail. An example of where this is a problem is the 'seclabel' option. The SELinux LSM hook will print this word in /proc/mounts if the filesystem is being labeled using xattrs. If you pass this word on mount it will be silently stripped and ignored. But if you pass this word on remount the LSM never gets called and it will be passed to the FS. The FS doesn't know what seclabel means and thus should fail the mount. For example an ext3 fs mounted over loop # mount -o loop /tmp/fs /mnt/tmp # cat /proc/mounts \| grep /mnt/tmp /dev/loop0 /mnt/tmp ext3 rw,seclabel,relatime,errors=continue,barrier=0,data=ordered 0 0 # mount -o remount /mnt/tmp mount: /mnt/tmp not mounted already, or bad option # dmesg EXT3-fs (loop0): error: unrecognized mount option "seclabel" or missing value This patch passes the remount mount options to an new LSM hook. Signed-off-by: Eric Paris <eparis@redhat.com> Reviewed-by: James Morris <jmorris@namei.org>	2011-03-03 16:12:27 -05:00
Linus Torvalds	4c7fd114c6	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: zero proper structure size for geometry calls	2011-03-03 12:44:22 -08:00
Linus Torvalds	c640e13f8e	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: nilfs2: fix regression that i-flag is not set on changeless checkpoints	2011-03-03 12:42:48 -08:00
Sage Weil	16a8b70a5a	ceph: do not clear I_COMPLETE from d_release First, this was racy anyway: d_release isn't called until well after the dentry is unhashed. Second, this runs afoul of the recent dcache change that clears d_parent prior to calling d_release (`949854d0`), causing a NULL pointer dereference. Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-03 10:09:52 -08:00
Sage Weil	b545cc1505	ceph: do not set I_COMPLETE Do not set the I_COMPLETE flag on directories until we resolve races with dcache pruning. Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-03 10:09:51 -08:00
Sage Weil	9bde178d05	Revert "ceph: keep reference to parent inode on ceph_dentry" This reverts commit `97d79b403e`. This fails to account for d_parent changes due to rename or disconnected dentries due to submounts or NFS reexports. Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-03 10:09:50 -08:00
Al Viro	69102e9b4b	hfs: fix rename() over non-empty directory merge hfs_unlink() and hfs_rmdir(), while we are at it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-03 01:28:40 -05:00
Al Viro	810c1b2e48	udf: fix i_nlink limit (256 << sizeof(x)) - 1 is not the maximal possible value of x... In reality, the maximal allowed value for UDF FileLinkCount is 65535. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-03 01:28:40 -05:00
Al Viro	99890a3be1	fix reiserfs mkdir() breakage if directory has so many subdirectories that its link count is set to 1 (i.e. "can't tell accurately") and reiserfs_new_inode() fails, we shouldn't decrement the parent's link count in cleanup path; that's what DEC_DIR_INODE_NLINK() is for. As it is, we end up with parent suddenly getting zero i_nlink, with very unpleasant effects. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-03 01:28:40 -05:00
Al Viro	babfe56046	exofs: i_nlink races in rename() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-03 01:28:17 -05:00

... 2 3 4 5 6 ...

21860 Commits