OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Nick Piggin	99b7db7b8f	fs: brlock vfsmount_lock fs: brlock vfsmount_lock Use a brlock for the vfsmount lock. It must be taken for write whenever modifying the mount hash or associated fields, and may be taken for read when performing mount hash lookups. A new lock is added for the mnt-id allocator, so it doesn't need to take the heavy vfsmount write-lock. The number of atomics should remain the same for fastpath rlock cases, though code would be slightly slower due to per-cpu access. Scalability is not not be much improved in common cases yet, due to other locks (ie. dcache_lock) getting in the way. However path lookups crossing mountpoints should be one case where scalability is improved (currently requiring the global lock). The slowpath is slower due to use of brlock. On a 64 core, 64 socket, 32 node Altix system (high latency to remote nodes), a simple umount microbenchmark (mount --bind mnt mnt2 ; umount mnt2 loop 1000 times), before this patch it took 6.8s, afterwards took 7.1s, about 5% slower. Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-18 08:35:48 -04:00
Nick Piggin	b04f784e5d	fs: remove extra lookup in __lookup_hash fs: remove extra lookup in __lookup_hash Optimize lookup for create operations, where no dentry should often be common-case. In cases where it is not, such as unlink, the added overhead is much smaller than the removed. Also, move comments about __d_lookup racyness to the __d_lookup call site. d_lookup is intuitive; __d_lookup is what needs commenting. So in that same vein, add kerneldoc comments to __d_lookup and clean up some of the comments: - We are interested in how the RCU lookup works here, particularly with renames. Make that explicit, and point to the document where it is explained in more detail. - RCU is pretty standard now, and macros make implementations pretty mindless. If we want to know about RCU barrier details, we look in RCU code. - Delete some boring legacy comments because we don't care much about how the code used to work, more about the interesting parts of how it works now. So comments about lazy LRU may be interesting, but would better be done in the LRU or refcount management code. Signed-off-by: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-18 08:35:47 -04:00
Nick Piggin	baa0389073	fs: dentry allocation consolidation fs: dentry allocation consolidation There are 2 duplicate copies of code in dentry allocation in path lookup. Consolidate them into a single function. Signed-off-by: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-18 08:35:45 -04:00
Nick Piggin	2e2e88ea8c	fs: fix do_lookup false negative fs: fix do_lookup false negative In do_lookup, if we initially find no dentry, we take the directory i_mutex and re-check the lookup. If we find a dentry there, then we revalidate it if needed. However if that revalidate asks for the dentry to be invalidated, we return -ENOENT from do_lookup. What should happen instead is an attempt to allocate and lookup a new dentry. This is probably not noticed because it is rare. It is only reached if a concurrent create races in first (in which case, the dentry probably won't be invalidated anyway), or if the racy __d_lookup has failed due to a false-negative (which is very rare). Fix this by removing code and have it use the normal reval path. Signed-off-by: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-18 08:35:45 -04:00
Miklos Szeredi	f7ad3c6be9	vfs: add helpers to get root and pwd Add three helpers that retrieve a refcounted copy of the root and cwd from the supplied fs_struct. get_fs_root() get_fs_pwd() get_fs_root_and_pwd() Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-11 00:28:20 -04:00
Linus Torvalds	8c8946f509	Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify * 'for-linus' of git://git.infradead.org/users/eparis/notify: (132 commits) fanotify: use both marks when possible fsnotify: pass both the vfsmount mark and inode mark fsnotify: walk the inode and vfsmount lists simultaneously fsnotify: rework ignored mark flushing fsnotify: remove global fsnotify groups lists fsnotify: remove group->mask fsnotify: remove the global masks fsnotify: cleanup should_send_event fanotify: use the mark in handler functions audit: use the mark in handler functions dnotify: use the mark in handler functions inotify: use the mark in handler functions fsnotify: send fsnotify_mark to groups in event handling functions fsnotify: Exchange list heads instead of moving elements fsnotify: srcu to protect read side of inode and vfsmount locks fsnotify: use an explicit flag to indicate fsnotify_destroy_mark has been called fsnotify: use _rcu functions for mark list traversal fsnotify: place marks on object in order of group memory address vfs/fsnotify: fsnotify_close can delay the final work in fput fsnotify: store struct file not struct path ... Fix up trivial delete/modify conflict in fs/notify/inotify/inotify.c.	2010-08-10 11:39:13 -07:00
Eric Paris	d09ca73979	security: make LSMs explicitly mask off permissions SELinux needs to pass the MAY_ACCESS flag so it can handle auditting correctly. Presently the masking of MAY_* flags is done in the VFS. In order to allow LSMs to decide what flags they care about and what flags they don't just pass them all and the each LSM mask off what they don't need. This patch should contain no functional changes to either the VFS or any LSM. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Stephen D. Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>	2010-08-02 15:35:07 +10:00
Tetsuo Handa	ea0d3ab239	LSM: Remove unused arguments from security_path_truncate(). When commit `be6d3e56a6` "introduce new LSM hooks where vfsmount is available." was proposed, regarding security_path_truncate(), only "struct file *" argument (which AppArmor wanted to use) was removed. But length and time_attrs arguments are not used by TOMOYO nor AppArmor. Thus, let's remove these arguments. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Nick Piggin <npiggin@suse.de> Signed-off-by: James Morris <jmorris@namei.org>	2010-08-02 15:33:40 +10:00
Eric Paris	59b0df211b	fsnotify: use unsigned char * for dentry->d_name.name fsnotify was using char * when it passed around the d_name.name string internally but it is actually an unsigned char *. This patch switches fsnotify to use unsigned and should silence some pointer signess warnings which have popped out of xfs. I do not add -Wpointer-sign to the fsnotify code as there are still issues with kstrdup and strlen which would pop out needless warnings. Signed-off-by: Eric Paris <eparis@redhat.com>	2010-07-28 09:59:01 -04:00
Neil Brown	176306f59a	VFS: fix recent breakage of FS_REVAL_DOT Commit `1f36f774b2` broke FS_REVAL_DOT semantics. In particular, before this patch, the command ls -l in an NFS mounted directory would always check if the directory on the server had changed and if so would flush and refill the pagecache for the dir. After this patch, the same "ls -l" will repeatedly return stale date until the cached attributes for the directory time out. The following patch fixes this by ensuring the d_revalidate is called by do_last when "." is being looked-up. link_path_walk has already called d_revalidate, but in that case LOOKUP_OPEN is not set so nfs_lookup_verify_inode chooses not to do any validation. The following patch restores the original behaviour. Cc: stable@kernel.org Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-05-27 22:03:06 -04:00
Huang Shijie	9a2296832c	namei.c : update mnt when it needed update the mnt of the path when it is not equal to the new one. Signed-off-by: Huang Shijie <shijie8@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-05-21 18:31:22 -04:00
Al Viro	d83c49f3e3	Fix the regression created by "set S_DEAD on unlink()..." commit 1) i_flags simply doesn't work for mount/unlink race prevention; we may have many links to file and rm on one of those obviously shouldn't prevent bind on top of another later on. To fix it right way we need to mark _dentry_ as unsuitable for mounting upon; new flag (DCACHE_CANT_MOUNT) is protected by d_flags and i_mutex on the inode in question. Set it (with dont_mount(dentry)) in unlink/rmdir/etc., check (with cant_mount(dentry)) in places in namespace.c that used to check for S_DEAD. Setting S_DEAD is still needed in places where we used to set it (for directories getting killed), since we rely on it for readdir/rmdir race prevention. 2) rename()/mount() protection has another bogosity - we unhash the target before we'd checked that it's not a mountpoint. Fixed. 3) ancient bogosity in pivot_root() - we locked i_mutex on the right directory, but checked S_DEAD on the different (and wrong) one. Noticed and fixed. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-05-15 07:16:33 -04:00
Jan Kara	002baeecf5	vfs: Fix O_NOFOLLOW behavior for paths with trailing slashes According to specification mkdir d; ln -s d a; open("a/", O_NOFOLLOW \| O_RDONLY) should return success but currently it returns ELOOP. This is a regression caused by path lookup cleanup patch series. Fix the code to ignore O_NOFOLLOW in case the provided path has trailing slashes. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Reported-by: Marius Tolzmann <tolzmann@molgen.mpg.de> Acked-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-13 08:46:04 -07:00
Al Viro	3e297b6134	Restore LOOKUP_DIRECTORY hint handling in final lookup on open() Lose want_dir argument, while we are at it - since now nd->flags & LOOKUP_DIRECTORY is equivalent to it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-03-26 12:41:05 -04:00
Jiri Kosina	318ae2edc3	Merge branch 'for-next' into for-linus Conflicts: Documentation/filesystems/proc.txt arch/arm/mach-u300/include/mach/debug-macro.S drivers/net/qlge/qlge_ethtool.c drivers/net/qlge/qlge_main.c drivers/net/typhoon.c	2010-03-08 16:55:37 +01:00
Al Viro	781b16775b	Fix a dumb typo - use of & instead of && We managed to lose O_DIRECTORY testing due to a stupid typo in commit `1f36f774b2` ("Switch !O_CREAT case to use of do_last()") Reported-by: Walter Sheets <w41ter@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-03-06 10:54:48 -08:00
Linus Torvalds	e213e26ab3	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (33 commits) quota: stop using QUOTA_OK / NO_QUOTA dquot: cleanup dquot initialize routine dquot: move dquot initialization responsibility into the filesystem dquot: cleanup dquot drop routine dquot: move dquot drop responsibility into the filesystem dquot: cleanup dquot transfer routine dquot: move dquot transfer responsibility into the filesystem dquot: cleanup inode allocation / freeing routines dquot: cleanup space allocation / freeing routines ext3: add writepage sanity checks ext3: Truncate allocated blocks if direct IO write fails to update i_size quota: Properly invalidate caches even for filesystems with blocksize < pagesize quota: generalize quota transfer interface quota: sb_quota state flags cleanup jbd: Delay discarding buffers in journal_unmap_buffer ext3: quota_write cross block boundary behaviour quota: drop permission checks from xfs_fs_set_xstate/xfs_fs_set_xquota quota: split out compat_sys_quotactl support from quota.c quota: split out netlink notification support from quota.c quota: remove invalid optimization from quota_sync_all ... Fixed trivial conflicts in fs/namei.c and fs/ufs/inode.c	2010-03-05 13:20:53 -08:00
Al Viro	1f36f774b2	Switch !O_CREAT case to use of do_last() ... and now we have all intents crap well localized Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-03-05 09:22:25 -05:00
Al Viro	def4af30cf	Get rid of symlink body copying Now that nd->last stays around until ->put_link() is called, we can just postpone that ->put_link() in do_filp_open() a bit and don't bother with copying. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-03-05 09:01:40 -05:00
Al Viro	3866248e5f	Finish pulling of -ESTALE handling to upper level in do_filp_open() Don't bother with path_walk() (and its retry loop); link_path_walk() will do it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-03-05 09:01:38 -05:00
Al Viro	806b681cbe	Turn do_link spaghetty into a normal loop Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-03-05 09:01:36 -05:00
Al Viro	10fa8e62f2	Unify exits in O_CREAT handling Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-03-05 09:01:35 -05:00
Al Viro	9e67f36169	Kill is_link argument of do_last() We set it to 1 iff we return NULL Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-03-05 09:01:33 -05:00
Al Viro	67ee3ad21d	Pull handling of LAST_BIND into do_last(), clean up ok: part in do_filp_open() Note that in case of !O_CREAT we know that nd.root has already been given up Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-03-05 09:01:31 -05:00
Al Viro	4296e2cbf2	Leave mangled flag only for setting nd.intent.open.flag Nothing else uses it anymore Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-03-05 09:01:29 -05:00
Al Viro	5b369df826	Get rid of passing mangled flag to do_last() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-03-05 09:01:27 -05:00
Al Viro	9a66179e13	Don't pass mangled open_flag to finish_open() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-03-05 09:01:25 -05:00
Al Viro	a2c36b450e	pull more into do_last() Handling of LAST_DOT/LAST_ROOT/LAST_DOTDOT/terminating slash can be pulled in as well Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-03-05 09:01:24 -05:00
Al Viro	c99658fe97	bail out with ELOOP earlier in do_link loop If we'd passed through 32 trailing symlinks already, there's no sense following the 33rd - we'll bail out anyway. Better bugger off earlier. It does change behaviour, after a fashion - if the 33rd happens to be a procfs-style symlink, original code would allow it. This one will not. Cry me a river if that hurts you. Please, do. And post a video of that, while you are at it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-03-05 09:01:22 -05:00
Al Viro	a1e28038df	pull the common predecessors into do_last() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-03-05 09:01:20 -05:00
Al Viro	c41c140562	postpone __putname() until after do_last() Since do_last() doesn't mangle nd->last_name, we can safely postpone __putname() done in handling of trailing symlinks until after the call of do_last() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-03-05 09:01:18 -05:00
Al Viro	27bff34300	unroll do_last: loop in do_filp_open() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-03-05 09:01:16 -05:00
Al Viro	3343eb8209	Shift releasing nd->root from do_last() to its caller Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-03-05 09:01:15 -05:00
Al Viro	fb1cc555d5	gut do_filp_open() a bit more (do_last separation) Brute-force separation of stuff reachable from do_last: with the exception of do_link:; just take all that crap to a helper function as-is and have it tell the caller if it has to go to do_link. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-03-05 09:01:13 -05:00
Al Viro	648fa8611d	beginning to untangle do_filp_open() That's going to be a long and painful series. The first step: take the stuff reachable from 'ok' label in do_filp_open() into a new helper (finish_open()). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-03-05 09:01:11 -05:00
Christoph Hellwig	907f4554e2	dquot: move dquot initialization responsibility into the filesystem Currently various places in the VFS call vfs_dq_init directly. This means we tie the quota code into the VFS. Get rid of that and make the filesystem responsible for the initialization. For most metadata operations this is a straight forward move into the methods, but for truncate and open it's a bit more complicated. For truncate we currently only call vfs_dq_init for the sys_truncate case because open already takes care of it for ftruncate and open(O_TRUNC) - the new code causes an additional vfs_dq_init for those which is harmless. For open the initialization is moved from do_filp_open into the open method, which means it happens slightly earlier now, and only for regular files. The latter is fine because we don't need to initialize it for operations on special files, and we already do it as part of the namespace operations for directories. Add a dquot_file_open helper that filesystems that support generic quotas can use to fill in ->open. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2010-03-05 00:20:30 +01:00
Al Viro	9643f5d94a	Merge branch 'for-fsnotify' into for-linus	2010-03-03 17:12:40 -05:00
Al Viro	bec1052e5b	set S_DEAD on unlink() and non-directory rename() victims Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-03-03 14:12:08 -05:00
Al Viro	3088dd7080	Clean follow_dotdot() up a bit No need to open-code follow_up() in it and locking can be lighter. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-03-03 14:07:56 -05:00
Al Viro	8737c9305b	Switch may_open() and break_lease() to passing O_... ... instead of mixing FMODE_ and O_ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-03-03 13:00:21 -05:00
Al Viro	ac278a9c50	fix LOOKUP_FOLLOW on automount "symlinks" Make sure that automount "symlinks" are followed regardless of LOOKUP_FOLLOW; it should have no effect on them. Cc: stable@kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-02-19 03:56:42 -05:00
Al Viro	cccc6bba3f	Lose the first argument of audit_inode_child() it's always equal to ->d_name.name of the second argument Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-02-08 14:38:36 -05:00
Al Viro	123df2944c	Lose the new_name argument of fsnotify_move() it's always new_dentry->d_name.name Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-02-08 14:38:36 -05:00
Mimi Zohar	9bbb6cad01	ima: rename ima_path_check to ima_file_check ima_path_check actually deals with files! call it ima_file_check instead. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-02-07 03:06:22 -05:00
Mimi Zohar	8eb988c70e	fix ima breakage The "Untangling ima mess, part 2 with counters" patch messed up the counters. Based on conversations with Al Viro, this patch streamlines ima_path_check() by removing the counter maintaince. The counters are now updated independently, from measuring the file, in __dentry_open() and alloc_file() by calling ima_counts_get(). ima_path_check() is called from nfsd and do_filp_open(). It also did not measure all files that should have been measured. Reason: ima_path_check() got bogus value passed as mask. [AV: mea culpa] [AV: add missing nfsd bits] Signed-off-by: Mimi Zohar <zohar@us.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-02-07 03:06:22 -05:00
Adam Buchbinder	c41b20e721	Fix misspellings of "truly" in comments. Some comments misspell "truly"; this fixes them. No code changes. Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-02-04 11:55:45 +01:00
Al Viro	9850c05655	Fix the -ESTALE handling in do_filp_open() Instead of playing sick games with path saving, cleanups, just retry the entire thing once with LOOKUP_REVAL added. Post-.34 we'll convert all -ESTALE handling in there to that style, rather than playing with many retry loops deep in the call chain. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-01-14 09:05:26 -05:00
Al Viro	6d125529c6	Fix ACC_MODE() for real commit `5300990c03` had stepped on a rather nasty mess: definitions of ACC_MODE used to be different. Fixed the resulting breakage, converting them to variant that takes O_... value; all callers have that and it actually simplifies life (see tomoyo part of changes). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-01-14 09:05:26 -05:00
Al Viro	86acdca1b6	fix autofs/afs/etc. magic mountpoint breakage We end up trying to kfree() nd.last.name on open("/mnt/tmp", O_CREAT) if /mnt/tmp is an autofs direct mount. The reason is that nd.last_type is bogus here; we want LAST_BIND for everything of that kind and we get LAST_NORM left over from finding parent directory. So make sure that it is set properly; set to LAST_BIND before doing ->follow_link() - for normal symlinks it will be changed by __vfs_follow_link() and everything else needs it set that way. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-01-14 09:05:25 -05:00
Serge E. Hallyn	7ea6600148	generic_permission: MAY_OPEN is not write access generic_permission was refusing CAP_DAC_READ_SEARCH-enabled processes from opening DAC-protected files read-only, because do_filp_open adds MAY_OPEN to the open mask. Ignore MAY_OPEN. After this patch, CAP_DAC_READ_SEARCH is again sufficient to open(fname, O_RDONLY) on a file to which DAC otherwise refuses us read permission. Reported-by: Mike Kazantsev <mk.fraggod@gmail.com> Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Tested-by: Mike Kazantsev <mk.fraggod@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-30 12:35:44 -08:00

1 2 3 4 5 ...

275 Commits