OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Al Viro	6ac087099e	path_openat(): take O_PATH handling out of do_last() do_last() and lookup_open() simpler that way and so does O_PATH itself. As it bloody well should: we find what the pathname resolves to, same way as in stat() et.al. and associate it with FMODE_PATH struct file. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-05-02 19:49:33 -04:00
Al Viro	9902af79c0	parallel lookups: actual switch to rwsem ta-da! The main issue is the lack of down_write_killable(), so the places like readdir.c switched to plain inode_lock(); once killable variants of rwsem primitives appear, that'll be dealt with. lockdep side also might need more work Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-05-02 19:49:28 -04:00
Al Viro	d9171b9345	parallel lookups machinery, part 4 (and last) If we do run into an in-lookup match, we need to wait for it to cease being in-lookup. Fortunately, we do have unused space in in-lookup dentries - d_lru is never looked at until it stops being in-lookup. So we can stash a pointer to wait_queue_head from stack frame of the caller of ->lookup(). Some precautions are needed while waiting, but it's not that hard - we do hold a reference to dentry we are waiting for, so it can't go away. If it's found to be in-lookup the wait_queue_head is still alive and will remain so at least while ->d_lock is held. Moreover, the condition we are waiting for becomes true at the same point where everything on that wq gets woken up, so we can just add ourselves to the queue once. d_alloc_parallel() gets a pointer to wait_queue_head_t from its caller; lookup_slow() adjusted, d_add_ci() taught to use d_alloc_parallel() if the dentry passed to it happens to be in-lookup one (i.e. if it's been called from the parallel lookup). That's pretty much it - all that remains is to switch ->i_mutex to rwsem and have lookup_slow() take it shared. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-05-02 19:49:27 -04:00
Al Viro	94bdd655ca	parallel lookups machinery, part 3 We will need to be able to check if there is an in-lookup dentry with matching parent/name. Right now it's impossible, but as soon as start locking directories shared such beasts will appear. Add a secondary hash for locating those. Hash chains go through the same space where d_alias will be once it's not in-lookup anymore. Search is done under the same bitlock we use for modifications - with the primary hash we can rely on d_rehash() into the wrong chain being the worst that could happen, but here the pointers are buggered once it's removed from the chain. On the other hand, the chains are not going to be long and normally we'll end up adding to the chain anyway. That allows us to avoid bothering with ->d_lock when doing the comparisons - everything is stable until removed from chain. New helper: d_alloc_parallel(). Right now it allocates, verifies that no hashed and in-lookup matches exist and adds to in-lookup hash. Returns ERR_PTR() for error, hashed match (in the unlikely case it's been found) or new dentry. In-lookup matches trigger BUG() for now; that will change in the next commit when we introduce waiting for ongoing lookup to finish. Note that in-lookup matches won't be possible until we actually go for shared locking. lookup_slow() switched to use of d_alloc_parallel(). Again, these commits are separated only for making it easier to review. All this machinery will start doing something useful only when we go for shared locking; it's just that the combination is too large for my taste. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-05-02 19:49:27 -04:00
Al Viro	85c7f81041	beginning of transition to parallel lookups - marking in-lookup dentries marked as such when (would be) parallel lookup is about to pass them to actual ->lookup(); unmarked when * __d_add() is about to make it hashed, positive or not. * __d_move() (from d_splice_alias(), directly or via __d_unalias()) puts a preexisting dentry in its place * in caller of ->lookup() if it has escaped all of the above. Bug (WARN_ON, actually) if it reaches the final dput() or d_instantiate() while still marked such. As the result, we are guaranteed that for as long as the flag is set, dentry will * remain negative unhashed with positive refcount * never have its ->d_alias looked at * never have its ->d_lru looked at * never have its ->d_parent and ->d_name changed Right now we have at most one such for any given parent directory. With parallel lookups that restriction will weaken to * only exist when parent is locked shared * at most one with given (parent,name) pair (comparison of names is according to ->d_compare()) * only exist when there's no hashed dentry with the same (parent,name) Transition will take the next several commits; unfortunately, we'll only be able to switch to rwsem at the end of this series. The reason for not making it a single patch is to simplify review. New primitives: d_in_lookup() (a predicate checking if dentry is in the in-lookup state) and d_lookup_done() (tells the system that we are done with lookup and if it's still marked as in-lookup, it should cease to be such). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-05-02 19:47:51 -04:00
Al Viro	1936386ea9	lookup_slow(): bugger off on IS_DEADDIR() from the very beginning Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-05-02 19:47:26 -04:00
Al Viro	84695ffee7	Merge getxattr prototype change into work.lookups The rest of work.xattr stuff isn't needed for this branch	2016-05-02 19:45:47 -04:00
Mimi Zohar	05d1a717ec	ima: add support for creating files using the mknodat syscall Commit `3034a14` "ima: pass 'opened' flag to identify newly created files" stopped identifying empty files as new files. However new empty files can be created using the mknodat syscall. On systems with IMA-appraisal enabled, these empty files are not labeled with security.ima extended attributes properly, preventing them from subsequently being opened in order to write the file data contents. This patch defines a new hook named ima_post_path_mknod() to mark these empty files, created using mknodat, as new in order to allow the file data contents to be written. In addition, files with security.ima xattrs containing a file signature are considered "immutable" and can not be modified. The file contents need to be written, before signing the file. This patch relaxes this requirement for new files, allowing the file signature to be written before the file contents. Changelog: - defer identifying files with signatures stored as security.ima (based on Dmitry Rozhkov's comments) - removing tests (eg. dentry, dentry->d_inode, inode->i_size == 0) (based on Al's review) Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Cc: Al Viro <<viro@zeniv.linux.org.uk> Tested-by: Dmitry Rozhkov <dmitry.rozhkov@linux.intel.com>	2016-05-01 09:23:52 -04:00
Al Viro	10c64cea04	atomic_open(): fix the handling of create_error * if we have a hashed negative dentry and either CREAT\|EXCL on r/o filesystem, or CREAT\|TRUNC on r/o filesystem, or CREAT\|EXCL with failing may_o_create(), we should fail with EROFS or the error may_o_create() has returned, but not ENOENT. Which is what the current code ends up returning. * if we have CREAT\|TRUNC hitting a regular file on a read-only filesystem, we can't fail with EROFS here. At the very least, not until we'd done follow_managed() - we might have a writable file (or a device, for that matter) bound on top of that one. Moreover, the code downstream will see that O_TRUNC and attempt to grab the write access (after following possible mount), so if we really should fail with EROFS, it will happen. No need to do that inside atomic_open(). The real logics is much simpler than what the current code is trying to do - if we decided to go for simple lookup, ended up with a negative dentry and had create_error set, fail with create_error. No matter whether we'd got that negative dentry from lookup_real() or had found it in dcache. Cc: stable@vger.kernel.org # v3.6+ Acked-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-04-30 16:40:52 -04:00
Al Viro	fc64005c93	don't bother with ->d_inode->i_sb - it's always equal to ->d_sb ... and neither can ever be NULL Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-04-10 17:11:51 -04:00
Christoph Hellwig	bfe8804d90	xfs: use ->readlink to implement the readlink_by_handle ioctl Also drop the now unused readlink_copy export. [dchinner: use d_inode(dentry) rather than dentry->d_inode] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2016-04-06 07:50:54 +10:00
Andreas Gruenbacher	b8a7a3a667	posix_acl: Inode acl caching fixes When get_acl() is called for an inode whose ACL is not cached yet, the get_acl inode operation is called to fetch the ACL from the filesystem. The inode operation is responsible for updating the cached acl with set_cached_acl(). This is done without locking at the VFS level, so another task can call set_cached_acl() or forget_cached_acl() before the get_acl inode operation gets to calling set_cached_acl(), and then get_acl's call to set_cached_acl() results in caching an outdate ACL. Prevent this from happening by setting the cached ACL pointer to a task-specific sentinel value before calling the get_acl inode operation. Move the responsibility for updating the cached ACL from the get_acl inode operations to get_acl(). There, only set the cached ACL if the sentinel value hasn't changed. The sentinel values are chosen to have odd values. Likewise, the value of ACL_NOT_CACHED is odd. In contrast, ACL object pointers always have an even value (ACLs are aligned in memory). This allows to distinguish uncached ACLs values from ACL objects. In addition, switch from guarding inode->i_acl and inode->i_default_acl upates by the inode->i_lock spinlock to using xchg() and cmpxchg(). Filesystems that do not want ACLs returned from their get_acl inode operations to be cached must call forget_cached_acl() to prevent the VFS from doing so. (Patch written by Al Viro and Andreas Gruenbacher.) Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-31 00:30:15 -04:00
Al Viro	7500c38ac3	fix the braino in "namei: massage lookup_slow() to be usable by lookup_one_len_unlocked()" We should try to trigger automount before bailing out on negative dentry. Reported-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Reported-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Reported-by: Arend van Spriel <arend@broadcom.com> Tested-by: Arend van Spriel <arend@broadcom.com> Tested-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-31 00:23:05 -04:00
Al Viro	d360775217	constify security_path_{mkdir,mknod,symlink} ... as well as unix_mknod() and may_o_create() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-28 00:47:27 -04:00
Al Viro	9d95afd597	kill dentry_unhash() the last user is gone Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-14 00:16:33 -04:00
Al Viro	949a852e46	namei: teach lookup_slow() to skip revalidate ... and make mountpoint_last() use it. That makes all candidates for lookup with parent locked shared go through lookup_slow(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-14 00:15:46 -04:00
Al Viro	e3c1392808	namei: massage lookup_slow() to be usable by lookup_one_len_unlocked() Return dentry and don't pass nameidata or path; lift crossing mountpoints into the caller. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-14 00:15:40 -04:00
Al Viro	d6d95ded91	lookup_one_len_unlocked(): use lookup_dcache() No need to lock parent just because of ->d_revalidate() on child; contrary to the stale comment, lookup_dcache() can be used without locking the parent. Result can be moved as soon as we return, of course, but the same is true for lookup_one_len_unlocked() itself. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-14 00:15:36 -04:00
Al Viro	74ff0ffc7f	namei: simplify invalidation logics in lookup_dcache() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-14 00:15:31 -04:00
Al Viro	e9742b5332	namei: change calling conventions for lookup_{fast,slow} and follow_managed() Have lookup_fast() return 1 on success and 0 on "need to fall back"; lookup_slow() and follow_managed() return positive (1) on success. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-14 00:14:35 -04:00
Al Viro	5d0f49c136	namei: untanlge lookup_fast() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-14 00:14:25 -04:00
Al Viro	6c51e513a3	lookup_dcache(): lift d_alloc() into callers ... and kill need_lookup thing Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-05 20:09:32 -05:00
Al Viro	6583fe22d1	do_last(): reorder and simplify a bit bugger off on negatives a bit earlier, simplify the tests Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-05 18:14:03 -05:00
Al Viro	5129fa482b	do_last(): ELOOP failure exit should be done after leaving RCU mode ... or we risk seeing a bogus value of d_is_symlink() there. Cc: stable@vger.kernel.org # v4.2+ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-02-27 19:37:37 -05:00
Al Viro	a7f775428b	should_follow_link(): validate ->d_seq after having decided to follow ... otherwise d_is_symlink() above might have nothing to do with the inode value we've got. Cc: stable@vger.kernel.org # v4.2+ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-02-27 19:31:01 -05:00
Al Viro	d4565649b6	namei: ->d_inode of a pinned dentry is stable only for positives both do_last() and walk_component() risk picking a NULL inode out of dentry about to become positive, then checking its flags and seeing that it's not negative anymore and using (already stale by then) value they'd fetched earlier. Usually ends up oopsing soon after that... Cc: stable@vger.kernel.org # v3.13+ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-02-27 19:23:16 -05:00
Al Viro	c80567c82a	do_last(): don't let a bogus return value from ->open() et.al. to confuse us ... into returning a positive to path_openat(), which would interpret that as "symlink had been encountered" and proceed to corrupt memory, etc. It can only happen due to a bug in some ->open() instance or in some LSM hook, etc., so we report any such event and make sure it doesn't trick us into further unpleasantness. Cc: stable@vger.kernel.org # v3.6+, at least Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-02-27 19:17:33 -05:00
Al Viro	5955102c99	wrappers for ->i_mutex access parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested}, inode_foo(inode) being mutex_foo(&inode->i_mutex). Please, use those for access to ->i_mutex; over the coming cycle ->i_mutex will become rwsem, with ->lookup() done with it held only shared. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-01-22 18:04:28 -05:00
Linus Torvalds	33caf82acf	Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "All kinds of stuff. That probably should've been 5 or 6 separate branches, but by the time I'd realized how large and mixed that bag had become it had been too close to -final to play with rebasing. Some fs/namei.c cleanups there, memdup_user_nul() introduction and switching open-coded instances, burying long-dead code, whack-a-mole of various kinds, several new helpers for ->llseek(), assorted cleanups and fixes from various people, etc. One piece probably deserves special mention - Neil's lookup_one_len_unlocked(). Similar to lookup_one_len(), but gets called without ->i_mutex and tries to avoid ever taking it. That, of course, means that it's not useful for any directory modifications, but things like getting inode attributes in nfds readdirplus are fine with that. I really should've asked for moratorium on lookup-related changes this cycle, but since I hadn't done that early enough... I am asking for that for the coming cycle, though - I'm going to try and get conversion of i_mutex to rwsem with ->lookup() done under lock taken shared. There will be a patch closer to the end of the window, along the lines of the one Linus had posted last May - mechanical conversion of ->i_mutex accesses to inode_lock()/inode_unlock()/inode_trylock()/ inode_is_locked()/inode_lock_nested(). To quote Linus back then: ----- \| This is an automated patch using \| \| sed 's/mutex_lock(&\(.\)->i_mutex)/inode_lock(\1)/' \| sed 's/mutex_unlock(&\(.\)->i_mutex)/inode_unlock(\1)/' \| sed 's/mutex_lock_nested(&\(.\)->i_mutex,[ ]I_MUTEX_\([A-Z0-9_]\))/inode_lock_nested(\1, I_MUTEX_\2)/' \| sed 's/mutex_is_locked(&\(.\)->i_mutex)/inode_is_locked(\1)/' \| sed 's/mutex_trylock(&\(.\)->i_mutex)/inode_trylock(\1)/' \| \| with a very few manual fixups ----- I'm going to send that once the ->i_mutex-affecting stuff in -next gets mostly merged (or when Linus says he's about to stop taking merges)" 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits) nfsd: don't hold i_mutex over userspace upcalls fs:affs:Replace time_t with time64_t fs/9p: use fscache mutex rather than spinlock proc: add a reschedule point in proc_readfd_common() logfs: constify logfs_block_ops structures fcntl: allow to set O_DIRECT flag on pipe fs: __generic_file_splice_read retry lookup on AOP_TRUNCATED_PAGE fs: xattr: Use kvfree() [s390] page_to_phys() always returns a multiple of PAGE_SIZE nbd: use ->compat_ioctl() fs: use block_device name vsprintf helper lib/vsprintf: add %*pg format specifier fs: use gendisk->disk_name where possible poll: plug an unused argument to do_poll amdkfd: don't open-code memdup_user() cdrom: don't open-code memdup_user() rsxx: don't open-code memdup_user() mtip32xx: don't open-code memdup_user() [um] mconsole: don't open-code memdup_user_nul() [um] hostaudio: don't open-code memdup_user() ...	2016-01-12 17:11:47 -08:00
NeilBrown	bbddca8e8f	nfsd: don't hold i_mutex over userspace upcalls We need information about exports when crossing mountpoints during lookup or NFSv4 readdir. If we don't already have that information cached, we may have to ask (and wait for) rpc.mountd. In both cases we currently hold the i_mutex on the parent of the directory we're asking rpc.mountd about. We've seen situations where rpc.mountd performs some operation on that directory that tries to take the i_mutex again, resulting in deadlock. With some care, we may be able to avoid that in rpc.mountd. But it seems better just to avoid holding a mutex while waiting on userspace. It appears that lookup_one_len is pretty much the only operation that needs the i_mutex. So we could just drop the i_mutex elsewhere and do something like mutex_lock() lookup_one_len() mutex_unlock() In many cases though the lookup would have been cached and not required the i_mutex, so it's more efficient to create a lookup_one_len() variant that only takes the i_mutex when necessary. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-01-09 03:07:52 -05:00
Al Viro	62fb4a155f	don't carry MAY_OPEN in op->acc_mode Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-01-04 10:28:40 -05:00
Al Viro	fceef393a5	switch ->get_link() to delayed_call, kill ->put_link() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-12-30 13:01:03 -05:00
Al Viro	d3883d4f93	teach page_get_link() to work in RCU mode more or less along the lines of Neil's patchset, sans the insanity around kmap(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-12-08 22:41:54 -05:00
Al Viro	6b2553918d	replace ->follow_link() with new method that could stay in RCU mode new method: ->get_link(); replacement of ->follow_link(). The differences are: * inode and dentry are passed separately * might be called both in RCU and non-RCU mode; the former is indicated by passing it a NULL dentry. * when called that way it isn't allowed to block and should return ERR_PTR(-ECHILD) if it needs to be called in non-RCU mode. It's a flagday change - the old method is gone, all in-tree instances converted. Conversion isn't hard; said that, so far very few instances do not immediately bail out when called in RCU mode. That'll change in the next commits. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-12-08 22:41:54 -05:00
Al Viro	21fc61c73c	don't put symlink bodies in pagecache into highmem kmap() in page_follow_link_light() needed to go - allowing to hold an arbitrary number of kmaps for long is a great way to deadlocking the system. new helper (inode_nohighmem(inode)) needs to be used for pagecache symlinks inodes; done for all in-tree cases. page_follow_link_light() instrumented to yell about anything missed. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-12-08 22:41:36 -05:00
Al Viro	e1a63bbc40	restore_nameidata(): no need to clear now->stack microoptimization: in all callers *now is in the frame we are about to leave. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-12-06 21:18:27 -05:00
Al Viro	248fb5b955	namei.c: take "jump to root" into a new helper ... and use it both in path_init() (for absolute pathnames) and get_link() (for absolute symlinks). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-12-06 21:18:21 -05:00
Al Viro	ef55d91700	path_init(): set nd->inode earlier in cwd-relative case that allows to kill the recheck of nd->seq on the way out in this case, and this check on the way out is left only for absolute pathnames. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-12-06 21:18:16 -05:00
Al Viro	9e6697e26f	namei.c: fold set_root_rcu() into set_root() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-12-06 21:18:10 -05:00
Mike Marshall	57e3715cfa	typo in fs/namei.c comment Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-12-06 21:17:18 -05:00
Al Viro	aa80deab33	namei: page_getlink() and page_follow_link_light() are the same thing Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-12-06 20:43:27 -05:00
Al Viro	2788cc47f4	Don't reset ->total_link_count on nested calls of vfs_path_lookup() we already zero it on outermost set_nameidata(), so initialization in path_init() is pointless and wrong. The same DoS exists on pre-4.2 kernels, but there a slightly different fix will be needed. Cc: stable@vger.kernel.org # v4.2 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-12-06 12:33:02 -05:00
Linus Torvalds	ad804a0b2a	Merge branch 'akpm' (patches from Andrew) Merge second patch-bomb from Andrew Morton: - most of the rest of MM - procfs - lib/ updates - printk updates - bitops infrastructure tweaks - checkpatch updates - nilfs2 update - signals - various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc, dma-debug, dma-mapping, ... * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits) ipc,msg: drop dst nil validation in copy_msg include/linux/zutil.h: fix usage example of zlib_adler32() panic: release stale console lock to always get the logbuf printed out dma-debug: check nents in dma_sync_sg* dma-mapping: tidy up dma_parms default handling pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode kexec: use file name as the output message prefix fs, seqfile: always allow oom killer seq_file: reuse string_escape_str() fs/seq_file: use seq_* helpers in seq_hex_dump() coredump: change zap_threads() and zap_process() to use for_each_thread() coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT) signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread() signal: turn dequeue_signal_lock() into kernel_dequeue_signal() signals: kill block_all_signals() and unblock_all_signals() nilfs2: fix gcc uninitialized-variable warnings in powerpc build nilfs2: fix gcc unused-but-set-variable warnings MAINTAINERS: nilfs2: add header file for tracing nilfs2: add tracepoints for analyzing reading and writing metadata files ...	2015-11-07 14:32:45 -08:00
Linus Torvalds	75021d2859	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial updates from Jiri Kosina: "Trivial stuff from trivial tree that can be trivially summed up as: - treewide drop of spurious unlikely() before IS_ERR() from Viresh Kumar - cosmetic fixes (that don't really affect basic functionality of the driver) for pktcdvd and bcache, from Julia Lawall and Petr Mladek - various comment / printk fixes and updates all over the place" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: bcache: Really show state of work pending bit hwmon: applesmc: fix comment typos Kconfig: remove comment about scsi_wait_scan module class_find_device: fix reference to argument "match" debugfs: document that debugfs_remove*() accepts NULL and error values net: Drop unlikely before IS_ERR(_OR_NULL) mm: Drop unlikely before IS_ERR(_OR_NULL) fs: Drop unlikely before IS_ERR(_OR_NULL) drivers: net: Drop unlikely before IS_ERR(_OR_NULL) drivers: misc: Drop unlikely before IS_ERR(_OR_NULL) UBI: Update comments to reflect UBI_METAONLY flag pktcdvd: drop null test before destroy functions	2015-11-07 13:05:44 -08:00
Michal Hocko	c62d25556b	mm, fs: introduce mapping_gfp_constraint() There are many places which use mapping_gfp_mask to restrict a more generic gfp mask which would be used for allocations which are not directly related to the page cache but they are performed in the same context. Let's introduce a helper function which makes the restriction explicit and easier to track. This patch doesn't introduce any functional changes. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Michal Hocko <mhocko@suse.com> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-06 17:50:42 -08:00
Linus Torvalds	6de29ccb50	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull userns hardlink capability check fix from Eric Biederman: "This round just contains a single patch. There has been a lot of other work this period but it is not quite ready yet, so I am pushing it until 4.5. The remaining change by Dirk Steinmetz wich fixes both Gentoo and Ubuntu containers allows hardlinks if we have the appropriate capabilities in the user namespace. Security wise it is really a gimme as the user namespace root can already call setuid become that user and create the hardlink" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: namei: permit linking with CAP_FOWNER in userns	2015-11-05 15:20:56 -08:00
Dirk Steinmetz	f2ca379642	namei: permit linking with CAP_FOWNER in userns Attempting to hardlink to an unsafe file (e.g. a setuid binary) from within an unprivileged user namespace fails, even if CAP_FOWNER is held within the namespace. This may cause various failures, such as a gentoo installation within a lxc container failing to build and install specific packages. This change permits hardlinking of files owned by mapped uids, if CAP_FOWNER is held for that namespace. Furthermore, it improves consistency by using the existing inode_owner_or_capable(), which is aware of namespaced capabilities as of `23adbe12ef` ("fs,userns: Change inode_capable to capable_wrt_inode_uidgid"). Signed-off-by: Dirk Steinmetz <public@rsjtdrjgfuzkfg.com> This is hitting us in Ubuntu during some dpkg upgrades in containers. When upgrading a file dpkg creates a hard link to the old file to back it up before overwriting it. When packages upgrade suid files owned by a non-root user the link isn't permitted, and the package upgrade fails. This patch fixes our problem. Tested-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2015-10-27 16:12:35 -05:00
Trond Myklebust	daf3761c9f	namei: results of d_is_negative() should be checked after dentry revalidation Leandro Awa writes: "After switching to version 4.1.6, our parallelized and distributed workflows now fail consistently with errors of the form: T34: ./regex.c:39:22: error: config.h: No such file or directory From our 'git bisect' testing, the following commit appears to be the possible cause of the behavior we've been seeing: commit 766c4cbfacd8" Al Viro says: "What happens is that `766c4cbfac` got the things subtly wrong. We used to treat d_is_negative() after lookup_fast() as "fall with ENOENT". That was wrong - checking ->d_flags outside of ->d_seq protection is unreliable and failing with hard error on what should've fallen back to non-RCU pathname resolution is a bug. Unfortunately, we'd pulled the test too far up and ran afoul of another kind of staleness. The dentry might have been absolutely stable from the RCU point of view (and we might be on UP, etc), but stale from the remote fs point of view. If ->d_revalidate() returns "it's actually stale", dentry gets thrown away and the original code wouldn't even have looked at its ->d_flags. What we need is to check ->d_flags where `766c4cbfac` does (prior to ->d_seq validation) but only use the result in cases where we do not discard this dentry outright" Reported-by: Leandro Awa <lawa@nvidia.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=104911 Fixes: `766c4cbfac` ("namei: d_is_negative() should be checked...") Tested-by: Leandro Awa <lawa@nvidia.com> Cc: stable@vger.kernel.org # v4.1+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-10-10 10:17:27 -07:00
Viresh Kumar	a1c83681d5	fs: Drop unlikely before IS_ERR(_OR_NULL) IS_ERR(_OR_NULL) already contain an 'unlikely' compiler flag and there is no need to do that again from its callers. Drop it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Jeff Layton <jlayton@poochiereds.net> Reviewed-by: David Howells <dhowells@redhat.com> Reviewed-by: Steve French <smfrench@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2015-09-29 15:13:58 +02:00
Masanari Iida	2a78b857d3	namei: fix warning while make xmldocs caused by namei.c Fix the following warnings: Warning(.//fs/namei.c:2422): No description found for parameter 'nd' Warning(.//fs/namei.c:2422): Excess function parameter 'nameidata' description in 'path_mountpoint' Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-10 13:29:01 -07:00
Eric W. Biederman	397d425dc2	vfs: Test for and handle paths that are unreachable from their mnt_root In rare cases a directory can be renamed out from under a bind mount. In those cases without special handling it becomes possible to walk up the directory tree to the root dentry of the filesystem and down from the root dentry to every other file or directory on the filesystem. Like division by zero .. from an unconnected path can not be given a useful semantic as there is no predicting at which path component the code will realize it is unconnected. We certainly can not match the current behavior as the current behavior is a security hole. Therefore when encounting .. when following an unconnected path return -ENOENT. - Add a function path_connected to verify path->dentry is reachable from path->mnt.mnt_root. AKA to validate that rename did not do something nasty to the bind mount. To avoid races path_connected must be called after following a path component to it's next path component. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-08-21 03:20:10 -04:00
Al Viro	aa65fa35ba	may_follow_link() should use nd->inode Now that we can get there in RCU mode, we shouldn't play with nd->path.dentry->d_inode - it's not guaranteed to be stable. Use nd->inode instead. Reported-by: Hugh Dickins <hughd@google.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-08-04 23:23:50 -04:00
Al Viro	97242f99a0	link_path_walk(): be careful when failing with ENOTDIR In RCU mode we might end up with dentry evicted just we check that it's a directory. In such case we should return ECHILD rather than ENOTDIR, so that pathwalk would be retries in non-RCU mode. Breakage had been introduced in commit `b18825a` - prior to that we were looking at nd->inode, which had been fetched before verifying that ->d_seq was still valid. That form of check would only be satisfied if at some point the pathname prefix would indeed have resolved to a non-directory. The fix consists of checking ->d_seq after we'd run into a non-directory dentry, and failing with ECHILD in case of mismatch. Note that all branches since 3.12 have that problem... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-08-01 20:18:38 -04:00
Al Viro	06d7137e5c	namei: make set_root_rcu() return void The only caller that cares about its return value can just as easily pick it from nd->root_seq itself. We used to just calculate it and return to caller, but these days we are storing it in nd->root_seq in all cases. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-06-29 12:07:04 -04:00
Al Viro	b853a16176	turn user_{path_at,path,lpath,path_dir}() into static inlines Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-15 01:10:45 -04:00
Al Viro	9883d1855e	namei: move saved_nd pointer into struct nameidata these guys are always declared next to each other; might as well put the former (pointer to previous instance) into the latter and simplify the calling conventions for {set,restore}_nameidata() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-15 01:10:45 -04:00
Al Viro	520ae68747	inline user_path_create() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-15 01:10:44 -04:00
Al Viro	a2ec4a2d5c	inline user_path_parent() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-15 01:10:44 -04:00
Al Viro	76ae2a5ab1	namei: trim do_last() arguments now that struct filename is stashed in nameidata we have no need to pass it in Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-15 01:10:43 -04:00
Al Viro	c8a53ee5ee	namei: stash dfd and name into nameidata fewer arguments to pass around... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-15 01:10:43 -04:00
Al Viro	102b8af266	namei: fold path_cleanup() into terminate_walk() they are always called next to each other; moreover, terminate_walk() is more symmetrical that way. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-15 01:10:42 -04:00
Al Viro	5c31b6cedb	namei: saner calling conventions for filename_parentat() a) make it reject ERR_PTR() for name b) make it putname(name) on all other failure exits c) make it return name on success again, simplifies the callers Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-15 01:10:42 -04:00
Al Viro	181c37b6e4	namei: saner calling conventions for filename_create() a) make it reject ERR_PTR() for name b) make it putname(name) upon return in all other cases. seriously simplifies the callers... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-15 01:10:42 -04:00
Al Viro	391172c46e	namei: shift nameidata down into filename_parentat() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-15 01:10:41 -04:00
Al Viro	abc9f5beb1	namei: make filename_lookup() reject ERR_PTR() passed as name makes for much easier life in callers Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-15 01:10:41 -04:00
Al Viro	9ad1aaa615	namei: shift nameidata inside filename_lookup() pass root instead; non-NULL => copy to nd.root and set LOOKUP_ROOT in flags Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-15 01:10:40 -04:00
Al Viro	e4bd1c1a95	namei: move putname() call into filename_lookup() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-15 01:10:40 -04:00
Al Viro	625b6d1054	namei: pass the struct path to store the result down into path_lookupat() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-15 01:10:39 -04:00
Al Viro	18d8c86011	namei: uninline set_root{,_rcu}() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-15 01:10:39 -04:00
Al Viro	aed434ada6	namei: be careful with mountpoint crossings in follow_dotdot_rcu() Otherwise we are risking a hard error where nonlazy restart would be the right thing to do; it's a very narrow race with mount --move and most of the time it ends up being completely harmless, but it's possible to construct a case when we'll get a bogus hard error instead of falling back to non-lazy walk... For one thing, when crossing _into_ overmount of parent we need to check for mount_lock bumps when we get NULL from __lookup_mnt() as well. For another, and less exotically, we need to make sure that the data fetched in follow_up_rcu() had been consistent. ->mnt_mountpoint is pinned for as long as it is a mountpoint, but we need to check mount_lock after fetching to verify that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-15 01:10:38 -04:00
Al Viro	5a8d87e8ed	namei: unlazy_walk() doesn't need to mess with current->fs anymore now that we have ->root_seq, legitimize_path(&nd->root, nd->root_seq) will do just fine... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-15 01:10:36 -04:00
Al Viro	8f47a0167c	namei: handle absolute symlinks without dropping out of RCU mode Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-15 01:10:22 -04:00
Al Viro	8c1b456689	enable passing fast relative symlinks without dropping out of RCU mode Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-15 01:06:28 -04:00
NeilBrown	8fa9dd2466	VFS/namei: make the use of touch_atime() in get_link() RCU-safe. touch_atime is not RCU-safe, and so cannot be called on an RCU walk. However, in situations where RCU-walk makes a difference, the symlink will likely to accessed much more often than it is useful to update the atime. So split out the test of "Does the atime actually need to be updated" into atime_needs_update(), and have get_link() unlazy if it finds that it will need to do that update. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-15 01:06:27 -04:00
Al Viro	bc40aee053	namei: don't unlazy until get_link() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-15 01:06:27 -04:00
Al Viro	7973387a2f	namei: make unlazy_walk and terminate_walk handle nd->stack, add unlazy_link We are almost done - primitives for leaving RCU mode are aware of nd->stack now, a new primitive for going to non-RCU mode when we have a symlink on hands added. The thing we are heavily relying upon is that any unlazy failure will be shortly followed by terminate_walk(), with no access to nameidata in between. So it's enough to leave the things in a state terminate_walk() would cope with. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-15 01:06:01 -04:00
Al Viro	0450b2d120	namei: store seq numbers in nd->stack[] we'll need them for unlazy_walk() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-11 08:13:14 -04:00
Al Viro	31956502dd	namei: make may_follow_link() safe in RCU mode We can't call that audit garbage in RCU mode - it's doing a weird mix of allocations (GFP_NOFS, immediately followed by GFP_KERNEL) and I'm not touching that... thing again. So if this security sclero^Whardening feature gets triggered when we are in RCU mode, tough - we'll fail with -ECHILD and have everything restarted in non-RCU mode. Only to hit the same test and fail, this time with EACCES and with (oh, rapture) an audit spew produced. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-11 08:13:13 -04:00
Al Viro	6548fae2ec	namei: make put_link() RCU-safe very simple - just make path_put() conditional on !RCU. Note that right now it doesn't get called in RCU mode - we leave it before getting anything into stack. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-11 08:13:13 -04:00
Al Viro	5f2c4179e1	switch ->put_link() from dentry to inode only one instance looks at that argument at all; that sole exception wants inode rather than dentry. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-11 08:13:12 -04:00
NeilBrown	bda0be7ad9	security: make inode_follow_link RCU-walk aware inode_follow_link now takes an inode and rcu flag as well as the dentry. inode is used in preference to d_backing_inode(dentry), particularly in RCU-walk mode. selinux_inode_follow_link() gets dentry_has_perm() and inode_has_perm() open-coded into it so that it can call avc_has_perm_flags() in way that is safe if LOOKUP_RCU is set. Calling avc_has_perm_flags() with rcu_read_lock() held means that when avc_has_perm_noaudit calls avc_compute_av(), the attempt to rcu_read_unlock() before calling security_compute_av() will not actually drop the RCU read-lock. However as security_compute_av() is completely in a read_lock()ed region, it should be safe with the RCU read-lock held. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-11 08:13:11 -04:00
Al Viro	181548c051	namei: pick_link() callers already have inode no need to refetch (and once we move unlazy out of there, recheck ->d_seq). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-11 08:13:10 -04:00
David Howells	63afdfc781	VFS: Handle lower layer dentry/inode in pathwalk Make use of d_backing_inode() in pathwalk to gain access to an inode or dentry that's on a lower layer. Signed-off-by: David Howells <dhowells@redhat.com>	2015-05-11 08:13:10 -04:00
Al Viro	237d8b327a	namei: store inode in nd->stack[] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-11 08:13:09 -04:00
Al Viro	254cf58212	namei: don't mangle nd->seq in lookup_fast() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-11 08:13:09 -04:00
Al Viro	6e9918b7b3	namei: explicitly pass seq number to unlazy_walk() when dentry != NULL Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-11 08:13:09 -04:00
Al Viro	3595e2346c	link_path_walk: use explicit returns for failure exits Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-11 08:13:08 -04:00
Al Viro	deb106c632	namei: lift terminate_walk() all the way up Lift it from link_path_walk(), trailing_symlink(), lookup_last(), mountpoint_last(), complete_walk() and do_last(). A _lot_ of those suckers merge. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-11 08:13:08 -04:00
Al Viro	3bdba28b72	namei: lift link_path_walk() call out of trailing_symlink() Make trailing_symlink() return the pathname to traverse or ERR_PTR(-E...). A subtle point is that for "magic" symlinks it returns "" now - that leads to link_path_walk("", nd), which is immediately returning 0 and we are back to the treatment of the last component, at whereever the damn thing has left us. Reduces the stack footprint - link_path_walk() called on more shallow stack now. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-11 08:12:57 -04:00
Al Viro	368ee9ba56	namei: path_init() calling conventions change * lift link_path_walk() into callers; moving it down into path_init() had been a mistake. Stack footprint, among other things... * do _not_ call path_cleanup() after path_init() failure; on all failure exits out of it we have nothing for path_cleanup() to do * have path_init() return pathname or ERR_PTR(-E...) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-11 08:10:41 -04:00
Al Viro	34a26b99b7	namei: get rid of nameidata->base we can do fdput() under rcu_read_lock() just fine; all we need to take care of is fetching nd->inode value first. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-11 08:05:05 -04:00
Al Viro	8bcb77fabd	namei: split off filename_lookupat() with LOOKUP_PARENT new functions: filename_parentat() and path_parentat() resp. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-10 22:20:20 -04:00
Al Viro	b5cd339762	namei: may_follow_link() - lift terminate_walk() on failures into caller Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-10 22:20:20 -04:00
Al Viro	ab10492345	namei: take increment of nd->depth into pick_link() Makes the situation much more regular - we avoid a strange state when the element just after the top of stack is used to store struct path of symlink, but isn't counted in nd->depth. This is much more regular, so the normal failure exits, etc., work fine. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-10 22:20:19 -04:00
Al Viro	1cf2665b5b	namei: kill nd->link Just store it in nd->stack[nd->depth].link right in pick_link(). Now that we make sure of stack expansion in pick_link(), we can do so... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-10 22:20:19 -04:00
Al Viro	fec2fa24e8	may_follow_link(): trim arguments Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-10 22:20:18 -04:00
Al Viro	cd179f4468	namei: move bumping the refcount of link->mnt into pick_link() update the failure cleanup in may_follow_link() to match that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-10 22:20:18 -04:00
Al Viro	e8bb73dfb0	namei: fold put_link() into the failure case of complete_walk() ... and don't open-code unlazy_walk() in there - the only reason for that is to avoid verfication of cached nd->root, which is trivially avoided by discarding said cached nd->root first. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-10 22:20:17 -04:00
Al Viro	fab51e8ab2	namei: take the treatment of absolute symlinks to get_link() rather than letting the callers handle the jump-to-root part of semantics, do it right in get_link() and return the rest of the body for the caller to deal with - at that point it's treated the same way as relative symlinks would be. And return NULL when there's no "rest of the body" - those are treated the same as pure jump symlink would be. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-10 22:20:17 -04:00
Al Viro	4f697a5e17	namei: simpler treatment of symlinks with nothing other that / in the body Instead of saving name and branching to OK:, where we'll immediately restore it, and call walk_component() with WALK_PUT\|WALK_GET and nd->last_type being LAST_BIND, which is equivalent to put_link(nd), err = 0, we can just treat that the same way we'd treat procfs-style "jump" symlinks - do put_link(nd) and move on. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-10 22:20:16 -04:00

1 2 3 4 5 ...

921 Commits