From 2fa6b1e01a9b1a54769c394f06cd72c3d12a2d48 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Tue, 12 Nov 2019 16:13:06 -0500 Subject: [PATCH] fs/namei.c: fix missing barriers when checking positivity Pinned negative dentries can, generally, be made positive by another thread. Conditions that prevent that are * ->d_lock on dentry in question * parent directory held at least shared * nobody else could have observed the address of dentry Most of the places working with those fall into one of those categories; however, d_lookup() and friends need to be used with some care. Fortunately, there's not a lot of call sites, and with few exceptions all of those fall under one of the cases above. Exceptions are all in fs/namei.c - in lookup_fast(), lookup_dcache() and mountpoint_last(). Another one is lookup_slow() - there dcache lookup is done with parent held shared, but the result is used after we'd drop the lock. The same happens in do_last() - the lookup (in lookup_one()) is done with parent locked, but result is used after unlocking. lookup_fast(), do_last() and mountpoint_last() flat-out reject negatives. Most of lookup_dcache() calls are made with parent locked at least shared; the only exception is lookup_one_len_unlocked(). It might return pinned negative, needs serious care from callers. Fortunately, almost nobody calls it directly anymore; all but two callers have converted to lookup_positive_unlocked(), which rejects negatives. lookup_slow() is called by the same lookup_one_len_unlocked() (see above), mountpoint_last() and walk_component(). In those two negatives are rejected. In other words, there is a small set of places where we need to check carefully if a pinned potentially negative dentry is, in fact, positive. After that check we want to be sure that both ->d_inode and type bits in ->d_flags are stable and observed. The set consists of follow_managed() (where the rejection happens for lookup_fast(), walk_component() and do_last()), last_mountpoint() and lookup_positive_unlocked(). Solution: 1) transition from negative to positive (in __d_set_inode_and_type()) stores ->d_inode, then uses smp_store_release() to set ->d_flags type bits. 2) aforementioned 3 places in fs/namei.c fetch ->d_flags with smp_load_acquire() and bugger off if it type bits say "negative". That way anyone downstream of those checks has dentry know positive pinned, with ->d_inode and type bits of ->d_flags stable and observed. I considered splitting off d_lookup_positive(), so that the checks could be done right there, under ->d_lock. However, that leads to massive duplication of rather subtle code in fs/namei.c and fs/dcache.c. It's worse than it might seem, thanks to autofs ->d_manage() getting involved ;-/ No matter what, autofs_d_manage()/autofs_d_automount() must live with the possibility of pinned negative dentry passed their way, becoming positive under them - that's the intended behaviour when lookup comes in the middle of automount in progress, so we can't keep them out of the area that has to deal with those, more's the pity... Reported-by: Ritesh Harjani Signed-off-by: Al Viro --- fs/dcache.c | 2 +- fs/namei.c | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/dcache.c b/fs/dcache.c index b2a7f1765f0b..a6d6b5f95f62 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -319,7 +319,7 @@ static inline void __d_set_inode_and_type(struct dentry *dentry, flags = READ_ONCE(dentry->d_flags); flags &= ~(DCACHE_ENTRY_TYPE | DCACHE_FALLTHRU); flags |= type_flags; - WRITE_ONCE(dentry->d_flags, flags); + smp_store_release(&dentry->d_flags, flags); } static inline void __d_clear_type_and_inode(struct dentry *dentry) diff --git a/fs/namei.c b/fs/namei.c index 6f72fb7ef5ad..117950657e63 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -1220,7 +1220,7 @@ static int follow_managed(struct path *path, struct nameidata *nd) /* Given that we're not holding a lock here, we retain the value in a * local variable for each dentry as we look at it so that we don't see * the components of that value change under us */ - while (flags = READ_ONCE(path->dentry->d_flags), + while (flags = smp_load_acquire(&path->dentry->d_flags), unlikely(flags & DCACHE_MANAGED_DENTRY)) { /* Allow the filesystem to manage the transit without i_mutex * being held. */ @@ -2569,7 +2569,7 @@ struct dentry *lookup_positive_unlocked(const char *name, struct dentry *base, int len) { struct dentry *ret = lookup_one_len_unlocked(name, base, len); - if (!IS_ERR(ret) && d_is_negative(ret)) { + if (!IS_ERR(ret) && d_flags_negative(smp_load_acquire(&ret->d_flags))) { dput(ret); ret = ERR_PTR(-ENOENT); } @@ -2671,7 +2671,7 @@ mountpoint_last(struct nameidata *nd) return PTR_ERR(path.dentry); } } - if (d_is_negative(path.dentry)) { + if (d_flags_negative(smp_load_acquire(&path.dentry->d_flags))) { dput(path.dentry); return -ENOENT; }