linux-sg2042/fs
Al Viro 8767712f26 rmdir(),rename(): do shrink_dcache_parent() only on success
Once upon a time ->rmdir() instances used to check if victim inode
had more than one (in-core) reference and failed with -EBUSY if it
had.  The reason was race avoidance - emptiness check is worthless
if somebody could just go and create new objects in the victim
directory afterwards.

With introduction of dcache the checks had been replaced with
checking the refcount of dentry.  However, since a cached negative
lookup leaves a negative child dentry, such check had led to false
positives - with empty foo/ doing stat foo/bar before rmdir foo
ended up with -EBUSY unless the negative dentry of foo/bar happened
to be evicted by the time of rmdir(2).  That had been fixed by
doing shrink_dcache_parent() just before the refcount check.

At the same time, ext2_rmdir() has grown a private solution that
eliminated those -EBUSY - it did something (setting ->i_size to 0)
which made any subsequent ext2_add_entry() fail.

Unfortunately, even with shrink_dcache_parent() the check had been
racy - after all, the victim itself could be found by dcache lookup
just after we'd checked its refcount.  That got fixed by a new
helper (dentry_unhash()) that did shrink_dcache_parent() and unhashed
the sucker if its refcount ended up equal to 1.  That got called before
->rmdir(), turning the checks in ->rmdir() instances into "if not
unhashed fail with -EBUSY".  Which reduced the boilerplate nicely, but
had an unpleasant side effect - now shrink_dcache_parent() had been
done before the emptiness checks, leading to easily triggerable calls
of shrink_dcache_parent() on arbitrarily large subtrees, quite possibly
nested into each other.

Several years later the ext2-private trick had been generalized -
(in-core) inodes of dead directories are flagged and calls of
lookup, readdir and all directory-modifying methods were prevented
in so marked directories.  Remaining boilerplate in ->rmdir() instances
became redundant and some instances got rid of it.

In 2011 the call of dentry_unhash() got shifted into ->rmdir() instances
and then killed off in all of them.  That has led to another problem,
though - in case of successful rmdir we *want* any (negative) child
dentries dropped and the victim itself made negative.  There's no point
keeping cached negative lookups in foo when we can get the negative
lookup of foo itself cached.  So shrink_dcache_parent() call had been
restored; unfortunately, it went into the place where dentry_unhash()
used to be, i.e. before the ->rmdir() call.  Note that we don't unhash
anymore, so any "is it busy" checks would be racy; fortunately, all of
them are gone.

We should've done that call right *after* successful ->rmdir().  That
reduces contention caused by tree-walking in shrink_dcache_parent()
and, especially, contention caused by evictions in two nested subtrees
going on in parallel.  The same goes for directory-overwriting rename() -
the story there had been parallel to that of rmdir().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-27 16:23:51 -04:00
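
The ordering change described in the commit message above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel code: `struct toy_dir`, `toy_shrink()`, `toy_fs_rmdir()` and both `toy_rmdir_*()` functions are hypothetical stand-ins for a directory dentry, shrink_dcache_parent(), a filesystem's ->rmdir() instance and the VFS rmdir path. The model only counts how many cached child dentries the shrink step would have to walk.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy stand-in for a directory dentry (not struct dentry). */
struct toy_dir {
	int nr_entries;          /* live directory entries; non-empty if > 0 */
	int nr_cached_children;  /* cached child dentries a shrink must walk */
	bool dead;               /* set once the directory has been removed  */
};

/* Total number of cached dentries visited by toy_shrink() so far. */
static long shrink_work;

/* Models shrink_dcache_parent(): cost grows with the cached subtree. */
static void toy_shrink(struct toy_dir *d)
{
	assert(d != NULL);
	shrink_work += d->nr_cached_children;
	d->nr_cached_children = 0;
}

/* Models a ->rmdir() instance: only an empty directory can be removed. */
static int toy_fs_rmdir(struct toy_dir *d)
{
	assert(d != NULL);
	return d->nr_entries > 0 ? -ENOTEMPTY : 0;
}

/* Old ordering: the subtree is shrunk even if ->rmdir() then fails. */
static int toy_rmdir_shrink_first(struct toy_dir *d)
{
	int error;

	toy_shrink(d);
	error = toy_fs_rmdir(d);
	if (error)
		return error;
	d->dead = true;
	return 0;
}

/* New ordering: shrink only after ->rmdir() has succeeded. */
static int toy_rmdir_shrink_on_success(struct toy_dir *d)
{
	int error = toy_fs_rmdir(d);

	if (error)
		return error;   /* failure path leaves the cache untouched */
	toy_shrink(d);
	d->dead = true;
	return 0;
}
```

In this model a failing call to `toy_rmdir_shrink_first()` on a non-empty directory still pays for walking every cached child, while `toy_rmdir_shrink_on_success()` returns the error without touching the cache. The same reasoning would apply to a model of directory-overwriting rename().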
9p fscache development 2018-04-07 09:08:24 -07:00
adfs Rename superblock flags (MS_xyz -> SB_xyz) 2017-11-27 13:05:09 -08:00
affs affs_lookup: switch to d_splice_alias() 2018-05-21 14:29:12 -04:00
afs Merge branch 'afs-dh' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-04-12 11:59:06 -07:00
autofs4 autofs4: use wait_event_killable 2018-04-11 10:28:36 -07:00
befs befs_lookup(): use d_splice_alias() 2018-05-21 14:30:07 -04:00
bfs
btrfs do d_instantiate/unlock_new_inode combinations safely 2018-05-11 15:36:37 -04:00
cachefiles cachefiles: vfs_mkdir() might succeed leaving dentry negative unhashed 2018-05-21 14:30:10 -04:00
ceph The big ticket items are: 2018-04-10 12:25:30 -07:00
cifs cifs: change validate_buf to validate_iov 2018-04-12 20:32:55 -05:00
coda vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
configfs
cramfs cramfs: Fix IS_ENABLED typo 2018-05-21 14:30:08 -04:00
crypto fscrypt: fix build with pre-4.6 gcc versions 2018-02-01 10:51:18 -05:00
debugfs debugfs_lookup(): switch to lookup_one_len_unlocked() 2018-03-29 15:07:47 -04:00
devpts devpts: comment devpts_mntget() 2018-03-14 13:31:23 +01:00
dlm net: make getname() functions return length rather than use int* parameter 2018-02-12 14:15:04 -05:00
ecryptfs do d_instantiate/unlock_new_inode combinations safely 2018-05-11 15:36:37 -04:00
efivarfs efivarfs: Limit the rate for non-root to read files 2018-02-22 10:21:02 -08:00
efs Rename superblock flags (MS_xyz -> SB_xyz) 2017-11-27 13:05:09 -08:00
exofs iversion.h related cleanup for v4.16 2018-02-07 14:25:22 -08:00
exportfs ovl: do not try to reconnect a disconnected origin dentry 2018-04-12 12:04:49 +02:00
ext2 ext2: fix a block leak 2018-05-21 14:30:11 -04:00
ext4 do d_instantiate/unlock_new_inode combinations safely 2018-05-11 15:36:37 -04:00
f2fs do d_instantiate/unlock_new_inode combinations safely 2018-05-11 15:36:37 -04:00
fat iversion: Rename make inode_cmp_iversion{+raw} to inode_eq_iversion{+raw} 2018-02-01 08:15:25 -05:00
freevxfs vxfs: Define usercopy region in vxfs_inode slab cache 2018-01-15 12:07:57 -08:00
fscache fscache: use appropriate radix tree accessors 2018-04-11 10:28:39 -07:00
fuse fuse: define the filesystem as untrusted 2018-03-23 06:31:37 -04:00
gfs2 GFS2: Minor improvements to comments and documentation 2018-04-12 10:07:51 -07:00
hfs Rename superblock flags (MS_xyz -> SB_xyz) 2017-11-27 13:05:09 -08:00
hfsplus hfsplus: honor setgid flag on directories 2018-02-06 18:32:45 -08:00
hostfs hostfs: rename do_rmdir() to hostfs_do_rmdir() 2018-04-02 20:15:53 +02:00
hpfs hpfs: don't bother with the i_version counter or f_version 2017-12-10 12:58:18 -08:00
hugetlbfs hugetlbfs: fix bug in pgoff overflow checking 2018-04-05 21:36:21 -07:00
isofs Rename superblock flags (MS_xyz -> SB_xyz) 2017-11-27 13:05:09 -08:00
jbd2 jbd2: if the journal is aborted then don't allow update of the log tail 2018-02-19 12:22:53 -05:00
jffs2 do d_instantiate/unlock_new_inode combinations safely 2018-05-11 15:36:37 -04:00
jfs do d_instantiate/unlock_new_inode combinations safely 2018-05-11 15:36:37 -04:00
kernfs kernfs: deal with kernfs_fill_super() failures 2018-05-21 14:30:08 -04:00
lockd net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
minix treewide: simplify Kconfig dependencies for removed archs 2018-03-26 15:55:57 +02:00
nfs NFS client updates for Linux 4.17 2018-04-12 12:55:50 -07:00
nfs_common net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
nfsd nfsd: vfs_mkdir() might succeed leaving dentry negative unhashed 2018-05-21 14:30:10 -04:00
nilfs2 do d_instantiate/unlock_new_inode combinations safely 2018-05-11 15:36:37 -04:00
nls
notify Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2018-04-05 19:17:50 -07:00
ntfs ntfs: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) call 2018-03-28 01:39:02 -04:00
ocfs2 Merge branch 'akpm' (patches from Andrew) 2018-04-06 14:19:26 -07:00
omfs
openpromfs Rename superblock flags (MS_xyz -> SB_xyz) 2017-11-27 13:05:09 -08:00
orangefs do d_instantiate/unlock_new_inode combinations safely 2018-05-11 15:36:37 -04:00
overlayfs ovl: add support for "xino" mount and config options 2018-04-12 12:04:50 +02:00
proc proc: revalidate misc dentries 2018-04-13 17:10:27 -07:00
pstore pstore: fix crypto dependencies without compression 2018-04-06 15:45:33 -07:00
qnx4 Rename superblock flags (MS_xyz -> SB_xyz) 2017-11-27 13:05:09 -08:00
qnx6 Rename superblock flags (MS_xyz -> SB_xyz) 2017-11-27 13:05:09 -08:00
quota fs/quota: use COMPAT_SYSCALL_DEFINE for sys32_quotactl() 2018-04-02 20:15:47 +02:00
ramfs
reiserfs do d_instantiate/unlock_new_inode combinations safely 2018-05-11 15:36:37 -04:00
romfs Rename superblock flags (MS_xyz -> SB_xyz) 2017-11-27 13:05:09 -08:00
squashfs Rename superblock flags (MS_xyz -> SB_xyz) 2017-11-27 13:05:09 -08:00
sysfs unfuck sysfs_mount() 2018-05-21 14:30:09 -04:00
sysv Rename superblock flags (MS_xyz -> SB_xyz) 2017-11-27 13:05:09 -08:00
tracefs
ubifs This pull request contains updates for both UBI and UBIFS: 2018-04-11 16:39:34 -07:00
udf do d_instantiate/unlock_new_inode combinations safely 2018-05-11 15:36:37 -04:00
ufs do d_instantiate/unlock_new_inode combinations safely 2018-05-11 15:36:37 -04:00
xfs Changes since last update: 2018-04-12 13:28:22 -07:00
Kconfig libnvdimm for 4.16 2018-02-06 10:41:33 -08:00
Kconfig.binfmt treewide: simplify Kconfig dependencies for removed archs 2018-03-26 15:55:57 +02:00
Makefile split d_path() and friends into a separate file 2018-03-29 15:07:46 -04:00
aio.c fix io_destroy()/aio_complete() race 2018-05-23 22:53:22 -04:00
anon_inodes.c
attr.c
bad_inode.c
binfmt_aout.c exec: introduce finalize_exec() before start_thread() 2018-04-11 10:28:37 -07:00
binfmt_elf.c elf: enforce MAP_FIXED on overlaying elf segments 2018-04-11 10:28:38 -07:00
binfmt_elf_fdpic.c exec: introduce finalize_exec() before start_thread() 2018-04-11 10:28:37 -07:00
binfmt_em86.c
binfmt_flat.c exec: introduce finalize_exec() before start_thread() 2018-04-11 10:28:37 -07:00
binfmt_misc.c fs: add ksys_close() wrapper; remove in-kernel calls to sys_close() 2018-04-02 20:16:00 +02:00
binfmt_script.c
block_dev.c libnvdimm for 4.17 2018-04-10 10:25:57 -07:00
buffer.c Merge branch 'work.thaw' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-04-12 12:28:32 -07:00
char_dev.c block, char_dev: Use correct format specifier for unsigned ints 2018-03-15 17:59:24 +01:00
compat.c
compat_binfmt_elf.c
compat_ioctl.c fs: compat_ioctl: add new DVB demux ioctls 2017-12-28 11:17:29 -05:00
coredump.c
d_path.c split d_path() and friends into a separate file 2018-03-29 15:07:46 -04:00
dax.c page cache: use xa_lock 2018-04-11 10:28:39 -07:00
dcache.c do d_instantiate/unlock_new_inode combinations safely 2018-05-11 15:36:37 -04:00
dcookies.c fs: add do_lookup_dcookie() helper; remove in-kernel call to syscall 2018-04-02 20:15:39 +02:00
direct-io.c Merge branch 'akpm' (patches from Andrew) 2018-04-06 14:19:26 -07:00
drop_caches.c
eventfd.c fs: add do_eventfd() helper; remove internal call to sys_eventfd() 2018-04-02 20:15:39 +02:00
eventpoll.c fs: add do_epoll_*() helpers; remove internal calls to sys_epoll_*() 2018-04-02 20:15:37 +02:00
exec.c exec: pin stack limit during exec 2018-04-11 10:28:37 -07:00
fcntl.c fs: add do_compat_fcntl64() helper; remove in-kernel call to compat syscall 2018-04-02 20:15:42 +02:00
fhandle.c vfs: Copy struct mount.mnt_id to userspace using put_user() 2018-01-15 12:07:51 -08:00
file.c fs: add ksys_close() wrapper; remove in-kernel calls to sys_close() 2018-04-02 20:16:00 +02:00
file_table.c vfs: remove unused hardirq.h 2017-12-07 14:23:30 -05:00
filesystems.c
fs-writeback.c page cache: use xa_lock 2018-04-11 10:28:39 -07:00
fs_pin.c
fs_struct.c
inode.c page cache: use xa_lock 2018-04-11 10:28:39 -07:00
internal.h Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-04-06 11:07:08 -07:00
ioctl.c fs: add ksys_ioctl() helper; remove in-kernel calls to sys_ioctl() 2018-04-02 20:16:03 +02:00
iomap.c iomap: warn on zero-length mappings 2018-01-29 07:27:24 -08:00
libfs.c fs, dax: prepare for dax-specific address_space_operations 2018-03-30 11:34:55 -07:00
locks.c treewide: Align function definition open/close braces 2018-03-26 11:13:09 +02:00
mbcache.c mbcache: make sure c_entry_count is not decremented past zero 2018-01-09 23:57:52 -05:00
mount.h
mpage.c
namei.c rmdir(),rename(): do shrink_dcache_parent() only on success 2018-05-27 16:23:51 -04:00
namespace.c Don't leak MNT_INTERNAL away from internal mounts 2018-04-19 23:52:15 -04:00
no-block.c
nsfs.c net: Export open_related_ns() 2018-02-15 15:34:42 -05:00
open.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-04-06 11:07:08 -07:00
pipe.c fs: add do_pipe2() helper; remove internal call to sys_pipe2() 2018-04-02 20:15:35 +02:00
pnode.c
pnode.h
posix_acl.c posix_acl: convert posix_acl.a_refcount from atomic_t to refcount_t 2018-01-02 19:27:28 -08:00
proc_namespace.c vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
read_write.c fs: add ksys_p{read,write}64() helpers; remove in-kernel calls to syscalls 2018-04-02 20:16:09 +02:00
readdir.c fs: add ksys_getdents64() helper; remove in-kernel calls to sys_getdents64() 2018-04-02 20:16:02 +02:00
select.c fs: add do_compat_select() helper; remove in-kernel call to compat syscall 2018-04-02 20:15:42 +02:00
seq_file.c seq_file: account everything to kmemcg 2018-04-11 10:28:36 -07:00
signalfd.c fs: add do_compat_signalfd4() helper; remove in-kernel call to compat syscall 2018-04-02 20:15:43 +02:00
splice.c fs: add do_vmsplice() helper; remove in-kernel call to syscall 2018-04-02 20:15:40 +02:00
stack.c
stat.c fs: add do_readlinkat() helper; remove internal call to sys_readlinkat() 2018-04-02 20:15:34 +02:00
statfs.c Rename superblock flags (MS_xyz -> SB_xyz) 2017-11-27 13:05:09 -08:00
super.c fs: don't scan the inode cache before SB_BORN is set 2018-05-11 15:37:57 -04:00
sync.c Changes for this release: 2018-04-04 12:44:02 -07:00
timerfd.c vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
userfaultfd.c vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
utimes.c fs: add do_compat_futimesat() helper; remove in-kernel call to compat syscall 2018-04-02 20:15:44 +02:00
xattr.c