OpenCloudOS-Kernel/fs
Oleg Nesterov 498f237178 mempolicy: fix show_numa_map() vs exec() + do_set_mempolicy() race
9e7814404b "hold task->mempolicy while numa_maps scans." fixed the
race with the exiting task but this is not enough.

The current code assumes that get_vma_policy(task) should either see
task->mempolicy == NULL or it should be equal to ->task_mempolicy saved
by hold_task_mempolicy(), so we can never race with __mpol_put(). But
this can only work if we can't race with do_set_mempolicy(), and thus
we can't race with another do_set_mempolicy() or do_exit() after that.

However, do_set_mempolicy()->down_write(mmap_sem) can not prevent this
race. This task can exec, change it's ->mm, and call do_set_mempolicy()
after that; in this case they take 2 different locks.

Change hold_task_mempolicy() to use get_task_policy(), it never returns
NULL, and change show_numa_map() to use __get_vma_policy() or fall back
to proc_priv->task_mempolicy.

Note: this is the minimal fix, we will cleanup this code later. I think
hold_task_mempolicy() and release_task_mempolicy() should die, we can
move this logic into show_numa_map(). Or we can move get_task_policy()
outside of ->mmap_sem and !CONFIG_NUMA code at least.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09 22:25:56 -04:00
..
9p Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-06-12 10:30:18 -07:00
adfs adfs: add __printf verification, fix format/argument mismatches 2014-08-08 15:57:24 -07:00
affs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-06-12 10:30:18 -07:00
afs AFS: Correctly assemble the client UUID 2014-07-29 10:14:36 -07:00
autofs4 autofs4: comment typo: remove a a doubled word 2014-08-08 15:57:19 -07:00
befs fs/befs/linuxvfs.c: check superblock before dump operation 2014-08-08 15:57:20 -07:00
bfs fs/bfs: use bfs prefix for dump_imap 2014-08-08 15:57:24 -07:00
btrfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2014-10-07 21:16:26 -04:00
cachefiles fs/cachefiles: add missing \n to kerror conversions 2014-09-26 08:10:35 -07:00
ceph Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client 2014-08-13 17:43:29 -06:00
cifs CIFS: Fix readpages retrying on reconnects 2014-10-02 14:17:41 -05:00
coda fs/coda: use linux/uaccess.h 2014-08-08 15:57:20 -07:00
configfs fs/configfs: use pr_fmt 2014-06-04 16:53:53 -07:00
cramfs fs/cramfs/inode.c: use linux/uaccess.h 2014-08-08 15:57:25 -07:00
debugfs fs: debugfs: remove trailing whitespace 2014-07-09 16:58:21 -07:00
devpts fs/devpts/inode.c: convert printk to pr_foo() 2014-06-06 16:08:14 -07:00
dlm fs/dlm/debug_fs.c: remove unnecessary null test before debugfs_remove 2014-08-08 15:57:27 -07:00
ecryptfs write_iter variants of {__,}generic_file_aio_write() 2014-05-06 17:38:00 -04:00
efivarfs fs/efivarfs/super.c: use static const for dentry_operations 2014-06-04 16:54:14 -07:00
efs fs/efs/namei.c: return is not a function 2014-08-08 15:57:18 -07:00
exofs fs/exofs/ore_raid.c: replace count*size kzalloc by kcalloc 2014-08-08 15:57:24 -07:00
exportfs fs/exportfs/expfs.c: kernel-doc warning fixes 2014-06-04 16:54:14 -07:00
ext2 fs/ext2/super.c: Drop memory allocation cast 2014-07-15 22:40:22 +02:00
ext3 ext3: Count internal journal as bsddf overhead in ext3_statfs 2014-08-19 23:16:51 +02:00
ext4 ext4: avoid trying to kfree an ERR_PTR pointer 2014-09-03 09:37:30 -04:00
f2fs f2fs: support volatile operations for transient data 2014-10-07 11:54:41 -07:00
fat Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-06-12 10:30:18 -07:00
freevxfs Major changes for 3.14 include support for the newly added ZERO_RANGE 2014-04-04 15:39:39 -07:00
fscache FS-Cache: refcount becomes corrupt under vma pressure. 2014-09-17 22:41:40 +01:00
fuse fuse: honour max_read and max_write in direct_io mode 2014-09-26 21:16:51 -04:00
gfs2 GFS2: fix d_splice_alias() misuses 2014-09-12 20:58:55 +01:00
hfs write_iter variants of {__,}generic_file_aio_write() 2014-05-06 17:38:00 -04:00
hfsplus Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-06-12 10:30:18 -07:00
hostfs hostfs: support rename flags 2014-08-07 14:40:09 -04:00
hpfs fs/hpfs/dnode.c: fix suspect code indent 2014-08-08 15:57:22 -07:00
hppfs
hugetlbfs fs/hugetlbfs/inode.c: remove null test before kfree 2014-06-04 16:54:11 -07:00
isofs isofs: Fix unbounded recursion when processing relocated directories 2014-08-19 18:29:30 +02:00
jbd fs/jbd/revoke.c: replace shift loop by ilog2 2014-05-21 10:26:13 +02:00
jbd2 jbd2: fix descriptor block size handling errors with journal_csum 2014-08-28 22:22:29 -04:00
jffs2 MTD updates for 3.17-rc1 2014-08-08 18:13:21 -07:00
jfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-06-12 10:30:18 -07:00
kernfs Merge 3.16-rc6 into driver-core-next 2014-07-21 10:07:25 -07:00
lockd Merge branch 'for-3.18' of git://linux-nfs.org/~bfields/linux 2014-10-08 12:51:44 -04:00
logfs fs/logfs/readwrite.c: kernel-doc warning fixes 2014-08-06 18:01:12 -07:00
minix minix zmap block counts calculation fix 2014-08-08 15:57:20 -07:00
ncpfs fs/ncpfs/getopt.c: replace simple_strtoul by kstrtoul 2014-06-04 16:54:21 -07:00
nfs Merge branch 'for-3.18' of git://linux-nfs.org/~bfields/linux 2014-10-08 12:51:44 -04:00
nfs_common lockd: move lockd's grace period handling into its own module 2014-09-17 16:33:11 -04:00
nfsd Merge branch 'for-3.18' of git://linux-nfs.org/~bfields/linux 2014-10-08 12:51:44 -04:00
nilfs2 nilfs2: fix data loss with mmap() 2014-09-26 08:10:34 -07:00
nls
notify fanotify: enable close-on-exec on events' fd when requested in fanotify_init() 2014-10-09 22:25:46 -04:00
ntfs ntfs: remove bogus space 2014-10-09 22:25:46 -04:00
ocfs2 ocfs2: fix deadlock due to wrong locking order 2014-10-09 22:25:48 -04:00
omfs fs/omfs/inode.c: replace count*size kzalloc by kcalloc 2014-08-08 15:57:25 -07:00
openpromfs fs: push sync_filesystem() down to the file system's remount_fs() 2014-03-13 10:14:33 -04:00
proc mempolicy: fix show_numa_map() vs exec() + do_set_mempolicy() race 2014-10-09 22:25:56 -04:00
pstore fs/pstore/ram_core.c: replace count*size kmalloc by kmalloc_array 2014-08-08 15:57:25 -07:00
qnx4 fs: push sync_filesystem() down to the file system's remount_fs() 2014-03-13 10:14:33 -04:00
qnx6 fs/qnx6: update debugging to current functions 2014-08-08 15:57:26 -07:00
quota fs/quota: kernel-doc warning fixes 2014-07-15 22:40:23 +02:00
ramfs fs/ramfs/file-nommu.c: replace count*size kzalloc by kcalloc 2014-08-08 15:57:18 -07:00
reiserfs Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2014-08-13 17:45:40 -06:00
romfs fs/romfs/super.c: add blank line after declarations 2014-08-08 15:57:25 -07:00
squashfs fs/squashfs/super.c: logging cleanup 2014-08-06 18:01:13 -07:00
sysfs kernfs: move the last knowledge of sysfs out from kernfs 2014-06-03 08:11:18 -07:00
sysv write_iter variants of {__,}generic_file_aio_write() 2014-05-06 17:38:00 -04:00
ubifs UBIFS: Add log overlap assertions 2014-07-31 15:52:51 +03:00
udf udf: saner calling conventions for udf_new_inode() 2014-09-04 21:37:41 +02:00
ufs ufs: deal with nfsd/iget races 2014-09-26 21:17:52 -04:00
xfs xfs: trim eofblocks before collapse range 2014-09-02 12:12:53 +10:00
Kconfig lockd: move lockd's grace period handling into its own module 2014-09-17 16:33:11 -04:00
Kconfig.binfmt
Makefile take fs_pin stuff to fs/* 2014-08-07 14:40:08 -04:00
aio.c aio: block exit_aio() until all context requests are completed 2014-09-04 16:54:47 -04:00
anon_inodes.c vfs: Allocate anon_inode_inode in anon_inode_init() 2014-03-27 09:52:54 -07:00
attr.c fs,userns: Change inode_capable to capable_wrt_inode_uidgid 2014-06-10 13:57:22 -07:00
bad_inode.c bad_inode: add ->rename2() 2014-08-07 14:40:09 -04:00
binfmt_aout.c
binfmt_elf.c Merge branch 'x86/vdso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next 2014-06-05 08:05:29 -07:00
binfmt_elf_fdpic.c
binfmt_em86.c
binfmt_flat.c fs/binfmt_flat.c: make old_reloc() static 2014-06-04 16:54:21 -07:00
binfmt_misc.c binfmt_misc: add missing 'break' statement 2014-04-03 16:21:16 -07:00
binfmt_script.c
binfmt_som.c
block_dev.c block_dev: implement readpages() to optimize sequential read 2014-10-09 22:25:53 -04:00
buffer.c vfs: guard end of device for mpage interface 2014-10-09 22:25:53 -04:00
char_dev.c
compat.c locks: rename file-private locks to "open file description locks" 2014-04-22 08:23:58 -04:00
compat_binfmt_elf.c binfmt_elf: add ELF_HWCAP2 to compat auxv entries 2014-03-04 08:05:21 +00:00
compat_ioctl.c Bluetooth: Move HCI socket definitions into its own header file 2014-07-11 13:53:04 +03:00
coredump.c coredump: fix the setting of PF_DUMPCORE 2014-07-23 15:10:54 -07:00
dcache.c vfs: Don't exchange "short" filenames unconditionally. 2014-09-27 15:59:39 -04:00
dcookies.c
direct-io.c fuse: honour max_read and max_write in direct_io mode 2014-09-26 21:16:51 -04:00
drop_caches.c fs: convert use of typedef ctl_table to struct ctl_table 2014-06-06 16:08:16 -07:00
eventfd.c
eventpoll.c eventpoll: fix uninitialized variable in epoll_ctl 2014-09-10 15:42:12 -07:00
exec.c fork/exec: cleanup mm initialization 2014-08-08 15:57:23 -07:00
fcntl.c shm: add sealing API 2014-08-08 15:57:31 -07:00
fhandle.c
file.c fs/file.c: don't open-code kvfree() 2014-05-06 17:31:10 -04:00
file_table.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-06-12 10:30:18 -07:00
filesystems.c sys_sysfs: Add CONFIG_SYSFS_SYSCALL 2014-04-03 16:21:05 -07:00
fs-writeback.c sched: Remove proliferation of wait_on_bit() action functions 2014-07-16 15:10:39 +02:00
fs_pin.c make fs/{namespace,super}.c forget about acct.h 2014-08-07 14:40:09 -04:00
fs_struct.c
inode.c mm: allow drivers to prevent new writable mappings 2014-08-08 15:57:31 -07:00
internal.h vfs: guard end of device for mpage interface 2014-10-09 22:25:53 -04:00
ioctl.c
libfs.c fs/libfs.c: add generic data flush to fsync 2014-06-04 16:53:55 -07:00
locks.c locks: pass correct "before" pointer to locks_unlink_lock in generic_add_lease 2014-08-22 09:58:22 -04:00
mbcache.c fs/mbcache: replace __builtin_log2() with ilog2() 2014-06-25 22:08:29 -04:00
mount.h death to mnt_pinned 2014-08-07 14:40:09 -04:00
mpage.c vfs: guard end of device for mpage interface 2014-10-09 22:25:53 -04:00
namei.c vfs: workaround gcc <4.6 build error in link_path_walk() 2014-09-16 07:44:54 -07:00
namespace.c fix EBUSY on umount() from MNT_SHRINKABLE 2014-08-30 18:32:05 -04:00
no-block.c
open.c vfs: fix check for fallocate on active swapfile 2014-08-01 02:36:04 -04:00
pipe.c new helper: copy_page_from_iter() 2014-05-06 17:39:42 -04:00
pnode.c get rid of propagate_umount() mistakenly treating slaves as busy. 2014-08-30 18:31:41 -04:00
pnode.h smarter propagate_mnt() 2014-04-01 23:19:08 -04:00
posix_acl.c posix_acl: handle NULL ACL in posix_acl_equiv_mode 2014-05-06 13:58:42 -04:00
proc_namespace.c namespaces: Use task_lock and not rcu to protect nsproxy 2014-07-29 18:08:50 -07:00
read_write.c switch simple generic_file_aio_read() users to ->read_iter() 2014-05-06 17:37:55 -04:00
readdir.c fanotify: create FAN_ACCESS event for readdir 2014-06-04 16:53:52 -07:00
select.c
seq_file.c fs/seq_file: fallback to vmalloc allocation 2014-07-03 09:21:54 -07:00
signalfd.c
splice.c Merge commit '9f12600fe425bc28f0ccba034a77783c09c15af4' into for-linus 2014-06-12 00:28:09 -04:00
stack.c fs: fix comment for 'CONFIG_LBADF' 2014-08-26 09:35:56 +02:00
stat.c
statfs.c
super.c Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2014-08-13 17:45:40 -06:00
sync.c Export sync_filesystem() for modular ->remount_fs() use 2014-09-05 08:16:21 -07:00
timerfd.c timerfd: Remove an always true check 2014-08-27 11:17:48 +02:00
utimes.c
xattr.c simple_xattr: permit 0-size extended attributes 2014-07-23 15:10:55 -07:00