OpenCloudOS-Kernel/ipc/namespace.c

259 lines
5.8 KiB
C
Raw Normal View History

License cleanup: add SPDX GPL-2.0 license identifier to files with no license Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 22:07:57 +08:00
// SPDX-License-Identifier: GPL-2.0
namespaces: move the IPC namespace under IPC_NS option Currently the IPC namespace management code is spread over the ipc/*.c files. I moved this code into ipc/namespace.c file which is compiled out when needed. The linux/ipc_namespace.h file is used to store the prototypes of the functions in namespace.c and the stubs for NAMESPACES=n case. This is done so, because the stub for copy_ipc_namespace requires the knowledge of the CLONE_NEWIPC flag, which is in sched.h. But the linux/ipc.h file itself in included into many many .c files via the sys.h->sem.h sequence so adding the sched.h into it will make all these .c depend on sched.h which is not that good. On the other hand the knowledge about the namespaces stuff is required in 4 .c files only. Besides, this patch compiles out some auxiliary functions from ipc/sem.c, msg.c and shm.c files. It turned out that moving these functions into namespaces.c is not that easy because they use many other calls and macros from the original file. Moving them would make this patch complicated. On the other hand all these functions can be consolidated, so I will send a separate patch doing this a bit later. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 20:18:22 +08:00
/*
* linux/ipc/namespace.c
* Copyright (C) 2006 Pavel Emelyanov <xemul@openvz.org> OpenVZ, SWsoft Inc.
*/
#include <linux/ipc.h>
#include <linux/msg.h>
#include <linux/ipc_namespace.h>
#include <linux/rcupdate.h>
#include <linux/nsproxy.h>
#include <linux/slab.h>
#include <linux/cred.h>
namespaces: ipc namespaces: implement support for posix msqueues Implement multiple mounts of the mqueue file system, and link it to usage of CLONE_NEWIPC. Each ipc ns has a corresponding mqueuefs superblock. When a user does clone(CLONE_NEWIPC) or unshare(CLONE_NEWIPC), the unshare will cause an internal mount of a new mqueuefs sb linked to the new ipc ns. When a user does 'mount -t mqueue mqueue /dev/mqueue', he mounts the mqueuefs superblock. Posix message queues can be worked with both through the mq_* system calls (see mq_overview(7)), and through the VFS through the mqueue mount. Any usage of mq_open() and friends will work with the acting task's ipc namespace. Any actions through the VFS will work with the mqueuefs in which the file was created. So if a user doesn't remount mqueuefs after unshare(CLONE_NEWIPC), mq_open("/ab") will not be reflected in "ls /dev/mqueue". If task a mounts mqueue for ipc_ns:1, then clones task b with a new ipcns, ipcns:2, and then task a is the last task in ipc_ns:1 to exit, then (1) ipc_ns:1 will be freed, (2) it's superblock will live on until task b umounts the corresponding mqueuefs, and vfs actions will continue to succeed, but (3) sb->s_fs_info will be NULL for the sb corresponding to the deceased ipc_ns:1. To make this happen, we must protect the ipc reference count when a) a task exits and drops its ipcns->count, since it might be dropping it to 0 and freeing the ipcns b) a task accesses the ipcns through its mqueuefs interface, since it bumps the ipcns refcount and might race with the last task in the ipcns exiting. So the kref is changed to an atomic_t so we can use atomic_dec_and_lock(&ns->count,mq_lock), and every access to the ipcns through ns = mqueuefs_sb->s_fs_info is protected by the same lock. Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07 10:01:10 +08:00
#include <linux/fs.h>
#include <linux/mount.h>
#include <linux/user_namespace.h>
#include <linux/proc_ns.h>
#include <linux/sched/task.h>
namespaces: move the IPC namespace under IPC_NS option Currently the IPC namespace management code is spread over the ipc/*.c files. I moved this code into ipc/namespace.c file which is compiled out when needed. The linux/ipc_namespace.h file is used to store the prototypes of the functions in namespace.c and the stubs for NAMESPACES=n case. This is done so, because the stub for copy_ipc_namespace requires the knowledge of the CLONE_NEWIPC flag, which is in sched.h. But the linux/ipc.h file itself in included into many many .c files via the sys.h->sem.h sequence so adding the sched.h into it will make all these .c depend on sched.h which is not that good. On the other hand the knowledge about the namespaces stuff is required in 4 .c files only. Besides, this patch compiles out some auxiliary functions from ipc/sem.c, msg.c and shm.c files. It turned out that moving these functions into namespaces.c is not that easy because they use many other calls and macros from the original file. Moving them would make this patch complicated. On the other hand all these functions can be consolidated, so I will send a separate patch doing this a bit later. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 20:18:22 +08:00
#include "util.h"
/*
* The work queue is used to avoid the cost of synchronize_rcu in kern_unmount.
*/
static void free_ipc(struct work_struct *unused);
static DECLARE_WORK(free_ipc_work, free_ipc);
static struct ucounts *inc_ipc_namespaces(struct user_namespace *ns)
{
return inc_ucount(ns, current_euid(), UCOUNT_IPC_NAMESPACES);
}
static void dec_ipc_namespaces(struct ucounts *ucounts)
{
dec_ucount(ucounts, UCOUNT_IPC_NAMESPACES);
}
static struct ipc_namespace *create_ipc_ns(struct user_namespace *user_ns,
struct ipc_namespace *old_ns)
namespaces: move the IPC namespace under IPC_NS option Currently the IPC namespace management code is spread over the ipc/*.c files. I moved this code into ipc/namespace.c file which is compiled out when needed. The linux/ipc_namespace.h file is used to store the prototypes of the functions in namespace.c and the stubs for NAMESPACES=n case. This is done so, because the stub for copy_ipc_namespace requires the knowledge of the CLONE_NEWIPC flag, which is in sched.h. But the linux/ipc.h file itself in included into many many .c files via the sys.h->sem.h sequence so adding the sched.h into it will make all these .c depend on sched.h which is not that good. On the other hand the knowledge about the namespaces stuff is required in 4 .c files only. Besides, this patch compiles out some auxiliary functions from ipc/sem.c, msg.c and shm.c files. It turned out that moving these functions into namespaces.c is not that easy because they use many other calls and macros from the original file. Moving them would make this patch complicated. On the other hand all these functions can be consolidated, so I will send a separate patch doing this a bit later. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 20:18:22 +08:00
{
struct ipc_namespace *ns;
struct ucounts *ucounts;
namespaces: ipc namespaces: implement support for posix msqueues Implement multiple mounts of the mqueue file system, and link it to usage of CLONE_NEWIPC. Each ipc ns has a corresponding mqueuefs superblock. When a user does clone(CLONE_NEWIPC) or unshare(CLONE_NEWIPC), the unshare will cause an internal mount of a new mqueuefs sb linked to the new ipc ns. When a user does 'mount -t mqueue mqueue /dev/mqueue', he mounts the mqueuefs superblock. Posix message queues can be worked with both through the mq_* system calls (see mq_overview(7)), and through the VFS through the mqueue mount. Any usage of mq_open() and friends will work with the acting task's ipc namespace. Any actions through the VFS will work with the mqueuefs in which the file was created. So if a user doesn't remount mqueuefs after unshare(CLONE_NEWIPC), mq_open("/ab") will not be reflected in "ls /dev/mqueue". If task a mounts mqueue for ipc_ns:1, then clones task b with a new ipcns, ipcns:2, and then task a is the last task in ipc_ns:1 to exit, then (1) ipc_ns:1 will be freed, (2) it's superblock will live on until task b umounts the corresponding mqueuefs, and vfs actions will continue to succeed, but (3) sb->s_fs_info will be NULL for the sb corresponding to the deceased ipc_ns:1. To make this happen, we must protect the ipc reference count when a) a task exits and drops its ipcns->count, since it might be dropping it to 0 and freeing the ipcns b) a task accesses the ipcns through its mqueuefs interface, since it bumps the ipcns refcount and might race with the last task in the ipcns exiting. So the kref is changed to an atomic_t so we can use atomic_dec_and_lock(&ns->count,mq_lock), and every access to the ipcns through ns = mqueuefs_sb->s_fs_info is protected by the same lock. Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07 10:01:10 +08:00
int err;
namespaces: move the IPC namespace under IPC_NS option Currently the IPC namespace management code is spread over the ipc/*.c files. I moved this code into ipc/namespace.c file which is compiled out when needed. The linux/ipc_namespace.h file is used to store the prototypes of the functions in namespace.c and the stubs for NAMESPACES=n case. This is done so, because the stub for copy_ipc_namespace requires the knowledge of the CLONE_NEWIPC flag, which is in sched.h. But the linux/ipc.h file itself in included into many many .c files via the sys.h->sem.h sequence so adding the sched.h into it will make all these .c depend on sched.h which is not that good. On the other hand the knowledge about the namespaces stuff is required in 4 .c files only. Besides, this patch compiles out some auxiliary functions from ipc/sem.c, msg.c and shm.c files. It turned out that moving these functions into namespaces.c is not that easy because they use many other calls and macros from the original file. Moving them would make this patch complicated. On the other hand all these functions can be consolidated, so I will send a separate patch doing this a bit later. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 20:18:22 +08:00
err = -ENOSPC;
again:
ucounts = inc_ipc_namespaces(user_ns);
if (!ucounts) {
/*
* IPC namespaces are freed asynchronously, by free_ipc_work.
* If frees were pending, flush_work will wait, and
* return true. Fail the allocation if no frees are pending.
*/
if (flush_work(&free_ipc_work))
goto again;
goto fail;
}
err = -ENOMEM;
memcg: enable accounting for new namesapces and struct nsproxy Container admin can create new namespaces and force kernel to allocate up to several pages of memory for the namespaces and its associated structures. Net and uts namespaces have enabled accounting for such allocations. It makes sense to account for rest ones to restrict the host's memory consumption from inside the memcg-limited container. Link: https://lkml.kernel.org/r/5525bcbf-533e-da27-79b7-158686c64e13@virtuozzo.com Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Acked-by: Serge Hallyn <serge@hallyn.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Kirill Tkhai <ktkhai@virtuozzo.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrei Vagin <avagin@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jiri Slaby <jirislaby@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Roman Gushchin <guro@fb.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Yutian Yang <nglaive@gmail.com> Cc: Zefan Li <lizefan.x@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-03 05:55:27 +08:00
ns = kzalloc(sizeof(struct ipc_namespace), GFP_KERNEL_ACCOUNT);
namespaces: move the IPC namespace under IPC_NS option Currently the IPC namespace management code is spread over the ipc/*.c files. I moved this code into ipc/namespace.c file which is compiled out when needed. The linux/ipc_namespace.h file is used to store the prototypes of the functions in namespace.c and the stubs for NAMESPACES=n case. This is done so, because the stub for copy_ipc_namespace requires the knowledge of the CLONE_NEWIPC flag, which is in sched.h. But the linux/ipc.h file itself in included into many many .c files via the sys.h->sem.h sequence so adding the sched.h into it will make all these .c depend on sched.h which is not that good. On the other hand the knowledge about the namespaces stuff is required in 4 .c files only. Besides, this patch compiles out some auxiliary functions from ipc/sem.c, msg.c and shm.c files. It turned out that moving these functions into namespaces.c is not that easy because they use many other calls and macros from the original file. Moving them would make this patch complicated. On the other hand all these functions can be consolidated, so I will send a separate patch doing this a bit later. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 20:18:22 +08:00
if (ns == NULL)
goto fail_dec;
namespaces: move the IPC namespace under IPC_NS option Currently the IPC namespace management code is spread over the ipc/*.c files. I moved this code into ipc/namespace.c file which is compiled out when needed. The linux/ipc_namespace.h file is used to store the prototypes of the functions in namespace.c and the stubs for NAMESPACES=n case. This is done so, because the stub for copy_ipc_namespace requires the knowledge of the CLONE_NEWIPC flag, which is in sched.h. But the linux/ipc.h file itself in included into many many .c files via the sys.h->sem.h sequence so adding the sched.h into it will make all these .c depend on sched.h which is not that good. On the other hand the knowledge about the namespaces stuff is required in 4 .c files only. Besides, this patch compiles out some auxiliary functions from ipc/sem.c, msg.c and shm.c files. It turned out that moving these functions into namespaces.c is not that easy because they use many other calls and macros from the original file. Moving them would make this patch complicated. On the other hand all these functions can be consolidated, so I will send a separate patch doing this a bit later. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 20:18:22 +08:00
err = ns_alloc_inum(&ns->ns);
if (err)
goto fail_free;
ns->ns.ops = &ipcns_operations;
ipc: Use generic ns_common::count Switch over ipc namespaces to use the newly introduced common lifetime counter. Currently every namespace type has its own lifetime counter which is stored in the specific namespace struct. The lifetime counters are used identically for all namespaces types. Namespaces may of course have additional unrelated counters and these are not altered. This introduces a common lifetime counter into struct ns_common. The ns_common struct encompasses information that all namespaces share. That should include the lifetime counter since its common for all of them. It also allows us to unify the type of the counters across all namespaces. Most of them use refcount_t but one uses atomic_t and at least one uses kref. Especially the last one doesn't make much sense since it's just a wrapper around refcount_t since 2016 and actually complicates cleanup operations by having to use container_of() to cast the correct namespace struct out of struct ns_common. Having the lifetime counter for the namespaces in one place reduces maintenance cost. Not just because after switching all namespaces over we will have removed more code than we added but also because the logic is more easily understandable and we indicate to the user that the basic lifetime requirements for all namespaces are currently identical. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/r/159644978697.604812.16592754423881032385.stgit@localhost.localdomain Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-08-03 18:16:27 +08:00
refcount_set(&ns->ns.count, 1);
ns->user_ns = get_user_ns(user_ns);
ns->ucounts = ucounts;
err = mq_init_ns(ns);
if (err)
goto fail_put;
err = -ENOMEM;
if (!setup_mq_sysctls(ns))
goto fail_put;
if (!setup_ipc_sysctls(ns))
ipc: Free mq_sysctls if ipc namespace creation failed The problem that Dmitry Vyukov pointed out is that if setup_ipc_sysctls fails, mq_sysctls must be freed before return. executing program BUG: memory leak unreferenced object 0xffff888112fc9200 (size 512): comm "syz-executor237", pid 3648, jiffies 4294970469 (age 12.270s) hex dump (first 32 bytes): ef d3 60 85 ff ff ff ff 0c 9b d2 12 81 88 ff ff ..`............. 04 00 00 00 a4 01 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff814b6eb3>] kmemdup+0x23/0x50 mm/util.c:129 [<ffffffff82219a9b>] kmemdup include/linux/fortify-string.h:456 [inline] [<ffffffff82219a9b>] setup_mq_sysctls+0x4b/0x1c0 ipc/mq_sysctl.c:89 [<ffffffff822197f2>] create_ipc_ns ipc/namespace.c:63 [inline] [<ffffffff822197f2>] copy_ipcs+0x292/0x390 ipc/namespace.c:91 [<ffffffff8127de7c>] create_new_namespaces+0xdc/0x4f0 kernel/nsproxy.c:90 [<ffffffff8127e89b>] unshare_nsproxy_namespaces+0x9b/0x120 kernel/nsproxy.c:226 [<ffffffff8123f92e>] ksys_unshare+0x2fe/0x600 kernel/fork.c:3165 [<ffffffff8123fc42>] __do_sys_unshare kernel/fork.c:3236 [inline] [<ffffffff8123fc42>] __se_sys_unshare kernel/fork.c:3234 [inline] [<ffffffff8123fc42>] __x64_sys_unshare+0x12/0x20 kernel/fork.c:3234 [<ffffffff845aab45>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<ffffffff845aab45>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 [<ffffffff8460006a>] entry_SYSCALL_64_after_hwframe+0x46/0xb0 BUG: memory leak unreferenced object 0xffff888112fd5f00 (size 256): comm "syz-executor237", pid 3648, jiffies 4294970469 (age 12.270s) hex dump (first 32 bytes): 00 92 fc 12 81 88 ff ff 00 00 00 00 01 00 00 00 ................ 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff816fea1b>] kmalloc include/linux/slab.h:605 [inline] [<ffffffff816fea1b>] kzalloc include/linux/slab.h:733 [inline] [<ffffffff816fea1b>] __register_sysctl_table+0x7b/0x7f0 fs/proc/proc_sysctl.c:1344 [<ffffffff82219b7a>] setup_mq_sysctls+0x12a/0x1c0 ipc/mq_sysctl.c:112 [<ffffffff822197f2>] create_ipc_ns ipc/namespace.c:63 [inline] [<ffffffff822197f2>] copy_ipcs+0x292/0x390 ipc/namespace.c:91 [<ffffffff8127de7c>] create_new_namespaces+0xdc/0x4f0 kernel/nsproxy.c:90 [<ffffffff8127e89b>] unshare_nsproxy_namespaces+0x9b/0x120 kernel/nsproxy.c:226 [<ffffffff8123f92e>] ksys_unshare+0x2fe/0x600 kernel/fork.c:3165 [<ffffffff8123fc42>] __do_sys_unshare kernel/fork.c:3236 [inline] [<ffffffff8123fc42>] __se_sys_unshare kernel/fork.c:3234 [inline] [<ffffffff8123fc42>] __x64_sys_unshare+0x12/0x20 kernel/fork.c:3234 [<ffffffff845aab45>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<ffffffff845aab45>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 [<ffffffff8460006a>] entry_SYSCALL_64_after_hwframe+0x46/0xb0 BUG: memory leak unreferenced object 0xffff888112fbba00 (size 256): comm "syz-executor237", pid 3648, jiffies 4294970469 (age 12.270s) hex dump (first 32 bytes): 78 ba fb 12 81 88 ff ff 00 00 00 00 01 00 00 00 x............... 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff816fef49>] kmalloc include/linux/slab.h:605 [inline] [<ffffffff816fef49>] kzalloc include/linux/slab.h:733 [inline] [<ffffffff816fef49>] new_dir fs/proc/proc_sysctl.c:978 [inline] [<ffffffff816fef49>] get_subdir fs/proc/proc_sysctl.c:1022 [inline] [<ffffffff816fef49>] __register_sysctl_table+0x5a9/0x7f0 fs/proc/proc_sysctl.c:1373 [<ffffffff82219b7a>] setup_mq_sysctls+0x12a/0x1c0 ipc/mq_sysctl.c:112 [<ffffffff822197f2>] create_ipc_ns ipc/namespace.c:63 [inline] [<ffffffff822197f2>] copy_ipcs+0x292/0x390 ipc/namespace.c:91 [<ffffffff8127de7c>] create_new_namespaces+0xdc/0x4f0 kernel/nsproxy.c:90 [<ffffffff8127e89b>] unshare_nsproxy_namespaces+0x9b/0x120 kernel/nsproxy.c:226 [<ffffffff8123f92e>] ksys_unshare+0x2fe/0x600 kernel/fork.c:3165 [<ffffffff8123fc42>] __do_sys_unshare kernel/fork.c:3236 [inline] [<ffffffff8123fc42>] __se_sys_unshare kernel/fork.c:3234 [inline] [<ffffffff8123fc42>] __x64_sys_unshare+0x12/0x20 kernel/fork.c:3234 [<ffffffff845aab45>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<ffffffff845aab45>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 [<ffffffff8460006a>] entry_SYSCALL_64_after_hwframe+0x46/0xb0 BUG: memory leak unreferenced object 0xffff888112fbb900 (size 256): comm "syz-executor237", pid 3648, jiffies 4294970469 (age 12.270s) hex dump (first 32 bytes): 78 b9 fb 12 81 88 ff ff 00 00 00 00 01 00 00 00 x............... 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff816fef49>] kmalloc include/linux/slab.h:605 [inline] [<ffffffff816fef49>] kzalloc include/linux/slab.h:733 [inline] [<ffffffff816fef49>] new_dir fs/proc/proc_sysctl.c:978 [inline] [<ffffffff816fef49>] get_subdir fs/proc/proc_sysctl.c:1022 [inline] [<ffffffff816fef49>] __register_sysctl_table+0x5a9/0x7f0 fs/proc/proc_sysctl.c:1373 [<ffffffff82219b7a>] setup_mq_sysctls+0x12a/0x1c0 ipc/mq_sysctl.c:112 [<ffffffff822197f2>] create_ipc_ns ipc/namespace.c:63 [inline] [<ffffffff822197f2>] copy_ipcs+0x292/0x390 ipc/namespace.c:91 [<ffffffff8127de7c>] create_new_namespaces+0xdc/0x4f0 kernel/nsproxy.c:90 [<ffffffff8127e89b>] unshare_nsproxy_namespaces+0x9b/0x120 kernel/nsproxy.c:226 [<ffffffff8123f92e>] ksys_unshare+0x2fe/0x600 kernel/fork.c:3165 [<ffffffff8123fc42>] __do_sys_unshare kernel/fork.c:3236 [inline] [<ffffffff8123fc42>] __se_sys_unshare kernel/fork.c:3234 [inline] [<ffffffff8123fc42>] __x64_sys_unshare+0x12/0x20 kernel/fork.c:3234 [<ffffffff845aab45>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<ffffffff845aab45>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 [<ffffffff8460006a>] entry_SYSCALL_64_after_hwframe+0x46/0xb0 Reported-by: syzbot+b4b0d1b35442afbf6fd2@syzkaller.appspotmail.com Signed-off-by: Alexey Gladkov <legion@kernel.org> Link: https://lkml.kernel.org/r/000000000000f5004705e1db8bad@google.com Link: https://lkml.kernel.org/r/20220622200729.2639663-1-legion@kernel.org Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2022-06-23 04:07:29 +08:00
goto fail_mq;
ipc/msg: mitigate the lock contention with percpu counter The msg_bytes and msg_hdrs atomic counters are frequently updated when IPC msg queue is in heavy use, causing heavy cache bounce and overhead. Change them to percpu_counter greatly improve the performance. Since there is one percpu struct per namespace, additional memory cost is minimal. Reading of the count done in msgctl call, which is infrequent. So the need to sum up the counts in each CPU is infrequent. Apply the patch and test the pts/stress-ng-1.4.0 -- system v message passing (160 threads). Score gain: 3.99x CPU: ICX 8380 x 2 sockets Core number: 40 x 2 physical cores Benchmark: pts/stress-ng-1.4.0 -- system v message passing (160 threads) [akpm@linux-foundation.org: coding-style cleanups] [jiebin.sun@intel.com: avoid negative value by overflow in msginfo] Link: https://lkml.kernel.org/r/20220920150809.4014944-1-jiebin.sun@intel.com [akpm@linux-foundation.org: fix min() warnings] Link: https://lkml.kernel.org/r/20220913192538.3023708-3-jiebin.sun@intel.com Signed-off-by: Jiebin Sun <jiebin.sun@intel.com> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com> Cc: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com> Cc: Alexey Gladkov <legion@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Dennis Zhou <dennis@kernel.org> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vasily Averin <vasily.averin@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-14 03:25:38 +08:00
err = msg_init_ns(ns);
if (err)
goto fail_put;
sem_init_ns(ns);
shm_init_ns(ns);
namespaces: move the IPC namespace under IPC_NS option Currently the IPC namespace management code is spread over the ipc/*.c files. I moved this code into ipc/namespace.c file which is compiled out when needed. The linux/ipc_namespace.h file is used to store the prototypes of the functions in namespace.c and the stubs for NAMESPACES=n case. This is done so, because the stub for copy_ipc_namespace requires the knowledge of the CLONE_NEWIPC flag, which is in sched.h. But the linux/ipc.h file itself in included into many many .c files via the sys.h->sem.h sequence so adding the sched.h into it will make all these .c depend on sched.h which is not that good. On the other hand the knowledge about the namespaces stuff is required in 4 .c files only. Besides, this patch compiles out some auxiliary functions from ipc/sem.c, msg.c and shm.c files. It turned out that moving these functions into namespaces.c is not that easy because they use many other calls and macros from the original file. Moving them would make this patch complicated. On the other hand all these functions can be consolidated, so I will send a separate patch doing this a bit later. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 20:18:22 +08:00
return ns;
ipc: Free mq_sysctls if ipc namespace creation failed The problem that Dmitry Vyukov pointed out is that if setup_ipc_sysctls fails, mq_sysctls must be freed before return. executing program BUG: memory leak unreferenced object 0xffff888112fc9200 (size 512): comm "syz-executor237", pid 3648, jiffies 4294970469 (age 12.270s) hex dump (first 32 bytes): ef d3 60 85 ff ff ff ff 0c 9b d2 12 81 88 ff ff ..`............. 04 00 00 00 a4 01 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff814b6eb3>] kmemdup+0x23/0x50 mm/util.c:129 [<ffffffff82219a9b>] kmemdup include/linux/fortify-string.h:456 [inline] [<ffffffff82219a9b>] setup_mq_sysctls+0x4b/0x1c0 ipc/mq_sysctl.c:89 [<ffffffff822197f2>] create_ipc_ns ipc/namespace.c:63 [inline] [<ffffffff822197f2>] copy_ipcs+0x292/0x390 ipc/namespace.c:91 [<ffffffff8127de7c>] create_new_namespaces+0xdc/0x4f0 kernel/nsproxy.c:90 [<ffffffff8127e89b>] unshare_nsproxy_namespaces+0x9b/0x120 kernel/nsproxy.c:226 [<ffffffff8123f92e>] ksys_unshare+0x2fe/0x600 kernel/fork.c:3165 [<ffffffff8123fc42>] __do_sys_unshare kernel/fork.c:3236 [inline] [<ffffffff8123fc42>] __se_sys_unshare kernel/fork.c:3234 [inline] [<ffffffff8123fc42>] __x64_sys_unshare+0x12/0x20 kernel/fork.c:3234 [<ffffffff845aab45>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<ffffffff845aab45>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 [<ffffffff8460006a>] entry_SYSCALL_64_after_hwframe+0x46/0xb0 BUG: memory leak unreferenced object 0xffff888112fd5f00 (size 256): comm "syz-executor237", pid 3648, jiffies 4294970469 (age 12.270s) hex dump (first 32 bytes): 00 92 fc 12 81 88 ff ff 00 00 00 00 01 00 00 00 ................ 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff816fea1b>] kmalloc include/linux/slab.h:605 [inline] [<ffffffff816fea1b>] kzalloc include/linux/slab.h:733 [inline] [<ffffffff816fea1b>] __register_sysctl_table+0x7b/0x7f0 fs/proc/proc_sysctl.c:1344 [<ffffffff82219b7a>] setup_mq_sysctls+0x12a/0x1c0 ipc/mq_sysctl.c:112 [<ffffffff822197f2>] create_ipc_ns ipc/namespace.c:63 [inline] [<ffffffff822197f2>] copy_ipcs+0x292/0x390 ipc/namespace.c:91 [<ffffffff8127de7c>] create_new_namespaces+0xdc/0x4f0 kernel/nsproxy.c:90 [<ffffffff8127e89b>] unshare_nsproxy_namespaces+0x9b/0x120 kernel/nsproxy.c:226 [<ffffffff8123f92e>] ksys_unshare+0x2fe/0x600 kernel/fork.c:3165 [<ffffffff8123fc42>] __do_sys_unshare kernel/fork.c:3236 [inline] [<ffffffff8123fc42>] __se_sys_unshare kernel/fork.c:3234 [inline] [<ffffffff8123fc42>] __x64_sys_unshare+0x12/0x20 kernel/fork.c:3234 [<ffffffff845aab45>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<ffffffff845aab45>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 [<ffffffff8460006a>] entry_SYSCALL_64_after_hwframe+0x46/0xb0 BUG: memory leak unreferenced object 0xffff888112fbba00 (size 256): comm "syz-executor237", pid 3648, jiffies 4294970469 (age 12.270s) hex dump (first 32 bytes): 78 ba fb 12 81 88 ff ff 00 00 00 00 01 00 00 00 x............... 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff816fef49>] kmalloc include/linux/slab.h:605 [inline] [<ffffffff816fef49>] kzalloc include/linux/slab.h:733 [inline] [<ffffffff816fef49>] new_dir fs/proc/proc_sysctl.c:978 [inline] [<ffffffff816fef49>] get_subdir fs/proc/proc_sysctl.c:1022 [inline] [<ffffffff816fef49>] __register_sysctl_table+0x5a9/0x7f0 fs/proc/proc_sysctl.c:1373 [<ffffffff82219b7a>] setup_mq_sysctls+0x12a/0x1c0 ipc/mq_sysctl.c:112 [<ffffffff822197f2>] create_ipc_ns ipc/namespace.c:63 [inline] [<ffffffff822197f2>] copy_ipcs+0x292/0x390 ipc/namespace.c:91 [<ffffffff8127de7c>] create_new_namespaces+0xdc/0x4f0 kernel/nsproxy.c:90 [<ffffffff8127e89b>] unshare_nsproxy_namespaces+0x9b/0x120 kernel/nsproxy.c:226 [<ffffffff8123f92e>] ksys_unshare+0x2fe/0x600 kernel/fork.c:3165 [<ffffffff8123fc42>] __do_sys_unshare kernel/fork.c:3236 [inline] [<ffffffff8123fc42>] __se_sys_unshare kernel/fork.c:3234 [inline] [<ffffffff8123fc42>] __x64_sys_unshare+0x12/0x20 kernel/fork.c:3234 [<ffffffff845aab45>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<ffffffff845aab45>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 [<ffffffff8460006a>] entry_SYSCALL_64_after_hwframe+0x46/0xb0 BUG: memory leak unreferenced object 0xffff888112fbb900 (size 256): comm "syz-executor237", pid 3648, jiffies 4294970469 (age 12.270s) hex dump (first 32 bytes): 78 b9 fb 12 81 88 ff ff 00 00 00 00 01 00 00 00 x............... 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff816fef49>] kmalloc include/linux/slab.h:605 [inline] [<ffffffff816fef49>] kzalloc include/linux/slab.h:733 [inline] [<ffffffff816fef49>] new_dir fs/proc/proc_sysctl.c:978 [inline] [<ffffffff816fef49>] get_subdir fs/proc/proc_sysctl.c:1022 [inline] [<ffffffff816fef49>] __register_sysctl_table+0x5a9/0x7f0 fs/proc/proc_sysctl.c:1373 [<ffffffff82219b7a>] setup_mq_sysctls+0x12a/0x1c0 ipc/mq_sysctl.c:112 [<ffffffff822197f2>] create_ipc_ns ipc/namespace.c:63 [inline] [<ffffffff822197f2>] copy_ipcs+0x292/0x390 ipc/namespace.c:91 [<ffffffff8127de7c>] create_new_namespaces+0xdc/0x4f0 kernel/nsproxy.c:90 [<ffffffff8127e89b>] unshare_nsproxy_namespaces+0x9b/0x120 kernel/nsproxy.c:226 [<ffffffff8123f92e>] ksys_unshare+0x2fe/0x600 kernel/fork.c:3165 [<ffffffff8123fc42>] __do_sys_unshare kernel/fork.c:3236 [inline] [<ffffffff8123fc42>] __se_sys_unshare kernel/fork.c:3234 [inline] [<ffffffff8123fc42>] __x64_sys_unshare+0x12/0x20 kernel/fork.c:3234 [<ffffffff845aab45>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<ffffffff845aab45>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 [<ffffffff8460006a>] entry_SYSCALL_64_after_hwframe+0x46/0xb0 Reported-by: syzbot+b4b0d1b35442afbf6fd2@syzkaller.appspotmail.com Signed-off-by: Alexey Gladkov <legion@kernel.org> Link: https://lkml.kernel.org/r/000000000000f5004705e1db8bad@google.com Link: https://lkml.kernel.org/r/20220622200729.2639663-1-legion@kernel.org Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2022-06-23 04:07:29 +08:00
fail_mq:
retire_mq_sysctls(ns);
fail_put:
put_user_ns(ns->user_ns);
ns_free_inum(&ns->ns);
fail_free:
kfree(ns);
fail_dec:
dec_ipc_namespaces(ucounts);
fail:
return ERR_PTR(err);
namespaces: move the IPC namespace under IPC_NS option Currently the IPC namespace management code is spread over the ipc/*.c files. I moved this code into ipc/namespace.c file which is compiled out when needed. The linux/ipc_namespace.h file is used to store the prototypes of the functions in namespace.c and the stubs for NAMESPACES=n case. This is done so, because the stub for copy_ipc_namespace requires the knowledge of the CLONE_NEWIPC flag, which is in sched.h. But the linux/ipc.h file itself in included into many many .c files via the sys.h->sem.h sequence so adding the sched.h into it will make all these .c depend on sched.h which is not that good. On the other hand the knowledge about the namespaces stuff is required in 4 .c files only. Besides, this patch compiles out some auxiliary functions from ipc/sem.c, msg.c and shm.c files. It turned out that moving these functions into namespaces.c is not that easy because they use many other calls and macros from the original file. Moving them would make this patch complicated. On the other hand all these functions can be consolidated, so I will send a separate patch doing this a bit later. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 20:18:22 +08:00
}
struct ipc_namespace *copy_ipcs(unsigned long flags,
struct user_namespace *user_ns, struct ipc_namespace *ns)
namespaces: move the IPC namespace under IPC_NS option Currently the IPC namespace management code is spread over the ipc/*.c files. I moved this code into ipc/namespace.c file which is compiled out when needed. The linux/ipc_namespace.h file is used to store the prototypes of the functions in namespace.c and the stubs for NAMESPACES=n case. This is done so, because the stub for copy_ipc_namespace requires the knowledge of the CLONE_NEWIPC flag, which is in sched.h. But the linux/ipc.h file itself in included into many many .c files via the sys.h->sem.h sequence so adding the sched.h into it will make all these .c depend on sched.h which is not that good. On the other hand the knowledge about the namespaces stuff is required in 4 .c files only. Besides, this patch compiles out some auxiliary functions from ipc/sem.c, msg.c and shm.c files. It turned out that moving these functions into namespaces.c is not that easy because they use many other calls and macros from the original file. Moving them would make this patch complicated. On the other hand all these functions can be consolidated, so I will send a separate patch doing this a bit later. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 20:18:22 +08:00
{
if (!(flags & CLONE_NEWIPC))
return get_ipc_ns(ns);
return create_ipc_ns(user_ns, ns);
namespaces: move the IPC namespace under IPC_NS option Currently the IPC namespace management code is spread over the ipc/*.c files. I moved this code into ipc/namespace.c file which is compiled out when needed. The linux/ipc_namespace.h file is used to store the prototypes of the functions in namespace.c and the stubs for NAMESPACES=n case. This is done so, because the stub for copy_ipc_namespace requires the knowledge of the CLONE_NEWIPC flag, which is in sched.h. But the linux/ipc.h file itself in included into many many .c files via the sys.h->sem.h sequence so adding the sched.h into it will make all these .c depend on sched.h which is not that good. On the other hand the knowledge about the namespaces stuff is required in 4 .c files only. Besides, this patch compiles out some auxiliary functions from ipc/sem.c, msg.c and shm.c files. It turned out that moving these functions into namespaces.c is not that easy because they use many other calls and macros from the original file. Moving them would make this patch complicated. On the other hand all these functions can be consolidated, so I will send a separate patch doing this a bit later. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 20:18:22 +08:00
}
/*
* free_ipcs - free all ipcs of one type
* @ns: the namespace to remove the ipcs from
* @ids: the table of ipcs to free
* @free: the function called to free each individual ipc
*
* Called for each kind of ipc when an ipc_namespace exits.
*/
void free_ipcs(struct ipc_namespace *ns, struct ipc_ids *ids,
void (*free)(struct ipc_namespace *, struct kern_ipc_perm *))
{
struct kern_ipc_perm *perm;
int next_id;
int total, in_use;
down_write(&ids->rwsem);
in_use = ids->in_use;
for (total = 0, next_id = 0; total < in_use; next_id++) {
perm = idr_find(&ids->ipcs_idr, next_id);
if (perm == NULL)
continue;
rcu_read_lock();
ipc_lock_object(perm);
free(ns, perm);
total++;
}
up_write(&ids->rwsem);
}
static void free_ipc_ns(struct ipc_namespace *ns)
{
/*
* Caller needs to wait for an RCU grace period to have passed
* after making the mount point inaccessible to new accesses.
*/
mntput(ns->mq_mnt);
sem_exit_ns(ns);
msg_exit_ns(ns);
shm_exit_ns(ns);
retire_mq_sysctls(ns);
retire_ipc_sysctls(ns);
dec_ipc_namespaces(ns->ucounts);
put_user_ns(ns->user_ns);
ns_free_inum(&ns->ns);
kfree(ns);
}
static LLIST_HEAD(free_ipc_list);
static void free_ipc(struct work_struct *unused)
{
struct llist_node *node = llist_del_all(&free_ipc_list);
struct ipc_namespace *n, *t;
llist_for_each_entry_safe(n, t, node, mnt_llist)
mnt_make_shortterm(n->mq_mnt);
/* Wait for any last users to have gone away. */
synchronize_rcu();
llist_for_each_entry_safe(n, t, node, mnt_llist)
free_ipc_ns(n);
}
namespaces: ipc namespaces: implement support for posix msqueues Implement multiple mounts of the mqueue file system, and link it to usage of CLONE_NEWIPC. Each ipc ns has a corresponding mqueuefs superblock. When a user does clone(CLONE_NEWIPC) or unshare(CLONE_NEWIPC), the unshare will cause an internal mount of a new mqueuefs sb linked to the new ipc ns. When a user does 'mount -t mqueue mqueue /dev/mqueue', he mounts the mqueuefs superblock. Posix message queues can be worked with both through the mq_* system calls (see mq_overview(7)), and through the VFS through the mqueue mount. Any usage of mq_open() and friends will work with the acting task's ipc namespace. Any actions through the VFS will work with the mqueuefs in which the file was created. So if a user doesn't remount mqueuefs after unshare(CLONE_NEWIPC), mq_open("/ab") will not be reflected in "ls /dev/mqueue". If task a mounts mqueue for ipc_ns:1, then clones task b with a new ipcns, ipcns:2, and then task a is the last task in ipc_ns:1 to exit, then (1) ipc_ns:1 will be freed, (2) it's superblock will live on until task b umounts the corresponding mqueuefs, and vfs actions will continue to succeed, but (3) sb->s_fs_info will be NULL for the sb corresponding to the deceased ipc_ns:1. To make this happen, we must protect the ipc reference count when a) a task exits and drops its ipcns->count, since it might be dropping it to 0 and freeing the ipcns b) a task accesses the ipcns through its mqueuefs interface, since it bumps the ipcns refcount and might race with the last task in the ipcns exiting. So the kref is changed to an atomic_t so we can use atomic_dec_and_lock(&ns->count,mq_lock), and every access to the ipcns through ns = mqueuefs_sb->s_fs_info is protected by the same lock. Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07 10:01:10 +08:00
/*
* put_ipc_ns - drop a reference to an ipc namespace.
* @ns: the namespace to put
*
* If this is the last task in the namespace exiting, and
* it is dropping the refcount to 0, then it can race with
* a task in another ipc namespace but in a mounts namespace
* which has this ipcns's mqueuefs mounted, doing some action
* with one of the mqueuefs files. That can raise the refcount.
* So dropping the refcount, and raising the refcount when
* accessing it through the VFS, are protected with mq_lock.
*
* (Clearly, a task raising the refcount on its own ipc_ns
* needn't take mq_lock since it can't race with the last task
* in the ipcns exiting).
*/
void put_ipc_ns(struct ipc_namespace *ns)
namespaces: move the IPC namespace under IPC_NS option Currently the IPC namespace management code is spread over the ipc/*.c files. I moved this code into ipc/namespace.c file which is compiled out when needed. The linux/ipc_namespace.h file is used to store the prototypes of the functions in namespace.c and the stubs for NAMESPACES=n case. This is done so, because the stub for copy_ipc_namespace requires the knowledge of the CLONE_NEWIPC flag, which is in sched.h. But the linux/ipc.h file itself in included into many many .c files via the sys.h->sem.h sequence so adding the sched.h into it will make all these .c depend on sched.h which is not that good. On the other hand the knowledge about the namespaces stuff is required in 4 .c files only. Besides, this patch compiles out some auxiliary functions from ipc/sem.c, msg.c and shm.c files. It turned out that moving these functions into namespaces.c is not that easy because they use many other calls and macros from the original file. Moving them would make this patch complicated. On the other hand all these functions can be consolidated, so I will send a separate patch doing this a bit later. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 20:18:22 +08:00
{
ipc: Use generic ns_common::count Switch over ipc namespaces to use the newly introduced common lifetime counter. Currently every namespace type has its own lifetime counter which is stored in the specific namespace struct. The lifetime counters are used identically for all namespaces types. Namespaces may of course have additional unrelated counters and these are not altered. This introduces a common lifetime counter into struct ns_common. The ns_common struct encompasses information that all namespaces share. That should include the lifetime counter since its common for all of them. It also allows us to unify the type of the counters across all namespaces. Most of them use refcount_t but one uses atomic_t and at least one uses kref. Especially the last one doesn't make much sense since it's just a wrapper around refcount_t since 2016 and actually complicates cleanup operations by having to use container_of() to cast the correct namespace struct out of struct ns_common. Having the lifetime counter for the namespaces in one place reduces maintenance cost. Not just because after switching all namespaces over we will have removed more code than we added but also because the logic is more easily understandable and we indicate to the user that the basic lifetime requirements for all namespaces are currently identical. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/r/159644978697.604812.16592754423881032385.stgit@localhost.localdomain Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-08-03 18:16:27 +08:00
if (refcount_dec_and_lock(&ns->ns.count, &mq_lock)) {
namespaces: ipc namespaces: implement support for posix msqueues Implement multiple mounts of the mqueue file system, and link it to usage of CLONE_NEWIPC. Each ipc ns has a corresponding mqueuefs superblock. When a user does clone(CLONE_NEWIPC) or unshare(CLONE_NEWIPC), the unshare will cause an internal mount of a new mqueuefs sb linked to the new ipc ns. When a user does 'mount -t mqueue mqueue /dev/mqueue', he mounts the mqueuefs superblock. Posix message queues can be worked with both through the mq_* system calls (see mq_overview(7)), and through the VFS through the mqueue mount. Any usage of mq_open() and friends will work with the acting task's ipc namespace. Any actions through the VFS will work with the mqueuefs in which the file was created. So if a user doesn't remount mqueuefs after unshare(CLONE_NEWIPC), mq_open("/ab") will not be reflected in "ls /dev/mqueue". If task a mounts mqueue for ipc_ns:1, then clones task b with a new ipcns, ipcns:2, and then task a is the last task in ipc_ns:1 to exit, then (1) ipc_ns:1 will be freed, (2) it's superblock will live on until task b umounts the corresponding mqueuefs, and vfs actions will continue to succeed, but (3) sb->s_fs_info will be NULL for the sb corresponding to the deceased ipc_ns:1. To make this happen, we must protect the ipc reference count when a) a task exits and drops its ipcns->count, since it might be dropping it to 0 and freeing the ipcns b) a task accesses the ipcns through its mqueuefs interface, since it bumps the ipcns refcount and might race with the last task in the ipcns exiting. So the kref is changed to an atomic_t so we can use atomic_dec_and_lock(&ns->count,mq_lock), and every access to the ipcns through ns = mqueuefs_sb->s_fs_info is protected by the same lock. Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07 10:01:10 +08:00
mq_clear_sbinfo(ns);
spin_unlock(&mq_lock);
if (llist_add(&ns->mnt_llist, &free_ipc_list))
schedule_work(&free_ipc_work);
namespaces: ipc namespaces: implement support for posix msqueues Implement multiple mounts of the mqueue file system, and link it to usage of CLONE_NEWIPC. Each ipc ns has a corresponding mqueuefs superblock. When a user does clone(CLONE_NEWIPC) or unshare(CLONE_NEWIPC), the unshare will cause an internal mount of a new mqueuefs sb linked to the new ipc ns. When a user does 'mount -t mqueue mqueue /dev/mqueue', he mounts the mqueuefs superblock. Posix message queues can be worked with both through the mq_* system calls (see mq_overview(7)), and through the VFS through the mqueue mount. Any usage of mq_open() and friends will work with the acting task's ipc namespace. Any actions through the VFS will work with the mqueuefs in which the file was created. So if a user doesn't remount mqueuefs after unshare(CLONE_NEWIPC), mq_open("/ab") will not be reflected in "ls /dev/mqueue". If task a mounts mqueue for ipc_ns:1, then clones task b with a new ipcns, ipcns:2, and then task a is the last task in ipc_ns:1 to exit, then (1) ipc_ns:1 will be freed, (2) it's superblock will live on until task b umounts the corresponding mqueuefs, and vfs actions will continue to succeed, but (3) sb->s_fs_info will be NULL for the sb corresponding to the deceased ipc_ns:1. To make this happen, we must protect the ipc reference count when a) a task exits and drops its ipcns->count, since it might be dropping it to 0 and freeing the ipcns b) a task accesses the ipcns through its mqueuefs interface, since it bumps the ipcns refcount and might race with the last task in the ipcns exiting. So the kref is changed to an atomic_t so we can use atomic_dec_and_lock(&ns->count,mq_lock), and every access to the ipcns through ns = mqueuefs_sb->s_fs_info is protected by the same lock. Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07 10:01:10 +08:00
}
}
static inline struct ipc_namespace *to_ipc_ns(struct ns_common *ns)
{
return container_of(ns, struct ipc_namespace, ns);
}
static struct ns_common *ipcns_get(struct task_struct *task)
{
struct ipc_namespace *ns = NULL;
struct nsproxy *nsproxy;
task_lock(task);
nsproxy = task->nsproxy;
if (nsproxy)
ns = get_ipc_ns(nsproxy->ipc_ns);
task_unlock(task);
return ns ? &ns->ns : NULL;
}
static void ipcns_put(struct ns_common *ns)
{
return put_ipc_ns(to_ipc_ns(ns));
}
nsproxy: add struct nsset Add a simple struct nsset. It holds all necessary pieces to switch to a new set of namespaces without leaving a task in a half-switched state which we will make use of in the next patch. This patch switches the existing setns logic over without causing a change in setns() behavior. This brings setns() closer to how unshare() works(). The prepare_ns() function is responsible to prepare all necessary information. This has two reasons. First it minimizes dependencies between individual namespaces, i.e. all install handler can expect that all fields are properly initialized independent in what order they are called in. Second, this makes the code easier to maintain and easier to follow if it needs to be changed. The prepare_ns() helper will only be switched over to use a flags argument in the next patch. Here it will still use nstype as a simple integer argument which was argued would be clearer. I'm not particularly opinionated about this if it really helps or not. The struct nsset itself already contains the flags field since its name already indicates that it can contain information required by different namespaces. None of this should have functional consequences. Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Serge Hallyn <serge@hallyn.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Serge Hallyn <serge@hallyn.com> Cc: Jann Horn <jannh@google.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Aleksa Sarai <cyphar@cyphar.com> Link: https://lore.kernel.org/r/20200505140432.181565-2-christian.brauner@ubuntu.com
2020-05-05 22:04:30 +08:00
static int ipcns_install(struct nsset *nsset, struct ns_common *new)
{
nsproxy: add struct nsset Add a simple struct nsset. It holds all necessary pieces to switch to a new set of namespaces without leaving a task in a half-switched state which we will make use of in the next patch. This patch switches the existing setns logic over without causing a change in setns() behavior. This brings setns() closer to how unshare() works(). The prepare_ns() function is responsible to prepare all necessary information. This has two reasons. First it minimizes dependencies between individual namespaces, i.e. all install handler can expect that all fields are properly initialized independent in what order they are called in. Second, this makes the code easier to maintain and easier to follow if it needs to be changed. The prepare_ns() helper will only be switched over to use a flags argument in the next patch. Here it will still use nstype as a simple integer argument which was argued would be clearer. I'm not particularly opinionated about this if it really helps or not. The struct nsset itself already contains the flags field since its name already indicates that it can contain information required by different namespaces. None of this should have functional consequences. Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Serge Hallyn <serge@hallyn.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Serge Hallyn <serge@hallyn.com> Cc: Jann Horn <jannh@google.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Aleksa Sarai <cyphar@cyphar.com> Link: https://lore.kernel.org/r/20200505140432.181565-2-christian.brauner@ubuntu.com
2020-05-05 22:04:30 +08:00
struct nsproxy *nsproxy = nsset->nsproxy;
struct ipc_namespace *ns = to_ipc_ns(new);
if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN) ||
nsproxy: add struct nsset Add a simple struct nsset. It holds all necessary pieces to switch to a new set of namespaces without leaving a task in a half-switched state which we will make use of in the next patch. This patch switches the existing setns logic over without causing a change in setns() behavior. This brings setns() closer to how unshare() works(). The prepare_ns() function is responsible to prepare all necessary information. This has two reasons. First it minimizes dependencies between individual namespaces, i.e. all install handler can expect that all fields are properly initialized independent in what order they are called in. Second, this makes the code easier to maintain and easier to follow if it needs to be changed. The prepare_ns() helper will only be switched over to use a flags argument in the next patch. Here it will still use nstype as a simple integer argument which was argued would be clearer. I'm not particularly opinionated about this if it really helps or not. The struct nsset itself already contains the flags field since its name already indicates that it can contain information required by different namespaces. None of this should have functional consequences. Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Serge Hallyn <serge@hallyn.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Serge Hallyn <serge@hallyn.com> Cc: Jann Horn <jannh@google.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Aleksa Sarai <cyphar@cyphar.com> Link: https://lore.kernel.org/r/20200505140432.181565-2-christian.brauner@ubuntu.com
2020-05-05 22:04:30 +08:00
!ns_capable(nsset->cred->user_ns, CAP_SYS_ADMIN))
return -EPERM;
put_ipc_ns(nsproxy->ipc_ns);
nsproxy->ipc_ns = get_ipc_ns(ns);
return 0;
}
static struct user_namespace *ipcns_owner(struct ns_common *ns)
{
return to_ipc_ns(ns)->user_ns;
}
const struct proc_ns_operations ipcns_operations = {
.name = "ipc",
.type = CLONE_NEWIPC,
.get = ipcns_get,
.put = ipcns_put,
.install = ipcns_install,
.owner = ipcns_owner,
};