License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 22:07:57 +08:00
|
|
|
/* SPDX-License-Identifier: GPL-2.0 */
|
2005-04-17 06:20:36 +08:00
|
|
|
/*
|
|
|
|
* This is <linux/capability.h>
|
|
|
|
*
|
Implement file posix capabilities
Implement file posix capabilities. This allows programs to be given a
subset of root's powers regardless of who runs them, without having to use
setuid and giving the binary all of root's powers.
This version works with Kaigai Kohei's userspace tools, found at
http://www.kaigai.gr.jp/index.php. For more information on how to use this
patch, Chris Friedhoff has posted a nice page at
http://www.friedhoff.org/fscaps.html.
Changelog:
Nov 27:
Incorporate fixes from Andrew Morton
(security-introduce-file-caps-tweaks and
security-introduce-file-caps-warning-fix)
Fix Kconfig dependency.
Fix change signaling behavior when file caps are not compiled in.
Nov 13:
Integrate comments from Alexey: Remove CONFIG_ ifdef from
capability.h, and use %zd for printing a size_t.
Nov 13:
Fix endianness warnings by sparse as suggested by Alexey
Dobriyan.
Nov 09:
Address warnings of unused variables at cap_bprm_set_security
when file capabilities are disabled, and simultaneously clean
up the code a little, by pulling the new code into a helper
function.
Nov 08:
For pointers to required userspace tools and how to use
them, see http://www.friedhoff.org/fscaps.html.
Nov 07:
Fix the calculation of the highest bit checked in
check_cap_sanity().
Nov 07:
Allow file caps to be enabled without CONFIG_SECURITY, since
capabilities are the default.
Hook cap_task_setscheduler when !CONFIG_SECURITY.
Move capable(TASK_KILL) to end of cap_task_kill to reduce
audit messages.
Nov 05:
Add secondary calls in selinux/hooks.c to task_setioprio and
task_setscheduler so that selinux and capabilities with file
cap support can be stacked.
Sep 05:
As Seth Arnold points out, uid checks are out of place
for capability code.
Sep 01:
Define task_setscheduler, task_setioprio, cap_task_kill, and
task_setnice to make sure a user cannot affect a process in which
they called a program with some fscaps.
One remaining question is the note under task_setscheduler: are we
ok with CAP_SYS_NICE being sufficient to confine a process to a
cpuset?
It is a semantic change, as without fsccaps, attach_task doesn't
allow CAP_SYS_NICE to override the uid equivalence check. But since
it uses security_task_setscheduler, which elsewhere is used where
CAP_SYS_NICE can be used to override the uid equivalence check,
fixing it might be tough.
task_setscheduler
note: this also controls cpuset:attach_task. Are we ok with
CAP_SYS_NICE being used to confine to a cpuset?
task_setioprio
task_setnice
sys_setpriority uses this (through set_one_prio) for another
process. Need same checks as setrlimit
Aug 21:
Updated secureexec implementation to reflect the fact that
euid and uid might be the same and nonzero, but the process
might still have elevated caps.
Aug 15:
Handle endianness of xattrs.
Enforce capability version match between kernel and disk.
Enforce that no bits beyond the known max capability are
set, else return -EPERM.
With this extra processing, it may be worth reconsidering
doing all the work at bprm_set_security rather than
d_instantiate.
Aug 10:
Always call getxattr at bprm_set_security, rather than
caching it at d_instantiate.
[morgan@kernel.org: file-caps clean up for linux/capability.h]
[bunk@kernel.org: unexport cap_inode_killpriv]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Andrew Morgan <morgan@kernel.org>
Signed-off-by: Andrew Morgan <morgan@kernel.org>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 14:31:36 +08:00
|
|
|
* Andrew G. Morgan <morgan@kernel.org>
|
2005-04-17 06:20:36 +08:00
|
|
|
* Alexander Kjeldaas <astor@guardian.no>
|
|
|
|
* with help from Aleph1, Roland Buresund and Andrew Main.
|
|
|
|
*
|
|
|
|
* See here for the libcap library ("POSIX draft" compliance):
|
|
|
|
*
|
2009-06-16 16:26:25 +08:00
|
|
|
* ftp://www.kernel.org/pub/linux/libs/security/linux-privs/kernel-2.6/
|
Implement file posix capabilities
Implement file posix capabilities. This allows programs to be given a
subset of root's powers regardless of who runs them, without having to use
setuid and giving the binary all of root's powers.
This version works with Kaigai Kohei's userspace tools, found at
http://www.kaigai.gr.jp/index.php. For more information on how to use this
patch, Chris Friedhoff has posted a nice page at
http://www.friedhoff.org/fscaps.html.
Changelog:
Nov 27:
Incorporate fixes from Andrew Morton
(security-introduce-file-caps-tweaks and
security-introduce-file-caps-warning-fix)
Fix Kconfig dependency.
Fix change signaling behavior when file caps are not compiled in.
Nov 13:
Integrate comments from Alexey: Remove CONFIG_ ifdef from
capability.h, and use %zd for printing a size_t.
Nov 13:
Fix endianness warnings by sparse as suggested by Alexey
Dobriyan.
Nov 09:
Address warnings of unused variables at cap_bprm_set_security
when file capabilities are disabled, and simultaneously clean
up the code a little, by pulling the new code into a helper
function.
Nov 08:
For pointers to required userspace tools and how to use
them, see http://www.friedhoff.org/fscaps.html.
Nov 07:
Fix the calculation of the highest bit checked in
check_cap_sanity().
Nov 07:
Allow file caps to be enabled without CONFIG_SECURITY, since
capabilities are the default.
Hook cap_task_setscheduler when !CONFIG_SECURITY.
Move capable(TASK_KILL) to end of cap_task_kill to reduce
audit messages.
Nov 05:
Add secondary calls in selinux/hooks.c to task_setioprio and
task_setscheduler so that selinux and capabilities with file
cap support can be stacked.
Sep 05:
As Seth Arnold points out, uid checks are out of place
for capability code.
Sep 01:
Define task_setscheduler, task_setioprio, cap_task_kill, and
task_setnice to make sure a user cannot affect a process in which
they called a program with some fscaps.
One remaining question is the note under task_setscheduler: are we
ok with CAP_SYS_NICE being sufficient to confine a process to a
cpuset?
It is a semantic change, as without fsccaps, attach_task doesn't
allow CAP_SYS_NICE to override the uid equivalence check. But since
it uses security_task_setscheduler, which elsewhere is used where
CAP_SYS_NICE can be used to override the uid equivalence check,
fixing it might be tough.
task_setscheduler
note: this also controls cpuset:attach_task. Are we ok with
CAP_SYS_NICE being used to confine to a cpuset?
task_setioprio
task_setnice
sys_setpriority uses this (through set_one_prio) for another
process. Need same checks as setrlimit
Aug 21:
Updated secureexec implementation to reflect the fact that
euid and uid might be the same and nonzero, but the process
might still have elevated caps.
Aug 15:
Handle endianness of xattrs.
Enforce capability version match between kernel and disk.
Enforce that no bits beyond the known max capability are
set, else return -EPERM.
With this extra processing, it may be worth reconsidering
doing all the work at bprm_set_security rather than
d_instantiate.
Aug 10:
Always call getxattr at bprm_set_security, rather than
caching it at d_instantiate.
[morgan@kernel.org: file-caps clean up for linux/capability.h]
[bunk@kernel.org: unexport cap_inode_killpriv]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Andrew Morgan <morgan@kernel.org>
Signed-off-by: Andrew Morgan <morgan@kernel.org>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 14:31:36 +08:00
|
|
|
*/
|
2005-04-17 06:20:36 +08:00
|
|
|
#ifndef _LINUX_CAPABILITY_H
|
|
|
|
#define _LINUX_CAPABILITY_H
|
|
|
|
|
2012-10-13 17:46:48 +08:00
|
|
|
#include <uapi/linux/capability.h>
|
2019-01-24 10:36:25 +08:00
|
|
|
#include <linux/uidgid.h>
|
2008-05-28 13:05:17 +08:00
|
|
|
|
|
|
|
#define _KERNEL_CAPABILITY_VERSION _LINUX_CAPABILITY_VERSION_3
|
|
|
|
#define _KERNEL_CAPABILITY_U32S _LINUX_CAPABILITY_U32S_3
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2009-01-30 23:09:30 +08:00
|
|
|
extern int file_caps_enabled;
|
|
|
|
|
2005-04-17 06:20:36 +08:00
|
|
|
typedef struct kernel_cap_struct {
|
2008-05-28 13:05:17 +08:00
|
|
|
__u32 cap[_KERNEL_CAPABILITY_U32S];
|
2005-04-17 06:20:36 +08:00
|
|
|
} kernel_cap_t;
|
|
|
|
|
2019-01-24 10:36:25 +08:00
|
|
|
/* same as vfs_ns_cap_data but in cpu endian and always filled completely */
|
2008-11-11 18:48:10 +08:00
|
|
|
struct cpu_vfs_cap_data {
|
|
|
|
__u32 magic_etc;
|
|
|
|
kernel_cap_t permitted;
|
|
|
|
kernel_cap_t inheritable;
|
2019-01-24 10:36:25 +08:00
|
|
|
kuid_t rootid;
|
2008-11-11 18:48:10 +08:00
|
|
|
};
|
|
|
|
|
2008-02-05 14:29:42 +08:00
|
|
|
#define _USER_CAP_HEADER_SIZE (sizeof(struct __user_cap_header_struct))
|
2005-04-17 06:20:36 +08:00
|
|
|
#define _KERNEL_CAP_T_SIZE (sizeof(kernel_cap_t))
|
|
|
|
|
|
|
|
|
2013-04-15 01:06:31 +08:00
|
|
|
struct file;
|
2011-11-15 08:24:06 +08:00
|
|
|
struct inode;
|
userns: security: make capabilities relative to the user namespace
- Introduce ns_capable to test for a capability in a non-default
user namespace.
- Teach cap_capable to handle capabilities in a non-default
user namespace.
The motivation is to get to the unprivileged creation of new
namespaces. It looks like this gets us 90% of the way there, with
only potential uid confusion issues left.
I still need to handle getting all caps after creation but otherwise I
think I have a good starter patch that achieves all of your goals.
Changelog:
11/05/2010: [serge] add apparmor
12/14/2010: [serge] fix capabilities to created user namespaces
Without this, if user serge creates a user_ns, he won't have
capabilities to the user_ns he created. THis is because we
were first checking whether his effective caps had the caps
he needed and returning -EPERM if not, and THEN checking whether
he was the creator. Reverse those checks.
12/16/2010: [serge] security_real_capable needs ns argument in !security case
01/11/2011: [serge] add task_ns_capable helper
01/11/2011: [serge] add nsown_capable() helper per Bastian Blank suggestion
02/16/2011: [serge] fix a logic bug: the root user is always creator of
init_user_ns, but should not always have capabilities to
it! Fix the check in cap_capable().
02/21/2011: Add the required user_ns parameter to security_capable,
fixing a compile failure.
02/23/2011: Convert some macros to functions as per akpm comments. Some
couldn't be converted because we can't easily forward-declare
them (they are inline if !SECURITY, extern if SECURITY). Add
a current_user_ns function so we can use it in capability.h
without #including cred.h. Move all forward declarations
together to the top of the #ifdef __KERNEL__ section, and use
kernel-doc format.
02/23/2011: Per dhowells, clean up comment in cap_capable().
02/23/2011: Per akpm, remove unreachable 'return -EPERM' in cap_capable.
(Original written and signed off by Eric; latest, modified version
acked by him)
[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: export current_user_ns() for ecryptfs]
[serge.hallyn@canonical.com: remove unneeded extra argument in selinux's task_has_capability]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
Acked-by: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-24 07:43:17 +08:00
|
|
|
struct dentry;
|
2016-08-03 05:03:36 +08:00
|
|
|
struct task_struct;
|
userns: security: make capabilities relative to the user namespace
- Introduce ns_capable to test for a capability in a non-default
user namespace.
- Teach cap_capable to handle capabilities in a non-default
user namespace.
The motivation is to get to the unprivileged creation of new
namespaces. It looks like this gets us 90% of the way there, with
only potential uid confusion issues left.
I still need to handle getting all caps after creation but otherwise I
think I have a good starter patch that achieves all of your goals.
Changelog:
11/05/2010: [serge] add apparmor
12/14/2010: [serge] fix capabilities to created user namespaces
Without this, if user serge creates a user_ns, he won't have
capabilities to the user_ns he created. THis is because we
were first checking whether his effective caps had the caps
he needed and returning -EPERM if not, and THEN checking whether
he was the creator. Reverse those checks.
12/16/2010: [serge] security_real_capable needs ns argument in !security case
01/11/2011: [serge] add task_ns_capable helper
01/11/2011: [serge] add nsown_capable() helper per Bastian Blank suggestion
02/16/2011: [serge] fix a logic bug: the root user is always creator of
init_user_ns, but should not always have capabilities to
it! Fix the check in cap_capable().
02/21/2011: Add the required user_ns parameter to security_capable,
fixing a compile failure.
02/23/2011: Convert some macros to functions as per akpm comments. Some
couldn't be converted because we can't easily forward-declare
them (they are inline if !SECURITY, extern if SECURITY). Add
a current_user_ns function so we can use it in capability.h
without #including cred.h. Move all forward declarations
together to the top of the #ifdef __KERNEL__ section, and use
kernel-doc format.
02/23/2011: Per dhowells, clean up comment in cap_capable().
02/23/2011: Per akpm, remove unreachable 'return -EPERM' in cap_capable.
(Original written and signed off by Eric; latest, modified version
acked by him)
[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: export current_user_ns() for ecryptfs]
[serge.hallyn@canonical.com: remove unneeded extra argument in selinux's task_has_capability]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
Acked-by: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-24 07:43:17 +08:00
|
|
|
struct user_namespace;
|
|
|
|
|
|
|
|
extern const kernel_cap_t __cap_empty_set;
|
|
|
|
extern const kernel_cap_t __cap_init_eff_set;
|
|
|
|
|
2005-04-17 06:20:36 +08:00
|
|
|
/*
|
|
|
|
* Internal kernel functions only
|
|
|
|
*/
|
Implement file posix capabilities
Implement file posix capabilities. This allows programs to be given a
subset of root's powers regardless of who runs them, without having to use
setuid and giving the binary all of root's powers.
This version works with Kaigai Kohei's userspace tools, found at
http://www.kaigai.gr.jp/index.php. For more information on how to use this
patch, Chris Friedhoff has posted a nice page at
http://www.friedhoff.org/fscaps.html.
Changelog:
Nov 27:
Incorporate fixes from Andrew Morton
(security-introduce-file-caps-tweaks and
security-introduce-file-caps-warning-fix)
Fix Kconfig dependency.
Fix change signaling behavior when file caps are not compiled in.
Nov 13:
Integrate comments from Alexey: Remove CONFIG_ ifdef from
capability.h, and use %zd for printing a size_t.
Nov 13:
Fix endianness warnings by sparse as suggested by Alexey
Dobriyan.
Nov 09:
Address warnings of unused variables at cap_bprm_set_security
when file capabilities are disabled, and simultaneously clean
up the code a little, by pulling the new code into a helper
function.
Nov 08:
For pointers to required userspace tools and how to use
them, see http://www.friedhoff.org/fscaps.html.
Nov 07:
Fix the calculation of the highest bit checked in
check_cap_sanity().
Nov 07:
Allow file caps to be enabled without CONFIG_SECURITY, since
capabilities are the default.
Hook cap_task_setscheduler when !CONFIG_SECURITY.
Move capable(TASK_KILL) to end of cap_task_kill to reduce
audit messages.
Nov 05:
Add secondary calls in selinux/hooks.c to task_setioprio and
task_setscheduler so that selinux and capabilities with file
cap support can be stacked.
Sep 05:
As Seth Arnold points out, uid checks are out of place
for capability code.
Sep 01:
Define task_setscheduler, task_setioprio, cap_task_kill, and
task_setnice to make sure a user cannot affect a process in which
they called a program with some fscaps.
One remaining question is the note under task_setscheduler: are we
ok with CAP_SYS_NICE being sufficient to confine a process to a
cpuset?
It is a semantic change, as without fsccaps, attach_task doesn't
allow CAP_SYS_NICE to override the uid equivalence check. But since
it uses security_task_setscheduler, which elsewhere is used where
CAP_SYS_NICE can be used to override the uid equivalence check,
fixing it might be tough.
task_setscheduler
note: this also controls cpuset:attach_task. Are we ok with
CAP_SYS_NICE being used to confine to a cpuset?
task_setioprio
task_setnice
sys_setpriority uses this (through set_one_prio) for another
process. Need same checks as setrlimit
Aug 21:
Updated secureexec implementation to reflect the fact that
euid and uid might be the same and nonzero, but the process
might still have elevated caps.
Aug 15:
Handle endianness of xattrs.
Enforce capability version match between kernel and disk.
Enforce that no bits beyond the known max capability are
set, else return -EPERM.
With this extra processing, it may be worth reconsidering
doing all the work at bprm_set_security rather than
d_instantiate.
Aug 10:
Always call getxattr at bprm_set_security, rather than
caching it at d_instantiate.
[morgan@kernel.org: file-caps clean up for linux/capability.h]
[bunk@kernel.org: unexport cap_inode_killpriv]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Andrew Morgan <morgan@kernel.org>
Signed-off-by: Andrew Morgan <morgan@kernel.org>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 14:31:36 +08:00
|
|
|
|
2008-02-05 14:29:42 +08:00
|
|
|
#define CAP_FOR_EACH_U32(__capi) \
|
2008-05-28 13:05:17 +08:00
|
|
|
for (__capi = 0; __capi < _KERNEL_CAPABILITY_U32S; ++__capi)
|
2008-02-05 14:29:42 +08:00
|
|
|
|
2009-04-13 22:56:14 +08:00
|
|
|
/*
|
|
|
|
* CAP_FS_MASK and CAP_NFSD_MASKS:
|
|
|
|
*
|
|
|
|
* The fs mask is all the privileges that fsuid==0 historically meant.
|
|
|
|
* At one time in the past, that included CAP_MKNOD and CAP_LINUX_IMMUTABLE.
|
|
|
|
*
|
|
|
|
* It has never meant setting security.* and trusted.* xattrs.
|
|
|
|
*
|
|
|
|
* We could also define fsmask as follows:
|
|
|
|
* 1. CAP_FS_MASK is the privilege to bypass all fs-related DAC permissions
|
|
|
|
* 2. The security.* and trusted.* xattrs are fs-related MAC permissions
|
|
|
|
*/
|
|
|
|
|
2008-02-05 14:29:42 +08:00
|
|
|
# define CAP_FS_MASK_B0 (CAP_TO_MASK(CAP_CHOWN) \
|
2009-04-13 22:56:14 +08:00
|
|
|
| CAP_TO_MASK(CAP_MKNOD) \
|
2008-02-05 14:29:42 +08:00
|
|
|
| CAP_TO_MASK(CAP_DAC_OVERRIDE) \
|
|
|
|
| CAP_TO_MASK(CAP_DAC_READ_SEARCH) \
|
|
|
|
| CAP_TO_MASK(CAP_FOWNER) \
|
|
|
|
| CAP_TO_MASK(CAP_FSETID))
|
|
|
|
|
Smack: Simplified Mandatory Access Control Kernel
Smack is the Simplified Mandatory Access Control Kernel.
Smack implements mandatory access control (MAC) using labels
attached to tasks and data containers, including files, SVIPC,
and other tasks. Smack is a kernel based scheme that requires
an absolute minimum of application support and a very small
amount of configuration data.
Smack uses extended attributes and
provides a set of general mount options, borrowing technics used
elsewhere. Smack uses netlabel for CIPSO labeling. Smack provides
a pseudo-filesystem smackfs that is used for manipulation of
system Smack attributes.
The patch, patches for ls and sshd, a README, a startup script,
and x86 binaries for ls and sshd are also available on
http://www.schaufler-ca.com
Development has been done using Fedora Core 7 in a virtual machine
environment and on an old Sony laptop.
Smack provides mandatory access controls based on the label attached
to a task and the label attached to the object it is attempting to
access. Smack labels are deliberately short (1-23 characters) text
strings. Single character labels using special characters are reserved
for system use. The only operation applied to Smack labels is equality
comparison. No wildcards or expressions, regular or otherwise, are
used. Smack labels are composed of printable characters and may not
include "/".
A file always gets the Smack label of the task that created it.
Smack defines and uses these labels:
"*" - pronounced "star"
"_" - pronounced "floor"
"^" - pronounced "hat"
"?" - pronounced "huh"
The access rules enforced by Smack are, in order:
1. Any access requested by a task labeled "*" is denied.
2. A read or execute access requested by a task labeled "^"
is permitted.
3. A read or execute access requested on an object labeled "_"
is permitted.
4. Any access requested on an object labeled "*" is permitted.
5. Any access requested by a task on an object with the same
label is permitted.
6. Any access requested that is explicitly defined in the loaded
rule set is permitted.
7. Any other access is denied.
Rules may be explicitly defined by writing subject,object,access
triples to /smack/load.
Smack rule sets can be easily defined that describe Bell&LaPadula
sensitivity, Biba integrity, and a variety of interesting
configurations. Smack rule sets can be modified on the fly to
accommodate changes in the operating environment or even the time
of day.
Some practical use cases:
Hierarchical levels. The less common of the two usual uses
for MLS systems is to define hierarchical levels, often
unclassified, confidential, secret, and so on. To set up smack
to support this, these rules could be defined:
C Unclass rx
S C rx
S Unclass rx
TS S rx
TS C rx
TS Unclass rx
A TS process can read S, C, and Unclass data, but cannot write it.
An S process can read C and Unclass. Note that specifying that
TS can read S and S can read C does not imply TS can read C, it
has to be explicitly stated.
Non-hierarchical categories. This is the more common of the
usual uses for an MLS system. Since the default rule is that a
subject cannot access an object with a different label no
access rules are required to implement compartmentalization.
A case that the Bell & LaPadula policy does not allow is demonstrated
with this Smack access rule:
A case that Bell&LaPadula does not allow that Smack does:
ESPN ABC r
ABC ESPN r
On my portable video device I have two applications, one that
shows ABC programming and the other ESPN programming. ESPN wants
to show me sport stories that show up as news, and ABC will
only provide minimal information about a sports story if ESPN
is covering it. Each side can look at the other's info, neither
can change the other. Neither can see what FOX is up to, which
is just as well all things considered.
Another case that I especially like:
SatData Guard w
Guard Publish w
A program running with the Guard label opens a UDP socket and
accepts messages sent by a program running with a SatData label.
The Guard program inspects the message to ensure it is wholesome
and if it is sends it to a program running with the Publish label.
This program then puts the information passed in an appropriate
place. Note that the Guard program cannot write to a Publish
file system object because file system semanitic require read as
well as write.
The four cases (categories, levels, mutual read, guardbox) here
are all quite real, and problems I've been asked to solve over
the years. The first two are easy to do with traditonal MLS systems
while the last two you can't without invoking privilege, at least
for a while.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Cc: Joshua Brindle <method@manicmethod.com>
Cc: Paul Moore <paul.moore@hp.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: James Morris <jmorris@namei.org>
Cc: "Ahmed S. Darwish" <darwish.07@gmail.com>
Cc: Andrew G. Morgan <morgan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 14:29:50 +08:00
|
|
|
# define CAP_FS_MASK_B1 (CAP_TO_MASK(CAP_MAC_OVERRIDE))
|
|
|
|
|
2008-05-28 13:05:17 +08:00
|
|
|
#if _KERNEL_CAPABILITY_U32S != 2
|
2008-02-05 14:29:42 +08:00
|
|
|
# error Fix up hand-coded capability macro initializers
|
|
|
|
#else /* HAND-CODED capability initializers */
|
|
|
|
|
2014-07-24 03:36:26 +08:00
|
|
|
#define CAP_LAST_U32 ((_KERNEL_CAPABILITY_U32S) - 1)
|
|
|
|
#define CAP_LAST_U32_VALID_MASK (CAP_TO_MASK(CAP_LAST_CAP + 1) -1)
|
|
|
|
|
2008-04-30 03:54:28 +08:00
|
|
|
# define CAP_EMPTY_SET ((kernel_cap_t){{ 0, 0 }})
|
2014-07-24 03:36:26 +08:00
|
|
|
# define CAP_FULL_SET ((kernel_cap_t){{ ~0, CAP_LAST_U32_VALID_MASK }})
|
2009-04-13 22:56:14 +08:00
|
|
|
# define CAP_FS_SET ((kernel_cap_t){{ CAP_FS_MASK_B0 \
|
|
|
|
| CAP_TO_MASK(CAP_LINUX_IMMUTABLE), \
|
|
|
|
CAP_FS_MASK_B1 } })
|
2009-03-17 06:34:20 +08:00
|
|
|
# define CAP_NFSD_SET ((kernel_cap_t){{ CAP_FS_MASK_B0 \
|
2009-04-13 22:56:14 +08:00
|
|
|
| CAP_TO_MASK(CAP_SYS_RESOURCE), \
|
|
|
|
CAP_FS_MASK_B1 } })
|
2008-02-05 14:29:42 +08:00
|
|
|
|
2008-05-28 13:05:17 +08:00
|
|
|
#endif /* _KERNEL_CAPABILITY_U32S != 2 */
|
2008-02-05 14:29:42 +08:00
|
|
|
|
|
|
|
# define cap_clear(c) do { (c) = __cap_empty_set; } while (0)
|
|
|
|
|
|
|
|
#define cap_raise(c, flag) ((c).cap[CAP_TO_INDEX(flag)] |= CAP_TO_MASK(flag))
|
|
|
|
#define cap_lower(c, flag) ((c).cap[CAP_TO_INDEX(flag)] &= ~CAP_TO_MASK(flag))
|
|
|
|
#define cap_raised(c, flag) ((c).cap[CAP_TO_INDEX(flag)] & CAP_TO_MASK(flag))
|
|
|
|
|
|
|
|
#define CAP_BOP_ALL(c, a, b, OP) \
|
|
|
|
do { \
|
|
|
|
unsigned __capi; \
|
|
|
|
CAP_FOR_EACH_U32(__capi) { \
|
|
|
|
c.cap[__capi] = a.cap[__capi] OP b.cap[__capi]; \
|
|
|
|
} \
|
|
|
|
} while (0)
|
|
|
|
|
|
|
|
#define CAP_UOP_ALL(c, a, OP) \
|
|
|
|
do { \
|
|
|
|
unsigned __capi; \
|
|
|
|
CAP_FOR_EACH_U32(__capi) { \
|
|
|
|
c.cap[__capi] = OP a.cap[__capi]; \
|
|
|
|
} \
|
|
|
|
} while (0)
|
|
|
|
|
|
|
|
static inline kernel_cap_t cap_combine(const kernel_cap_t a,
|
|
|
|
const kernel_cap_t b)
|
|
|
|
{
|
|
|
|
kernel_cap_t dest;
|
|
|
|
CAP_BOP_ALL(dest, a, b, |);
|
|
|
|
return dest;
|
|
|
|
}
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2008-02-05 14:29:42 +08:00
|
|
|
static inline kernel_cap_t cap_intersect(const kernel_cap_t a,
|
|
|
|
const kernel_cap_t b)
|
|
|
|
{
|
|
|
|
kernel_cap_t dest;
|
|
|
|
CAP_BOP_ALL(dest, a, b, &);
|
|
|
|
return dest;
|
|
|
|
}
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2008-02-05 14:29:42 +08:00
|
|
|
static inline kernel_cap_t cap_drop(const kernel_cap_t a,
|
|
|
|
const kernel_cap_t drop)
|
|
|
|
{
|
|
|
|
kernel_cap_t dest;
|
|
|
|
CAP_BOP_ALL(dest, a, drop, &~);
|
|
|
|
return dest;
|
|
|
|
}
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2008-02-05 14:29:42 +08:00
|
|
|
static inline kernel_cap_t cap_invert(const kernel_cap_t c)
|
|
|
|
{
|
|
|
|
kernel_cap_t dest;
|
|
|
|
CAP_UOP_ALL(dest, c, ~);
|
|
|
|
return dest;
|
|
|
|
}
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2015-11-17 15:25:24 +08:00
|
|
|
static inline bool cap_isclear(const kernel_cap_t a)
|
2008-02-05 14:29:42 +08:00
|
|
|
{
|
|
|
|
unsigned __capi;
|
|
|
|
CAP_FOR_EACH_U32(__capi) {
|
|
|
|
if (a.cap[__capi] != 0)
|
2015-11-17 15:25:24 +08:00
|
|
|
return false;
|
2008-02-05 14:29:42 +08:00
|
|
|
}
|
2015-11-17 15:25:24 +08:00
|
|
|
return true;
|
2008-02-05 14:29:42 +08:00
|
|
|
}
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2008-11-11 18:48:07 +08:00
|
|
|
/*
|
|
|
|
* Check if "a" is a subset of "set".
|
2015-11-17 15:25:24 +08:00
|
|
|
* return true if ALL of the capabilities in "a" are also in "set"
|
|
|
|
* cap_issubset(0101, 1111) will return true
|
|
|
|
* return false if ANY of the capabilities in "a" are not in "set"
|
|
|
|
* cap_issubset(1111, 0101) will return false
|
2008-11-11 18:48:07 +08:00
|
|
|
*/
|
2015-11-17 15:25:24 +08:00
|
|
|
static inline bool cap_issubset(const kernel_cap_t a, const kernel_cap_t set)
|
2008-02-05 14:29:42 +08:00
|
|
|
{
|
|
|
|
kernel_cap_t dest;
|
|
|
|
dest = cap_drop(a, set);
|
|
|
|
return cap_isclear(dest);
|
|
|
|
}
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2008-02-05 14:29:42 +08:00
|
|
|
/* Used to decide between falling back on the old suser() or fsuser(). */
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2008-02-05 14:29:42 +08:00
|
|
|
static inline kernel_cap_t cap_drop_fs_set(const kernel_cap_t a)
|
2005-04-17 06:20:36 +08:00
|
|
|
{
|
2008-02-05 14:29:42 +08:00
|
|
|
const kernel_cap_t __cap_fs_set = CAP_FS_SET;
|
|
|
|
return cap_drop(a, __cap_fs_set);
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
|
2008-02-05 14:29:42 +08:00
|
|
|
static inline kernel_cap_t cap_raise_fs_set(const kernel_cap_t a,
|
|
|
|
const kernel_cap_t permitted)
|
2005-04-17 06:20:36 +08:00
|
|
|
{
|
2008-02-05 14:29:42 +08:00
|
|
|
const kernel_cap_t __cap_fs_set = CAP_FS_SET;
|
|
|
|
return cap_combine(a,
|
|
|
|
cap_intersect(permitted, __cap_fs_set));
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
|
2008-02-05 14:29:42 +08:00
|
|
|
static inline kernel_cap_t cap_drop_nfsd_set(const kernel_cap_t a)
|
2005-04-17 06:20:36 +08:00
|
|
|
{
|
2008-02-05 14:29:42 +08:00
|
|
|
const kernel_cap_t __cap_fs_set = CAP_NFSD_SET;
|
|
|
|
return cap_drop(a, __cap_fs_set);
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
|
2008-02-05 14:29:42 +08:00
|
|
|
static inline kernel_cap_t cap_raise_nfsd_set(const kernel_cap_t a,
|
|
|
|
const kernel_cap_t permitted)
|
|
|
|
{
|
|
|
|
const kernel_cap_t __cap_nfsd_set = CAP_NFSD_SET;
|
|
|
|
return cap_combine(a,
|
|
|
|
cap_intersect(permitted, __cap_nfsd_set));
|
|
|
|
}
|
2005-04-17 06:20:36 +08:00
|
|
|
|
kernel: conditionally support non-root users, groups and capabilities
There are a lot of embedded systems that run most or all of their
functionality in init, running as root:root. For these systems,
supporting multiple users is not necessary.
This patch adds a new symbol, CONFIG_MULTIUSER, that makes support for
non-root users, non-root groups, and capabilities optional. It is enabled
under CONFIG_EXPERT menu.
When this symbol is not defined, UID and GID are zero in any possible case
and processes always have all capabilities.
The following syscalls are compiled out: setuid, setregid, setgid,
setreuid, setresuid, getresuid, setresgid, getresgid, setgroups,
getgroups, setfsuid, setfsgid, capget, capset.
Also, groups.c is compiled out completely.
In kernel/capability.c, capable function was moved in order to avoid
adding two ifdef blocks.
This change saves about 25 KB on a defconfig build. The most minimal
kernels have total text sizes in the high hundreds of kB rather than
low MB. (The 25k goes down a bit with allnoconfig, but not that much.
The kernel was booted in Qemu. All the common functionalities work.
Adding users/groups is not possible, failing with -ENOSYS.
Bloat-o-meter output:
add/remove: 7/87 grow/shrink: 19/397 up/down: 1675/-26325 (-24650)
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Iulia Manda <iulia.manda21@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-16 07:16:41 +08:00
|
|
|
#ifdef CONFIG_MULTIUSER
|
2011-03-24 07:43:21 +08:00
|
|
|
extern bool has_capability(struct task_struct *t, int cap);
|
|
|
|
extern bool has_ns_capability(struct task_struct *t,
|
|
|
|
struct user_namespace *ns, int cap);
|
|
|
|
extern bool has_capability_noaudit(struct task_struct *t, int cap);
|
2012-01-04 01:25:15 +08:00
|
|
|
extern bool has_ns_capability_noaudit(struct task_struct *t,
|
|
|
|
struct user_namespace *ns, int cap);
|
userns: security: make capabilities relative to the user namespace
- Introduce ns_capable to test for a capability in a non-default
user namespace.
- Teach cap_capable to handle capabilities in a non-default
user namespace.
The motivation is to get to the unprivileged creation of new
namespaces. It looks like this gets us 90% of the way there, with
only potential uid confusion issues left.
I still need to handle getting all caps after creation but otherwise I
think I have a good starter patch that achieves all of your goals.
Changelog:
11/05/2010: [serge] add apparmor
12/14/2010: [serge] fix capabilities to created user namespaces
Without this, if user serge creates a user_ns, he won't have
capabilities to the user_ns he created. THis is because we
were first checking whether his effective caps had the caps
he needed and returning -EPERM if not, and THEN checking whether
he was the creator. Reverse those checks.
12/16/2010: [serge] security_real_capable needs ns argument in !security case
01/11/2011: [serge] add task_ns_capable helper
01/11/2011: [serge] add nsown_capable() helper per Bastian Blank suggestion
02/16/2011: [serge] fix a logic bug: the root user is always creator of
init_user_ns, but should not always have capabilities to
it! Fix the check in cap_capable().
02/21/2011: Add the required user_ns parameter to security_capable,
fixing a compile failure.
02/23/2011: Convert some macros to functions as per akpm comments. Some
couldn't be converted because we can't easily forward-declare
them (they are inline if !SECURITY, extern if SECURITY). Add
a current_user_ns function so we can use it in capability.h
without #including cred.h. Move all forward declarations
together to the top of the #ifdef __KERNEL__ section, and use
kernel-doc format.
02/23/2011: Per dhowells, clean up comment in cap_capable().
02/23/2011: Per akpm, remove unreachable 'return -EPERM' in cap_capable.
(Original written and signed off by Eric; latest, modified version
acked by him)
[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: export current_user_ns() for ecryptfs]
[serge.hallyn@canonical.com: remove unneeded extra argument in selinux's task_has_capability]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
Acked-by: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-24 07:43:17 +08:00
|
|
|
extern bool capable(int cap);
|
|
|
|
extern bool ns_capable(struct user_namespace *ns, int cap);
|
2016-06-03 12:43:21 +08:00
|
|
|
extern bool ns_capable_noaudit(struct user_namespace *ns, int cap);
|
2019-01-23 06:42:09 +08:00
|
|
|
extern bool ns_capable_setid(struct user_namespace *ns, int cap);
|
kernel: conditionally support non-root users, groups and capabilities
There are a lot of embedded systems that run most or all of their
functionality in init, running as root:root. For these systems,
supporting multiple users is not necessary.
This patch adds a new symbol, CONFIG_MULTIUSER, that makes support for
non-root users, non-root groups, and capabilities optional. It is enabled
under CONFIG_EXPERT menu.
When this symbol is not defined, UID and GID are zero in any possible case
and processes always have all capabilities.
The following syscalls are compiled out: setuid, setregid, setgid,
setreuid, setresuid, getresuid, setresgid, getresgid, setgroups,
getgroups, setfsuid, setfsgid, capget, capset.
Also, groups.c is compiled out completely.
In kernel/capability.c, capable function was moved in order to avoid
adding two ifdef blocks.
This change saves about 25 KB on a defconfig build. The most minimal
kernels have total text sizes in the high hundreds of kB rather than
low MB. (The 25k goes down a bit with allnoconfig, but not that much.
The kernel was booted in Qemu. All the common functionalities work.
Adding users/groups is not possible, failing with -ENOSYS.
Bloat-o-meter output:
add/remove: 7/87 grow/shrink: 19/397 up/down: 1675/-26325 (-24650)
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Iulia Manda <iulia.manda21@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-16 07:16:41 +08:00
|
|
|
#else
|
|
|
|
static inline bool has_capability(struct task_struct *t, int cap)
|
|
|
|
{
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
static inline bool has_ns_capability(struct task_struct *t,
|
|
|
|
struct user_namespace *ns, int cap)
|
|
|
|
{
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
static inline bool has_capability_noaudit(struct task_struct *t, int cap)
|
|
|
|
{
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
static inline bool has_ns_capability_noaudit(struct task_struct *t,
|
|
|
|
struct user_namespace *ns, int cap)
|
|
|
|
{
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
static inline bool capable(int cap)
|
|
|
|
{
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
static inline bool ns_capable(struct user_namespace *ns, int cap)
|
|
|
|
{
|
|
|
|
return true;
|
|
|
|
}
|
2016-06-03 12:43:21 +08:00
|
|
|
static inline bool ns_capable_noaudit(struct user_namespace *ns, int cap)
|
|
|
|
{
|
|
|
|
return true;
|
|
|
|
}
|
2019-01-23 06:42:09 +08:00
|
|
|
static inline bool ns_capable_setid(struct user_namespace *ns, int cap)
|
|
|
|
{
|
|
|
|
return true;
|
|
|
|
}
|
kernel: conditionally support non-root users, groups and capabilities
There are a lot of embedded systems that run most or all of their
functionality in init, running as root:root. For these systems,
supporting multiple users is not necessary.
This patch adds a new symbol, CONFIG_MULTIUSER, that makes support for
non-root users, non-root groups, and capabilities optional. It is enabled
under CONFIG_EXPERT menu.
When this symbol is not defined, UID and GID are zero in any possible case
and processes always have all capabilities.
The following syscalls are compiled out: setuid, setregid, setgid,
setreuid, setresuid, getresuid, setresgid, getresgid, setgroups,
getgroups, setfsuid, setfsgid, capget, capset.
Also, groups.c is compiled out completely.
In kernel/capability.c, capable function was moved in order to avoid
adding two ifdef blocks.
This change saves about 25 KB on a defconfig build. The most minimal
kernels have total text sizes in the high hundreds of kB rather than
low MB. (The 25k goes down a bit with allnoconfig, but not that much.
The kernel was booted in Qemu. All the common functionalities work.
Adding users/groups is not possible, failing with -ENOSYS.
Bloat-o-meter output:
add/remove: 7/87 grow/shrink: 19/397 up/down: 1675/-26325 (-24650)
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Iulia Manda <iulia.manda21@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-16 07:16:41 +08:00
|
|
|
#endif /* CONFIG_MULTIUSER */
|
2021-01-21 21:19:23 +08:00
|
|
|
bool privileged_wrt_inode_uidgid(struct user_namespace *ns,
|
|
|
|
struct user_namespace *mnt_userns,
|
|
|
|
const struct inode *inode);
|
|
|
|
bool capable_wrt_inode_uidgid(struct user_namespace *mnt_userns,
|
|
|
|
const struct inode *inode, int cap);
|
2013-04-15 01:06:31 +08:00
|
|
|
extern bool file_ns_capable(const struct file *file, struct user_namespace *ns, int cap);
|
2016-11-15 08:48:07 +08:00
|
|
|
extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns);
|
capabilities: Introduce CAP_PERFMON to kernel and user space
Introduce the CAP_PERFMON capability designed to secure system
performance monitoring and observability operations so that CAP_PERFMON
can assist CAP_SYS_ADMIN capability in its governing role for
performance monitoring and observability subsystems.
CAP_PERFMON hardens system security and integrity during performance
monitoring and observability operations by decreasing attack surface that
is available to a CAP_SYS_ADMIN privileged process [2]. Providing the access
to system performance monitoring and observability operations under CAP_PERFMON
capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes
chances to misuse the credentials and makes the operation more secure.
Thus, CAP_PERFMON implements the principle of least privilege for
performance monitoring and observability operations (POSIX IEEE 1003.1e:
2.2.2.39 principle of least privilege: A security design principle that
states that a process or program be granted only those privileges
(e.g., capabilities) necessary to accomplish its legitimate function,
and only for the time that such privileges are actually required)
CAP_PERFMON meets the demand to secure system performance monitoring and
observability operations for adoption in security sensitive, restricted,
multiuser production environments (e.g. HPC clusters, cloud and virtual compute
environments), where root or CAP_SYS_ADMIN credentials are not available to
mass users of a system, and securely unblocks applicability and scalability
of system performance monitoring and observability operations beyond root
and CAP_SYS_ADMIN use cases.
CAP_PERFMON takes over CAP_SYS_ADMIN credentials related to system performance
monitoring and observability operations and balances amount of CAP_SYS_ADMIN
credentials following the recommendations in the capabilities man page [1]
for CAP_SYS_ADMIN: "Note: this capability is overloaded; see Notes to kernel
developers, below." For backward compatibility reasons access to system
performance monitoring and observability subsystems of the kernel remains
open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN capability
usage for secure system performance monitoring and observability operations
is discouraged with respect to the designed CAP_PERFMON capability.
Although the software running under CAP_PERFMON can not ensure avoidance
of related hardware issues, the software can still mitigate these issues
following the official hardware issues mitigation procedure [2]. The bugs
in the software itself can be fixed following the standard kernel development
process [3] to maintain and harden security of system performance monitoring
and observability operations.
[1] http://man7.org/linux/man-pages/man7/capabilities.7.html
[2] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html
[3] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Acked-by: Serge E. Hallyn <serge@hallyn.com>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Igor Lubashev <ilubashe@akamai.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: intel-gfx@lists.freedesktop.org
Cc: linux-doc@vger.kernel.org
Cc: linux-man@vger.kernel.org
Cc: linux-security-module@vger.kernel.org
Cc: selinux@vger.kernel.org
Link: http://lore.kernel.org/lkml/5590d543-82c6-490a-6544-08e6a5517db0@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-02 16:45:31 +08:00
|
|
|
static inline bool perfmon_capable(void)
|
|
|
|
{
|
|
|
|
return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN);
|
|
|
|
}
|
2006-01-12 04:17:46 +08:00
|
|
|
|
bpf, capability: Introduce CAP_BPF
Split BPF operations that are allowed under CAP_SYS_ADMIN into
combination of CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN.
For backward compatibility include them in CAP_SYS_ADMIN as well.
The end result provides simple safety model for applications that use BPF:
- to load tracing program types
BPF_PROG_TYPE_{KPROBE, TRACEPOINT, PERF_EVENT, RAW_TRACEPOINT, etc}
use CAP_BPF and CAP_PERFMON
- to load networking program types
BPF_PROG_TYPE_{SCHED_CLS, XDP, SK_SKB, etc}
use CAP_BPF and CAP_NET_ADMIN
There are few exceptions from this rule:
- bpf_trace_printk() is allowed in networking programs, but it's using
tracing mechanism, hence this helper needs additional CAP_PERFMON
if networking program is using this helper.
- BPF_F_ZERO_SEED flag for hash/lru map is allowed under CAP_SYS_ADMIN only
to discourage production use.
- BPF HW offload is allowed under CAP_SYS_ADMIN.
- bpf_probe_write_user() is allowed under CAP_SYS_ADMIN only.
CAPs are not checked at attach/detach time with two exceptions:
- loading BPF_PROG_TYPE_CGROUP_SKB is allowed for unprivileged users,
hence CAP_NET_ADMIN is required at attach time.
- flow_dissector detach doesn't check prog FD at detach,
hence CAP_NET_ADMIN is required at detach time.
CAP_SYS_ADMIN is required to iterate BPF objects (progs, maps, links) via get_next_id
command and convert them to file descriptor via GET_FD_BY_ID command.
This restriction guarantees that mutliple tasks with CAP_BPF are not able to
affect each other. That leads to clean isolation of tasks. For example:
task A with CAP_BPF and CAP_NET_ADMIN loads and attaches a firewall via bpf_link.
task B with the same capabilities cannot detach that firewall unless
task A explicitly passed link FD to task B via scm_rights or bpffs.
CAP_SYS_ADMIN can still detach/unload everything.
Two networking user apps with CAP_SYS_ADMIN and CAP_NET_ADMIN can
accidentely mess with each other programs and maps.
Two networking user apps with CAP_NET_ADMIN and CAP_BPF cannot affect each other.
CAP_NET_ADMIN + CAP_BPF allows networking programs access only packet data.
Such networking progs cannot access arbitrary kernel memory or leak pointers.
bpftool, bpftrace, bcc tools binaries should NOT be installed with
CAP_BPF and CAP_PERFMON, since unpriv users will be able to read kernel secrets.
But users with these two permissions will be able to use these tracing tools.
CAP_PERFMON is least secure, since it allows kprobes and kernel memory access.
CAP_NET_ADMIN can stop network traffic via iproute2.
CAP_BPF is the safest from security point of view and harmless on its own.
Having CAP_BPF and/or CAP_NET_ADMIN is not enough to write into arbitrary map
and if that map is used by firewall-like bpf prog.
CAP_BPF allows many bpf prog_load commands in parallel. The verifier
may consume large amount of memory and significantly slow down the system.
Existing unprivileged BPF operations are not affected.
In particular unprivileged users are allowed to load socket_filter and cg_skb
program types and to create array, hash, prog_array, map-in-map map types.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200513230355.7858-2-alexei.starovoitov@gmail.com
2020-05-14 07:03:53 +08:00
|
|
|
static inline bool bpf_capable(void)
|
|
|
|
{
|
|
|
|
return capable(CAP_BPF) || capable(CAP_SYS_ADMIN);
|
|
|
|
}
|
|
|
|
|
capabilities: Introduce CAP_CHECKPOINT_RESTORE
This patch introduces CAP_CHECKPOINT_RESTORE, a new capability facilitating
checkpoint/restore for non-root users.
Over the last years, The CRIU (Checkpoint/Restore In Userspace) team has
been asked numerous times if it is possible to checkpoint/restore a
process as non-root. The answer usually was: 'almost'.
The main blocker to restore a process as non-root was to control the PID
of the restored process. This feature available via the clone3 system
call, or via /proc/sys/kernel/ns_last_pid is unfortunately guarded by
CAP_SYS_ADMIN.
In the past two years, requests for non-root checkpoint/restore have
increased due to the following use cases:
* Checkpoint/Restore in an HPC environment in combination with a
resource manager distributing jobs where users are always running as
non-root. There is a desire to provide a way to checkpoint and
restore long running jobs.
* Container migration as non-root
* We have been in contact with JVM developers who are integrating
CRIU into a Java VM to decrease the startup time. These
checkpoint/restore applications are not meant to be running with
CAP_SYS_ADMIN.
We have seen the following workarounds:
* Use a setuid wrapper around CRIU:
See https://github.com/FredHutch/slurm-examples/blob/master/checkpointer/lib/checkpointer/checkpointer-suid.c
* Use a setuid helper that writes to ns_last_pid.
Unfortunately, this helper delegation technique is impossible to use
with clone3, and is thus prone to races.
See https://github.com/twosigma/set_ns_last_pid
* Cycle through PIDs with fork() until the desired PID is reached:
This has been demonstrated to work with cycling rates of 100,000 PIDs/s
See https://github.com/twosigma/set_ns_last_pid
* Patch out the CAP_SYS_ADMIN check from the kernel
* Run the desired application in a new user and PID namespace to provide
a local CAP_SYS_ADMIN for controlling PIDs. This technique has limited
use in typical container environments (e.g., Kubernetes) as /proc is
typically protected with read-only layers (e.g., /proc/sys) for
hardening purposes. Read-only layers prevent additional /proc mounts
(due to proc's SB_I_USERNS_VISIBLE property), making the use of new
PID namespaces limited as certain applications need access to /proc
matching their PID namespace.
The introduced capability allows to:
* Control PIDs when the current user is CAP_CHECKPOINT_RESTORE capable
for the corresponding PID namespace via ns_last_pid/clone3.
* Open files in /proc/pid/map_files when the current user is
CAP_CHECKPOINT_RESTORE capable in the root namespace, useful for
recovering files that are unreachable via the file system such as
deleted files, or memfd files.
See corresponding selftest for an example with clone3().
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/r/20200719100418.2112740-2-areber@redhat.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-07-19 18:04:11 +08:00
|
|
|
static inline bool checkpoint_restore_ns_capable(struct user_namespace *ns)
|
|
|
|
{
|
|
|
|
return ns_capable(ns, CAP_CHECKPOINT_RESTORE) ||
|
|
|
|
ns_capable(ns, CAP_SYS_ADMIN);
|
|
|
|
}
|
|
|
|
|
2008-11-11 18:48:14 +08:00
|
|
|
/* audit system wants to get cap info from files as well */
|
2021-01-21 21:19:29 +08:00
|
|
|
int get_vfs_caps_from_disk(struct user_namespace *mnt_userns,
|
|
|
|
const struct dentry *dentry,
|
|
|
|
struct cpu_vfs_cap_data *cpu_caps);
|
2008-11-11 18:48:14 +08:00
|
|
|
|
2021-01-21 21:19:27 +08:00
|
|
|
int cap_convert_nscap(struct user_namespace *mnt_userns, struct dentry *dentry,
|
|
|
|
const void **ivalue, size_t size);
|
Introduce v3 namespaced file capabilities
Root in a non-initial user ns cannot be trusted to write a traditional
security.capability xattr. If it were allowed to do so, then any
unprivileged user on the host could map his own uid to root in a private
namespace, write the xattr, and execute the file with privilege on the
host.
However supporting file capabilities in a user namespace is very
desirable. Not doing so means that any programs designed to run with
limited privilege must continue to support other methods of gaining and
dropping privilege. For instance a program installer must detect
whether file capabilities can be assigned, and assign them if so but set
setuid-root otherwise. The program in turn must know how to drop
partial capabilities, and do so only if setuid-root.
This patch introduces v3 of the security.capability xattr. It builds a
vfs_ns_cap_data struct by appending a uid_t rootid to struct
vfs_cap_data. This is the absolute uid_t (that is, the uid_t in user
namespace which mounted the filesystem, usually init_user_ns) of the
root id in whose namespaces the file capabilities may take effect.
When a task asks to write a v2 security.capability xattr, if it is
privileged with respect to the userns which mounted the filesystem, then
nothing should change. Otherwise, the kernel will transparently rewrite
the xattr as a v3 with the appropriate rootid. This is done during the
execution of setxattr() to catch user-space-initiated capability writes.
Subsequently, any task executing the file which has the noted kuid as
its root uid, or which is in a descendent user_ns of such a user_ns,
will run the file with capabilities.
Similarly when asking to read file capabilities, a v3 capability will
be presented as v2 if it applies to the caller's namespace.
If a task writes a v3 security.capability, then it can provide a uid for
the xattr so long as the uid is valid in its own user namespace, and it
is privileged with CAP_SETFCAP over its namespace. The kernel will
translate that rootid to an absolute uid, and write that to disk. After
this, a task in the writer's namespace will not be able to use those
capabilities (unless rootid was 0), but a task in a namespace where the
given uid is root will.
Only a single security.capability xattr may exist at a time for a given
file. A task may overwrite an existing xattr so long as it is
privileged over the inode. Note this is a departure from previous
semantics, which required privilege to remove a security.capability
xattr. This check can be re-added if deemed useful.
This allows a simple setxattr to work, allows tar/untar to work, and
allows us to tar in one namespace and untar in another while preserving
the capability, without risking leaking privilege into a parent
namespace.
Example using tar:
$ cp /bin/sleep sleepx
$ mkdir b1 b2
$ lxc-usernsexec -m b:0:100000:1 -m b:1:$(id -u):1 -- chown 0:0 b1
$ lxc-usernsexec -m b:0:100001:1 -m b:1:$(id -u):1 -- chown 0:0 b2
$ lxc-usernsexec -m b:0:100000:1000 -- tar --xattrs-include=security.capability --xattrs -cf b1/sleepx.tar sleepx
$ lxc-usernsexec -m b:0:100001:1000 -- tar --xattrs-include=security.capability --xattrs -C b2 -xf b1/sleepx.tar
$ lxc-usernsexec -m b:0:100001:1000 -- getcap b2/sleepx
b2/sleepx = cap_sys_admin+ep
# /opt/ltp/testcases/bin/getv3xattr b2/sleepx
v3 xattr, rootid is 100001
A patch to linux-test-project adding a new set of tests for this
functionality is in the nsfscaps branch at github.com/hallyn/ltp
Changelog:
Nov 02 2016: fix invalid check at refuse_fcap_overwrite()
Nov 07 2016: convert rootid from and to fs user_ns
(From ebiederm: mar 28 2017)
commoncap.c: fix typos - s/v4/v3
get_vfs_caps_from_disk: clarify the fs_ns root access check
nsfscaps: change the code split for cap_inode_setxattr()
Apr 09 2017:
don't return v3 cap for caps owned by current root.
return a v2 cap for a true v2 cap in non-init ns
Apr 18 2017:
. Change the flow of fscap writing to support s_user_ns writing.
. Remove refuse_fcap_overwrite(). The value of the previous
xattr doesn't matter.
Apr 24 2017:
. incorporate Eric's incremental diff
. move cap_convert_nscap to setxattr and simplify its usage
May 8, 2017:
. fix leaking dentry refcount in cap_inode_getsecurity
Signed-off-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2017-05-09 02:11:56 +08:00
|
|
|
|
2005-04-17 06:20:36 +08:00
|
|
|
#endif /* !_LINUX_CAPABILITY_H */
|