In current mainline, the degree of access to perf_event_open(2) system
call depends on the perf_event_paranoid sysctl. This has a number of
limitations:
1. The sysctl is only a single value. Many types of accesses are controlled
based on the single value thus making the control very limited and
coarse grained.
2. The sysctl is global, so if the sysctl is changed, then that means
all processes get access to perf_event_open(2) opening the door to
security issues.
This patch adds LSM and SELinux access checking which will be used in
Android to access perf_event_open(2) for the purposes of attaching BPF
programs to tracepoints, perf profiling and other operations from
userspace. These operations are intended for production systems.
5 new LSM hooks are added:
1. perf_event_open: This controls access during the perf_event_open(2)
syscall itself. The hook is called from all the places that the
perf_event_paranoid sysctl is checked to keep it consistent with the
systctl. The hook gets passed a 'type' argument which controls CPU,
kernel and tracepoint accesses (in this context, CPU, kernel and
tracepoint have the same semantics as the perf_event_paranoid sysctl).
Additionally, I added an 'open' type which is similar to
perf_event_paranoid sysctl == 3 patch carried in Android and several other
distros but was rejected in mainline [1] in 2016.
2. perf_event_alloc: This allocates a new security object for the event
which stores the current SID within the event. It will be useful when
the perf event's FD is passed through IPC to another process which may
try to read the FD. Appropriate security checks will limit access.
3. perf_event_free: Called when the event is closed.
4. perf_event_read: Called from the read(2) and mmap(2) syscalls for the event.
5. perf_event_write: Called from the ioctl(2) syscalls for the event.
[1] https://lwn.net/Articles/696240/
Since Peter had suggest LSM hooks in 2016 [1], I am adding his
Suggested-by tag below.
To use this patch, we set the perf_event_paranoid sysctl to -1 and then
apply selinux checking as appropriate (default deny everything, and then
add policy rules to give access to domains that need it). In the future
we can remove the perf_event_paranoid sysctl altogether.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Co-developed-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: James Morris <jmorris@namei.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: rostedt@goodmis.org
Cc: Yonghong Song <yhs@fb.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: jeffv@google.com
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: primiano@google.com
Cc: Song Liu <songliubraving@fb.com>
Cc: rsavitski@google.com
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Matthew Garrett <matthewgarrett@google.com>
Link: https://lkml.kernel.org/r/20191014170308.70668-1-joel@joelfernandes.org
Pull kernel lockdown mode from James Morris:
"This is the latest iteration of the kernel lockdown patchset, from
Matthew Garrett, David Howells and others.
From the original description:
This patchset introduces an optional kernel lockdown feature,
intended to strengthen the boundary between UID 0 and the kernel.
When enabled, various pieces of kernel functionality are restricted.
Applications that rely on low-level access to either hardware or the
kernel may cease working as a result - therefore this should not be
enabled without appropriate evaluation beforehand.
The majority of mainstream distributions have been carrying variants
of this patchset for many years now, so there's value in providing a
doesn't meet every distribution requirement, but gets us much closer
to not requiring external patches.
There are two major changes since this was last proposed for mainline:
- Separating lockdown from EFI secure boot. Background discussion is
covered here: https://lwn.net/Articles/751061/
- Implementation as an LSM, with a default stackable lockdown LSM
module. This allows the lockdown feature to be policy-driven,
rather than encoding an implicit policy within the mechanism.
The new locked_down LSM hook is provided to allow LSMs to make a
policy decision around whether kernel functionality that would allow
tampering with or examining the runtime state of the kernel should be
permitted.
The included lockdown LSM provides an implementation with a simple
policy intended for general purpose use. This policy provides a coarse
level of granularity, controllable via the kernel command line:
lockdown={integrity|confidentiality}
Enable the kernel lockdown feature. If set to integrity, kernel features
that allow userland to modify the running kernel are disabled. If set to
confidentiality, kernel features that allow userland to extract
confidential information from the kernel are also disabled.
This may also be controlled via /sys/kernel/security/lockdown and
overriden by kernel configuration.
New or existing LSMs may implement finer-grained controls of the
lockdown features. Refer to the lockdown_reason documentation in
include/linux/security.h for details.
The lockdown feature has had signficant design feedback and review
across many subsystems. This code has been in linux-next for some
weeks, with a few fixes applied along the way.
Stephen Rothwell noted that commit 9d1f8be5cf ("bpf: Restrict bpf
when kernel lockdown is in confidentiality mode") is missing a
Signed-off-by from its author. Matthew responded that he is providing
this under category (c) of the DCO"
* 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (31 commits)
kexec: Fix file verification on S390
security: constify some arrays in lockdown LSM
lockdown: Print current->comm in restriction messages
efi: Restrict efivar_ssdt_load when the kernel is locked down
tracefs: Restrict tracefs when the kernel is locked down
debugfs: Restrict debugfs when the kernel is locked down
kexec: Allow kexec_file() with appropriate IMA policy when locked down
lockdown: Lock down perf when in confidentiality mode
bpf: Restrict bpf when kernel lockdown is in confidentiality mode
lockdown: Lock down tracing and perf kprobes when in confidentiality mode
lockdown: Lock down /proc/kcore
x86/mmiotrace: Lock down the testmmiotrace module
lockdown: Lock down module params that specify hardware parameters (eg. ioport)
lockdown: Lock down TIOCSSERIAL
lockdown: Prohibit PCMCIA CIS storage when the kernel is locked down
acpi: Disable ACPI table override if the kernel is locked down
acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down
ACPI: Limit access to custom_method when the kernel is locked down
x86/msr: Restrict MSR access when the kernel is locked down
x86: Lock down IO port access when the kernel is locked down
...
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAl2BLvcUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXP9pA/+Ls9sRGZoEipycbgRnwkL9/6yFtn4
UCFGMP0eobrjL82i8uMOa/72Budsp3ZaZRxf36NpbMDPyB9ohp5jf7o1WFTELESv
EwxVvOMNwrxO2UbzRv3iywnhdPVJ4gHPa4GWfBHu2EEfhz3/Bv0tPIBdeXAbq4aC
R0p+M9X0FFEp9eP4ftwOvFGpbZ8zKo1kwgdvCnqLhHDkyqtapqO/ByCTe1VATERP
fyxjYDZNnITmI0plaIxCeeudklOTtVSAL4JPh1rk8rZIkUznZ4EBDHxdKiaz3j9C
ZtAthiAA9PfAwf4DZSPHnGsfINxeNBKLD65jZn/PUne/gNJEx4DK041X9HXBNwjv
OoArw58LCzxtTNZ//WB4CovRpeSdKvmKv0oh61k8cdQahLeHhzXE1wLQbnnBJLI3
CTsumIp4ZPEOX5r4ogdS3UIQpo3KrZump7VO85yUTRni150JpZR3egYpmcJ0So1A
QTPemBhC2CHJVTpycYZ9fVTlPeC4oNwosPmvpB8XeGu3w5JpuNSId+BDR/ZlQAmq
xWiIocGL3UMuPuJUrTGChifqBAgzK+gLa7S7RYPEnTCkj6LVQwsuP4gBXf75QTG4
FPwVcoMSDFxUDF0oFqwz4GfJlCxBSzX+BkWUn6jIiXKXBnQjU+1gu6KTwE25mf/j
snJznFk25hFYFaM=
=n4ht
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20190917' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
- Add LSM hooks, and SELinux access control hooks, for dnotify,
fanotify, and inotify watches. This has been discussed with both the
LSM and fs/notify folks and everybody is good with these new hooks.
- The LSM stacking changes missed a few calls to current_security() in
the SELinux code; we fix those and remove current_security() for
good.
- Improve our network object labeling cache so that we always return
the object's label, even when under memory pressure. Previously we
would return an error if we couldn't allocate a new cache entry, now
we always return the label even if we can't create a new cache entry
for it.
- Convert the sidtab atomic_t counter to a normal u32 with
READ/WRITE_ONCE() and memory barrier protection.
- A few patches to policydb.c to clean things up (remove forward
declarations, long lines, bad variable names, etc)
* tag 'selinux-pr-20190917' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
lsm: remove current_security()
selinux: fix residual uses of current_security() for the SELinux blob
selinux: avoid atomic_t usage in sidtab
fanotify, inotify, dnotify, security: add security hook for fs notifications
selinux: always return a secid from the network caches if we find one
selinux: policydb - rename type_val_to_struct_array
selinux: policydb - fix some checkpatch.pl warnings
selinux: shuffle around policydb.c to get rid of forward declarations
Add a mechanism to allow LSMs to make a policy decision around whether
kernel functionality that would allow tampering with or examining the
runtime state of the kernel should be permitted.
Signed-off-by: Matthew Garrett <mjg59@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: James Morris <jmorris@namei.org>
The lockdown module is intended to allow for kernels to be locked down
early in boot - sufficiently early that we don't have the ability to
kmalloc() yet. Add support for early initialisation of some LSMs, and
then add them to the list of names when we do full initialisation later.
Early LSMs are initialised in link order and cannot be overridden via
boot parameters, and cannot make use of kmalloc() (since the allocator
isn't initialised yet).
(Fixed by Stephen Rothwell to include a stub to fix builds when
!CONFIG_SECURITY)
Signed-off-by: Matthew Garrett <mjg59@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: James Morris <jmorris@namei.org>
As of now, setting watches on filesystem objects has, at most, applied a
check for read access to the inode, and in the case of fanotify, requires
CAP_SYS_ADMIN. No specific security hook or permission check has been
provided to control the setting of watches. Using any of inotify, dnotify,
or fanotify, it is possible to observe, not only write-like operations, but
even read access to a file. Modeling the watch as being merely a read from
the file is insufficient for the needs of SELinux. This is due to the fact
that read access should not necessarily imply access to information about
when another process reads from a file. Furthermore, fanotify watches grant
more power to an application in the form of permission events. While
notification events are solely, unidirectional (i.e. they only pass
information to the receiving application), permission events are blocking.
Permission events make a request to the receiving application which will
then reply with a decision as to whether or not that action may be
completed. This causes the issue of the watching application having the
ability to exercise control over the triggering process. Without drawing a
distinction within the permission check, the ability to read would imply
the greater ability to control an application. Additionally, mount and
superblock watches apply to all files within the same mount or superblock.
Read access to one file should not necessarily imply the ability to watch
all files accessed within a given mount or superblock.
In order to solve these issues, a new LSM hook is implemented and has been
placed within the system calls for marking filesystem objects with inotify,
fanotify, and dnotify watches. These calls to the hook are placed at the
point at which the target path has been resolved and are provided with the
path struct, the mask of requested notification events, and the type of
object on which the mark is being set (inode, superblock, or mount). The
mask and obj_type have already been translated into common FS_* values
shared by the entirety of the fs notification infrastructure. The path
struct is passed rather than just the inode so that the mount is available,
particularly for mount watches. This also allows for use of the hook by
pathname-based security modules. However, since the hook is intended for
use even by inode based security modules, it is not placed under the
CONFIG_SECURITY_PATH conditional. Otherwise, the inode-based security
modules would need to enable all of the path hooks, even though they do not
use any of them.
This only provides a hook at the point of setting a watch, and presumes
that permission to set a particular watch implies the ability to receive
all notification about that object which match the mask. This is all that
is required for SELinux. If other security modules require additional hooks
or infrastructure to control delivery of notification, these can be added
by them. It does not make sense for us to propose hooks for which we have
no implementation. The understanding that all notifications received by the
requesting application are all strictly of a type for which the application
has been granted permission shows that this implementation is sufficient in
its coverage.
Security modules wishing to provide complete control over fanotify must
also implement a security_file_open hook that validates that the access
requested by the watching application is authorized. Fanotify has the issue
that it returns a file descriptor with the file mode specified during
fanotify_init() to the watching process on event. This is already covered
by the LSM security_file_open hook if the security module implements
checking of the requested file mode there. Otherwise, a watching process
can obtain escalated access to a file for which it has not been authorized.
The selinux_path_notify hook implementation works by adding five new file
permissions: watch, watch_mount, watch_sb, watch_reads, and watch_with_perm
(descriptions about which will follow), and one new filesystem permission:
watch (which is applied to superblock checks). The hook then decides which
subset of these permissions must be held by the requesting application
based on the contents of the provided mask and the obj_type. The
selinux_file_open hook already checks the requested file mode and therefore
ensures that a watching process cannot escalate its access through
fanotify.
The watch, watch_mount, and watch_sb permissions are the baseline
permissions for setting a watch on an object and each are a requirement for
any watch to be set on a file, mount, or superblock respectively. It should
be noted that having either of the other two permissions (watch_reads and
watch_with_perm) does not imply the watch, watch_mount, or watch_sb
permission. Superblock watches further require the filesystem watch
permission to the superblock. As there is no labeled object in view for
mounts, there is no specific check for mount watches beyond watch_mount to
the inode. Such a check could be added in the future, if a suitable labeled
object existed representing the mount.
The watch_reads permission is required to receive notifications from
read-exclusive events on filesystem objects. These events include accessing
a file for the purpose of reading and closing a file which has been opened
read-only. This distinction has been drawn in order to provide a direct
indication in the policy for this otherwise not obvious capability. Read
access to a file should not necessarily imply the ability to observe read
events on a file.
Finally, watch_with_perm only applies to fanotify masks since it is the
only way to set a mask which allows for the blocking, permission event.
This permission is needed for any watch which is of this type. Though
fanotify requires CAP_SYS_ADMIN, this is insufficient as it gives implicit
trust to root, which we do not do, and does not support least privilege.
Signed-off-by: Aaron Goidel <acgoide@tycho.nsa.gov>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Pull integrity updates from Mimi Zohar:
"Bug fixes, code clean up, and new features:
- IMA policy rules can be defined in terms of LSM labels, making the
IMA policy dependent on LSM policy label changes, in particular LSM
label deletions. The new environment, in which IMA-appraisal is
being used, frequently updates the LSM policy and permits LSM label
deletions.
- Prevent an mmap'ed shared file opened for write from also being
mmap'ed execute. In the long term, making this and other similar
changes at the VFS layer would be preferable.
- The IMA per policy rule template format support is needed for a
couple of new/proposed features (eg. kexec boot command line
measurement, appended signatures, and VFS provided file hashes).
- Other than the "boot-aggregate" record in the IMA measuremeent
list, all other measurements are of file data. Measuring and
storing the kexec boot command line in the IMA measurement list is
the first buffer based measurement included in the measurement
list"
* 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
integrity: Introduce struct evm_xattr
ima: Update MAX_TEMPLATE_NAME_LEN to fit largest reasonable definition
KEXEC: Call ima_kexec_cmdline to measure the boot command line args
IMA: Define a new template field buf
IMA: Define a new hook to measure the kexec boot command line arguments
IMA: support for per policy rule template formats
integrity: Fix __integrity_init_keyring() section mismatch
ima: Use designated initializers for struct ima_event_data
ima: use the lsm policy update notifier
LSM: switch to blocking policy update notifiers
x86/ima: fix the Kconfig dependency for IMA_ARCH_POLICY
ima: Make arch_policy_entry static
ima: prevent a file already mmap'ed write to be mmap'ed execute
x86/ima: check EFI SetupMode too
Atomic policy updaters are not very useful as they cannot
usually perform the policy updates on their own. Since it
seems that there is no strict need for the atomicity,
switch to the blocking variant. While doing so, rename
the functions accordingly.
Signed-off-by: Janne Karhunen <janne.karhunen@gmail.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 3029 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull mount ABI updates from Al Viro:
"The syscalls themselves, finally.
That's not all there is to that stuff, but switching individual
filesystems to new methods is fortunately independent from everything
else, so e.g. NFS series can go through NFS tree, etc.
As those conversions get done, we'll be finally able to get rid of a
bunch of duplication in fs/super.c introduced in the beginning of the
entire thing. I expect that to be finished in the next window..."
* 'work.mount-syscalls' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vfs: Add a sample program for the new mount API
vfs: syscall: Add fspick() to select a superblock for reconfiguration
vfs: syscall: Add fsmount() to create a mount for a superblock
vfs: syscall: Add fsconfig() for configuring and managing a context
vfs: Implement logging through fs_context
vfs: syscall: Add fsopen() to prepare for superblock creation
Make anon_inodes unconditional
teach move_mount(2) to work with OPEN_TREE_CLONE
vfs: syscall: Add move_mount(2) to move mounts around
vfs: syscall: Add open_tree(2) to reference or clone a mount
This patch introduces a new security hook that is intended for
initializing the security data for newly created kernfs nodes, which
provide a way of storing a non-default security context, but need to
operate independently from mounts (and therefore may not have an
associated inode at the moment of creation).
The main motivation is to allow kernfs nodes to inherit the context of
the parent under SELinux, similar to the behavior of
security_inode_init_security(). Other LSMs may implement their own logic
for handling the creation of new nodes.
This patch also adds helper functions to <linux/kernfs.h> for
getting/setting security xattrs of a kernfs node so that LSMs hooks are
able to do their job. Other important attributes should be accessible
direcly in the kernfs_node fields (in case there is need for more, then
new helpers should be added to kernfs.h along with the patch that needs
them).
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
[PM: more manual merge fixes]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Add a move_mount() system call that will move a mount from one place to
another and, in the next commit, allow to attach an unattached mount tree.
The new system call looks like the following:
int move_mount(int from_dfd, const char *from_path,
int to_dfd, const char *to_path,
unsigned int flags);
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-api@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull vfs mount infrastructure updates from Al Viro:
"The rest of core infrastructure; no new syscalls in that pile, but the
old parts are switched to new infrastructure. At that point
conversions of individual filesystems can happen independently; some
are done here (afs, cgroup, procfs, etc.), there's also a large series
outside of that pile dealing with NFS (quite a bit of option-parsing
stuff is getting used there - it's one of the most convoluted
filesystems in terms of mount-related logics), but NFS bits are the
next cycle fodder.
It got seriously simplified since the last cycle; documentation is
probably the weakest bit at the moment - I considered dropping the
commit introducing Documentation/filesystems/mount_api.txt (cutting
the size increase by quarter ;-), but decided that it would be better
to fix it up after -rc1 instead.
That pile allows to do followup work in independent branches, which
should make life much easier for the next cycle. fs/super.c size
increase is unpleasant; there's a followup series that allows to
shrink it considerably, but I decided to leave that until the next
cycle"
* 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (41 commits)
afs: Use fs_context to pass parameters over automount
afs: Add fs_context support
vfs: Add some logging to the core users of the fs_context log
vfs: Implement logging through fs_context
vfs: Provide documentation for new mount API
vfs: Remove kern_mount_data()
hugetlbfs: Convert to fs_context
cpuset: Use fs_context
kernfs, sysfs, cgroup, intel_rdt: Support fs_context
cgroup: store a reference to cgroup_ns into cgroup_fs_context
cgroup1_get_tree(): separate "get cgroup_root to use" into a separate helper
cgroup_do_mount(): massage calling conventions
cgroup: stash cgroup_root reference into cgroup_fs_context
cgroup2: switch to option-by-option parsing
cgroup1: switch to option-by-option parsing
cgroup: take options parsing into ->parse_monolithic()
cgroup: fold cgroup1_mount() into cgroup1_get_tree()
cgroup: start switching to fs_context
ipc: Convert mqueue fs to fs_context
proc: Add fs_context support to procfs
...
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAlx+8ZgUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXOlDhAAiGlirQ9syyG2fYzaARZZ2QoU/GGD
PSAeiNmP3jvJzXArCvugRCw+YSNDdQOBM3SrLQC+cM0MAIDRYXN0NdcrsbTchlMA
51Fx1egZ9Fyj+Ehgida3muh2lRUy7DQwMCL6tAVqwz7vYkSTGDUf+MlYqOqXDka5
74pEExOS3Jdi7560BsE8b6QoW9JIJqEJnirXGkG9o2qC0oFHCR6PKxIyQ7TJrLR1
F23aFTqLTH1nbPUQjnox2PTf13iQVh4j2gwzd+9c9KBfxoGSge3dmxId7BJHy2aG
M27fPdCYTNZAGWpPVujsCPAh1WPQ9NQqg3mA9+g14PEbiLqPcqU+kWmnDU7T7bEw
Qx0kt6Y8GiknwCqq8pDbKYclgRmOjSGdfutzd0z8uDpbaeunS4/NqnDb/FUaDVcr
jA4d6ep7qEgHpYbL8KgOeZCexfaTfz6mcwRWNq3Uu9cLZbZqSSQ7PXolMADHvoRs
LS7VH2jcP7q4p4GWmdfjv67xyUUo9HG5HHX74h5pLfQSYXiBWo4ht0UOAzX/6EcE
CJNHAFHv+OanI5Rg/6JQ8b3/bJYxzAJVyLZpCuMtlKk6lYBGNeADk9BezEDIYsm8
tSe4/GqqyR9+Qz8rSdpAZ0KKkfqS535IcHUPUJau7Bzg1xqSEP5gzZN6QsjdXg0+
5wFFfdFICTfJFXo=
=57/1
-----END PGP SIGNATURE-----
Merge tag 'audit-pr-20190305' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit updates from Paul Moore:
"A lucky 13 audit patches for v5.1.
Despite the rather large diffstat, most of the changes are from two
bug fix patches that move code from one Kconfig option to another.
Beyond that bit of churn, the remaining changes are largely cleanups
and bug-fixes as we slowly march towards container auditing. It isn't
all boring though, we do have a couple of new things: file
capabilities v3 support, and expanded support for filtering on
filesystems to solve problems with remote filesystems.
All changes pass the audit-testsuite. Please merge for v5.1"
* tag 'audit-pr-20190305' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: mark expected switch fall-through
audit: hide auditsc_get_stamp and audit_serial prototypes
audit: join tty records to their syscall
audit: remove audit_context when CONFIG_ AUDIT and not AUDITSYSCALL
audit: remove unused actx param from audit_rule_match
audit: ignore fcaps on umount
audit: clean up AUDITSYSCALL prototypes and stubs
audit: more filter PATH records keyed on filesystem magic
audit: add support for fcaps v3
audit: move loginuid and sessionid from CONFIG_AUDITSYSCALL to CONFIG_AUDIT
audit: add syscall information to CONFIG_CHANGE records
audit: hand taken context to audit_kill_trees for syscall logging
audit: give a clue what CONFIG_CHANGE op was involved
new primitive: vfs_dup_fs_context(). Comes with fs_context
method (->dup()) for copying the filesystem-specific parts
of fs_context, along with LSM one (->fs_context_dup()) for
doing the same to LSM parts.
[needs better commit message, and change of Author:, anyway]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Add LSM hooks for use by the new mount API and filesystem context code.
This includes:
(1) Hooks to handle allocation, duplication and freeing of the security
record attached to a filesystem context.
(2) A hook to snoop source specifications. There may be multiple of these
if the filesystem supports it. They will to be local files/devices if
fs_context::source_is_dev is true and will be something else, possibly
remote server specifications, if false.
(3) A hook to snoop superblock configuration options in key[=val] form.
If the LSM decides it wants to handle it, it can suppress the option
being passed to the filesystem. Note that 'val' may include commas
and binary data with the fsopen patch.
(4) A hook to perform validation and allocation after the configuration
has been done but before the superblock is allocated and set up.
(5) A hook to transfer the security from the context to a newly created
superblock.
(6) A hook to rule on whether a path point can be used as a mountpoint.
These are intended to replace:
security_sb_copy_data
security_sb_kern_mount
security_sb_mount
security_sb_set_mnt_opts
security_sb_clone_mnt_opts
security_sb_parse_opts_str
[AV -- some of the methods being replaced are already gone, some of the
methods are not added for the lack of need]
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-security-module@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
To avoid potential confusion, explicitly ignore "security=" when "lsm=" is
used on the command line, and report that it is happening.
Suggested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
The audit_rule_match() struct audit_context *actx parameter is not used
by any in-tree consumers (selinux, apparmour, integrity, smack).
The audit context is an internal audit structure that should only be
accessed by audit accessor functions.
It was part of commit 03d37d25e0 ("LSM/Audit: Introduce generic
Audit LSM hooks") but appears to have never been used.
Remove it.
Please see the github issue
https://github.com/linux-audit/audit-kernel/issues/107
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
[PM: fixed the referenced commit title]
Signed-off-by: Paul Moore <paul@paul-moore.com>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlxFDv0eHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGBPsH/3Ij47fut8kwxGSX
Tmx7Y+VYftRiKSwK3+HxsCvde3scqfkxAukb3HeJDzZdpnouT0k4nqUYQabAANi/
MdaO+NSBRp/NjzZcpFG9QAroIQ2G2sRQ4E8ldFcNmdsjZWlUfKIHPfYHzvvc06L4
MhvdkpMa/p51Jz9egQs0kfSvrb6fh4OEDTI19/aaGR0oJBhoGhLrqTI+vdYhMiyO
wWtUXgZfsmlCBdAQLRh04CxGTc/32VApoB/SwP9sF+xD3gcL0mPFNKUociio6K2Y
a7u7yuzUKvVwuafVgX9QT+f+je5/5u+WFsG/26cfXzizZoNWW5oDl3sBD3hRNkvt
J13lB1w=
=ch+/
-----END PGP SIGNATURE-----
Merge tag 'v5.0-rc3' into next-general
Sync to Linux 5.0-rc3 to pull in the VFS changes which impacted a lot
of the LSM code.
Fixes the following sparse warnings:
security/security.c:533:5: warning:
symbol 'lsm_task_alloc' was not declared. Should it be static?
security/security.c:554:5: warning:
symbol 'lsm_ipc_alloc' was not declared. Should it be static?
security/security.c:575:5: warning:
symbol 'lsm_msg_msg_alloc' was not declared. Should it be static?
Fixes: f4ad8f2c40 ("LSM: Infrastructure management of the task security")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
Since current->cred == current->real_cred when ordered_lsm_init()
is called, and lsm_early_cred()/lsm_early_task() need to be called
between the amount of required bytes is determined and module specific
initialization function is called, we can move these calls from
individual modules to ordered_lsm_init().
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
From: Casey Schaufler <casey@schaufler-ca.com>
Check that the cred security blob has been set before trying
to clean it up. There is a case during credential initialization
that could result in this.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
Reported-by: syzbot+69ca07954461f189e808@syzkaller.appspotmail.com
This patch provides a general mechanism for passing flags to the
security_capable LSM hook. It replaces the specific 'audit' flag that is
used to tell security_capable whether it should log an audit message for
the given capability check. The reason for generalizing this flag
passing is so we can add an additional flag that signifies whether
security_capable is being called by a setid syscall (which is needed by
the proposed SafeSetID LSM).
Signed-off-by: Micah Morton <mortonm@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <james.morris@microsoft.com>
Move management of the kern_ipc_perm->security and
msg_msg->security blobs out of the individual security
modules and into the security infrastructure. Instead
of allocating the blobs from within the modules the modules
tell the infrastructure how much space is required, and
the space is allocated there.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
[kees: adjusted for ordered init series]
Signed-off-by: Kees Cook <keescook@chromium.org>
Move management of the task_struct->security blob out
of the individual security modules and into the security
infrastructure. Instead of allocating the blobs from within
the modules the modules tell the infrastructure how much
space is required, and the space is allocated there.
The only user of this blob is AppArmor. The AppArmor use
is abstracted to avoid future conflict.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
[kees: adjusted for ordered init series]
Signed-off-by: Kees Cook <keescook@chromium.org>
Move management of the inode->i_security blob out
of the individual security modules and into the security
infrastructure. Instead of allocating the blobs from within
the modules the modules tell the infrastructure how much
space is required, and the space is allocated there.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
[kees: adjusted for ordered init series]
Signed-off-by: Kees Cook <keescook@chromium.org>
Move management of the file->f_security blob out of the
individual security modules and into the infrastructure.
The modules no longer allocate or free the data, instead
they tell the infrastructure how much space they require.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
[kees: adjusted for ordered init series]
Signed-off-by: Kees Cook <keescook@chromium.org>
Move management of the cred security blob out of the
security modules and into the security infrastructre.
Instead of allocating and freeing space the security
modules tell the infrastructure how much space they
require.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
[kees: adjusted for ordered init series]
Signed-off-by: Kees Cook <keescook@chromium.org>
Back in 2007 I made what turned out to be a rather serious
mistake in the implementation of the Smack security module.
The SELinux module used an interface in /proc to manipulate
the security context on processes. Rather than use a similar
interface, I used the same interface. The AppArmor team did
likewise. Now /proc/.../attr/current will tell you the
security "context" of the process, but it will be different
depending on the security module you're using.
This patch provides a subdirectory in /proc/.../attr for
Smack. Smack user space can use the "current" file in
this subdirectory and never have to worry about getting
SELinux attributes by mistake. Programs that use the
old interface will continue to work (or fail, as the case
may be) as before.
The proposed S.A.R.A security module is dependent on
the mechanism to create its own attr subdirectory.
The original implementation is by Kees Cook.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
This converts capabilities to use the new LSM_ORDER_FIRST position.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
In preparation for distinguishing the "capability" LSM from other LSMs, it
must be ordered first. This introduces LSM_ORDER_MUTABLE for the general
LSMs and LSM_ORDER_FIRST for capability. In the future LSM_ORDER_LAST
for could be added for anything that must run last (e.g. Landlock may
use this).
Signed-off-by: Kees Cook <keescook@chromium.org>
This converts Yama from being a direct "minor" LSM into an ordered LSM.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
This converts LoadPin from being a direct "minor" LSM into an ordered LSM.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Since we already have to do a pass through the LSMs to figure out if
exclusive LSMs should be disabled after the first one is seen as enabled,
this splits the logic up a bit more cleanly. Now we do a full "prepare"
pass through the LSMs (which also allows for later use by the blob-sharing
code), before starting the LSM initialization pass.
Signed-off-by: Kees Cook <keescook@chromium.org>
This removes CONFIG_DEFAULT_SECURITY in favor of the explicit ordering
offered by CONFIG_LSM and adds all the exclusive LSMs to the ordered
LSM initialization. The old meaning of CONFIG_DEFAULT_SECURITY is now
captured by which exclusive LSM is listed first in the LSM order. All
LSMs not added to the ordered list are explicitly disabled.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
In order to both support old "security=" Legacy Major LSM selection, and
handling real exclusivity, this creates LSM_FLAG_EXCLUSIVE and updates
the selection logic to handle them.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
For what are marked as the Legacy Major LSMs, make them effectively
exclusive when selected on the "security=" boot parameter, to handle
the future case of when a previously major LSMs become non-exclusive
(e.g. when TOMOYO starts blob-sharing).
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
This moves the string handling for "security=" boot parameter into
a stored pointer instead of a string duplicate. This will allow
easier handling of the string when switching logic to use the coming
enable/disable infrastructure.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: John Johansen <john.johansen@canonical.com>
Until now, any LSM without an enable storage variable was considered
enabled. This inverts the logic and sets defaults to true only if the
LSM gets added to the ordered initialization list. (And an exception
continues for the major LSMs until they are integrated into the ordered
initialization in a later patch.)
Signed-off-by: Kees Cook <keescook@chromium.org>
Provide a way to explicitly choose LSM initialization order via the new
"lsm=" comma-separated list of LSMs.
Signed-off-by: Kees Cook <keescook@chromium.org>
This provides a way to declare LSM initialization order via the new
CONFIG_LSM. Currently only non-major LSMs are recognized. This will
be expanded in future patches.
Signed-off-by: Kees Cook <keescook@chromium.org>
This constructs an ordered list of LSMs to initialize, using a hard-coded
list of only "integrity": minor LSMs continue to have direct hook calls,
and major LSMs continue to initialize separately.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
As a prerequisite to adjusting LSM selection logic in the future, this
moves the selection logic up out of the individual major LSMs, making
their init functions only run when actually enabled. This considers all
LSMs enabled by default unless they specified an external "enable"
variable.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: John Johansen <john.johansen@canonical.com>
This provides a place for ordered LSMs to be initialized, separate from
the "major" LSMs. This is mainly a copy/paste from major_lsm_init() to
ordered_lsm_init(), but it will change drastically in later patches.
What is not obvious in the patch is that this change moves the integrity
LSM from major_lsm_init() into ordered_lsm_init(), since it is not marked
with the LSM_FLAG_LEGACY_MAJOR. As it is the only LSM in the "ordered"
list, there is no reordering yet created.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: John Johansen <john.johansen@canonical.com>
Pull vfs mount API prep from Al Viro:
"Mount API prereqs.
Mostly that's LSM mount options cleanups. There are several minor
fixes in there, but nothing earth-shattering (leaks on failure exits,
mostly)"
* 'mount.part1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (27 commits)
mount_fs: suppress MAC on MS_SUBMOUNT as well as MS_KERNMOUNT
smack: rewrite smack_sb_eat_lsm_opts()
smack: get rid of match_token()
smack: take the guts of smack_parse_opts_str() into a new helper
LSM: new method: ->sb_add_mnt_opt()
selinux: rewrite selinux_sb_eat_lsm_opts()
selinux: regularize Opt_... names a bit
selinux: switch away from match_token()
selinux: new helper - selinux_add_opt()
LSM: bury struct security_mnt_opts
smack: switch to private smack_mnt_opts
selinux: switch to private struct selinux_mnt_opts
LSM: hide struct security_mnt_opts from any generic code
selinux: kill selinux_sb_get_mnt_opts()
LSM: turn sb_eat_lsm_opts() into a method
nfs_remount(): don't leak, don't ignore LSM options quietly
btrfs: sanitize security_mnt_opts use
selinux; don't open-code a loop in sb_finish_set_opts()
LSM: split ->sb_set_mnt_opts() out of ->sb_kern_mount()
new helper: security_sb_eat_lsm_opts()
...
Adding options to growing mnt_opts. NFS kludge with passing
context= down into non-text-options mount switched to it, and
with that the last use of ->sb_parse_opts_str() is gone.
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Keep void * instead, allocate on demand (in parse_str_opts, at the
moment). Eventually both selinux and smack will be better off
with private structures with several strings in those, rather than
this "counter and two pointers to dynamically allocated arrays"
ugliness. This commit allows to do that at leisure, without
disrupting anything outside of given module.
Changes:
* instead of struct security_mnt_opt use an opaque pointer
initialized to NULL.
* security_sb_eat_lsm_opts(), security_sb_parse_opts_str() and
security_free_mnt_opts() take it as var argument (i.e. as void **);
call sites are unchanged.
* security_sb_set_mnt_opts() and security_sb_remount() take
it by value (i.e. as void *).
* new method: ->sb_free_mnt_opts(). Takes void *, does
whatever freeing that needs to be done.
* ->sb_set_mnt_opts() and ->sb_remount() might get NULL as
mnt_opts argument, meaning "empty".
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Kill ->sb_copy_data() - it's used only in combination with immediately
following ->sb_parse_opts_str(). Turn that combination into a new
method.
This is just a mechanical move - cleanups will be the next step.
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
1) keeping a copy in btrfs_fs_info is completely pointless - we never
use it for anything. Getting rid of that allows for simpler calling
conventions for setup_security_options() (caller is responsible for
freeing mnt_opts in all cases).
2) on remount we want to use ->sb_remount(), not ->sb_set_mnt_opts(),
same as we would if not for FS_BINARY_MOUNTDATA. Behaviours *are*
close (in fact, selinux sb_set_mnt_opts() ought to punt to
sb_remount() in "already initialized" case), but let's handle
that uniformly. And the only reason why the original btrfs changes
didn't go for security_sb_remount() in btrfs_remount() case is that
it hadn't been exported. Let's export it for a while - it'll be
going away soon anyway.
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
... leaving the "is it kernel-internal" logics in the caller.
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>