With the new standardized functions, we can replace all ACCESS_ONCE()
calls across relevant security/keyrings/.
ACCESS_ONCE() does not work reliably on non-scalar types. For example
gcc 4.6 and 4.7 might remove the volatile tag for such accesses during
the SRA (scalar replacement of aggregates) step:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145
Update the new calls regardless of if it is a scalar type, this is
cleaner than having three alternatives.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
CONFIG_KEYS_COMPAT is defined in arch-specific Kconfigs and is missing for
several 64-bit architectures : mips, parisc, tile.
At the moment and for those architectures, calling in 32-bit userspace the
keyctl syscall would return an ENOSYS error.
This patch moves the CONFIG_KEYS_COMPAT option to security/keys/Kconfig, to
make sure the compatibility wrapper is registered by default for any 64-bit
architecture as long as it is configured with CONFIG_COMPAT.
[DH: Modified to remove arm64 compat enablement also as requested by Eric
Biggers]
Signed-off-by: Bilal Amarni <bilal.amarni@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
cc: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Virtualize the apparmor policy/ directory so that the current
namespace affects what part of policy is seen. To do this convert to
using apparmorfs for policy namespace files and setup a magic symlink
in the securityfs apparmor dir to access those files.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Reviewed-by: Seth Arnold <seth.arnold@canonical.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
prefixes are used for fns/data that are not static to apparmorfs.c
with the prefixes being
aafs - special magic apparmorfs for policy namespace data
aa_sfs - for fns/data that go into securityfs
aa_fs - for fns/data that may be used in the either of aafs or
securityfs
Signed-off-by: John Johansen <john.johansen@canonical.com>
Reviewed-by: Seth Arnold <seth.arnold@canonical.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
AppArmor policy needs to be able to be resolved based on the policy
namespace a task is confined by. Add a base apparmorfs filesystem that
(like nsfs) will exist as a kern mount and be accessed via jump_link
through a securityfs file.
Setup the base apparmorfs fns and data, but don't use it yet.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Reviewed-by: Seth Arnold <seth.arnold@canonical.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
The loaddata sets cover more than just a single profile and should
be tracked at the ns level. Move the load data files under the namespace
and reference the files from the profiles via a symlink.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Reviewed-by: Seth Arnold <seth.arnold@canonical.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Dynamically allocating buffers is problematic and is an extra layer
that is a potntial point of failure and can slow down mediation.
Change path lookup to use the preallocated per cpu buffers.
Signed-off-by: John Johansen <john.johansen@canonical.com>
When using a strictly POSIX-compliant shell, "-n #define ..." gets
written into the file. Use "printf '%s'" to avoid this.
Signed-off-by: Thomas Schneider <qsx@qsx.re>
Signed-off-by: John Johansen <john.johansen@canonical.com>
We can either return PTR_ERR(NULL) or a PTR_ERR(a valid pointer) here.
Returning NULL is probably not good, but since this happens at boot
then we are probably already toasted if we were to hit this bug in real
life. In other words, it seems like a very low severity bug to me.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Two single characters (line breaks) should be put into a sequence.
Thus use the corresponding function "seq_putc".
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: John Johansen <john.johansen@canonical.com>
A bit of data was put into a sequence by two separate function calls.
Print the same data by a single function call instead.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: John Johansen <john.johansen@canonical.com>
For some file systems we still memcpy into it, but in various places this
already allows us to use the proper uuid helpers. More to come..
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> (Changes to IMA/EVM)
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
This helper was only used by IMA of all things, which would get spurious
errors if CONFIG_BLOCK is disabled. Just opencode the call there.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
It will allow us to remove the old netfilter hook api in the near future.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Use cap_capable() rather than capable() in the Smack privilege
check as the former does not invoke other security module
privilege check, while the later does. This becomes important
when stacking. It may be a problem even with minor modules.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
The check of S_ISSOCK() in smack_file_receive() is not
appropriate if the passed descriptor is a socket.
Reported-by: Stephen Smalley <sds@tyco.nsa.gov>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
It will allow us to remove the old netfilter hook api in the near future.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
It is likely that the SID for the same PKey will be requested many
times. To reduce the time to modify QPs and process MADs use a cache to
store PKey SIDs.
This code is heavily based on the "netif" and "netport" concept
originally developed by James Morris <jmorris@redhat.com> and Paul Moore
<paul@paul-moore.com> (see security/selinux/netif.c and
security/selinux/netport.c for more information)
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Add a type for Infiniband ports and an access vector for subnet
management packets. Implement the ib_port_smp hook to check that the
caller has permission to send and receive SMPs on the end port specified
by the device name and port. Add interface to query the SID for a IB
port, which walks the IB_PORT ocontexts to find an entry for the
given name and port.
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Add a type and access vector for PKeys. Implement the ib_pkey_access
hook to check that the caller has permission to access the PKey on the
given subnet prefix. Add an interface to get the PKey SID. Walk the PKey
ocontexts to find an entry for the given subnet prefix and pkey.
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Implement and attach hooks to allocate and free Infiniband object
security structures.
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Support for Infiniband requires the addition of two new object contexts,
one for infiniband PKeys and another IB Ports. Added handlers to read
and write the new ocontext types when reading or writing a binary policy
representation.
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Allocate and free a security context when creating and destroying a MAD
agent. This context is used for controlling access to PKeys and sending
and receiving SMPs.
When sending or receiving a MAD check that the agent has permission to
access the PKey for the Subnet Prefix of the port.
During MAD and snoop agent registration for SMI QPs check that the
calling process has permission to access the manage the subnet and
register a callback with the LSM to be notified of policy changes. When
notificaiton of a policy change occurs recheck permission and set a flag
indicating sending and receiving SMPs is allowed.
When sending and receiving MADs check that the agent has access to the
SMI if it's on an SMI QP. Because security policy can change it's
possible permission was allowed when creating the agent, but no longer
is.
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Acked-by: Doug Ledford <dledford@redhat.com>
[PM: remove the LSM hook init code]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Add a generic notificaiton mechanism in the LSM. Interested consumers
can register a callback with the LSM and security modules can produce
events.
Because access to Infiniband QPs are enforced in the setup phase of a
connection security should be enforced again if the policy changes.
Register infiniband devices for policy change notification and check all
QPs on that device when the notification is received.
Add a call to the notification mechanism from SELinux when the AVC
cache changes or setenforce is cleared.
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Add new LSM hooks to allocate and free security contexts and check for
permission to access a PKey.
Allocate and free a security context when creating and destroying a QP.
This context is used for controlling access to PKeys.
When a request is made to modify a QP that changes the port, PKey index,
or alternate path, check that the QP has permission for the PKey in the
PKey table index on the subnet prefix of the port. If the QP is shared
make sure all handles to the QP also have access.
Store which port and PKey index a QP is using. After the reset to init
transition the user can modify the port, PKey index and alternate path
independently. So port and PKey settings changes can be a merge of the
previous settings and the new ones.
In order to maintain access control if there are PKey table or subnet
prefix change keep a list of all QPs are using each PKey index on
each port. If a change occurs all QPs using that device and port must
have access enforced for the new cache settings.
These changes add a transaction to the QP modify process. Association
with the old port and PKey index must be maintained if the modify fails,
and must be removed if it succeeds. Association with the new port and
PKey index must be established prior to the modify and removed if the
modify fails.
1. When a QP is modified to a particular Port, PKey index or alternate
path insert that QP into the appropriate lists.
2. Check permission to access the new settings.
3. If step 2 grants access attempt to modify the QP.
4a. If steps 2 and 3 succeed remove any prior associations.
4b. If ether fails remove the new setting associations.
If a PKey table or subnet prefix changes walk the list of QPs and
check that they have permission. If not send the QP to the error state
and raise a fatal error event. If it's a shared QP make sure all the
QPs that share the real_qp have permission as well. If the QP that
owns a security structure is denied access the security structure is
marked as such and the QP is added to an error_list. Once the moving
the QP to error is complete the security structure mark is cleared.
Maintaining the lists correctly turns QP destroy into a transaction.
The hardware driver for the device frees the ib_qp structure, so while
the destroy is in progress the ib_qp pointer in the ib_qp_security
struct is undefined. When the destroy process begins the ib_qp_security
structure is marked as destroying. This prevents any action from being
taken on the QP pointer. After the QP is destroyed successfully it
could still listed on an error_list wait for it to be processed by that
flow before cleaning up the structure.
If the destroy fails the QPs port and PKey settings are reinserted into
the appropriate lists, the destroying flag is cleared, and access control
is enforced, in case there were any cache changes during the destroy
flow.
To keep the security changes isolated a new file is used to hold security
related functionality.
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Acked-by: Doug Ledford <dledford@redhat.com>
[PM: merge fixup in ib_verbs.h and uverbs_cmd.c]
Signed-off-by: Paul Moore <paul@paul-moore.com>
The check is already performed in ocontext_read() when the policy is
loaded. Removing the array also fixes the following warning when
building with clang:
security/selinux/hooks.c:338:20: error: variable 'labeling_behaviors'
is not needed and will not be emitted
[-Werror,-Wunneeded-internal-declaration]
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Log the state of SELinux policy capabilities when a policy is loaded.
For each policy capability known to the kernel, log the policy capability
name and the value set in the policy. For policy capabilities that are
set in the loaded policy but unknown to the kernel, log the policy
capability index, since this is the only information presently available
in the policy.
Sample output with a policy created with a new capability defined
that is not known to the kernel:
SELinux: policy capability network_peer_controls=1
SELinux: policy capability open_perms=1
SELinux: policy capability extended_socket_class=1
SELinux: policy capability always_check_network=0
SELinux: policy capability cgroup_seclabel=0
SELinux: unknown policy capability 5
Resolves: https://github.com/SELinuxProject/selinux-kernel/issues/32
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
open permission is currently only defined for files in the kernel
(COMMON_FILE_PERMS rather than COMMON_FILE_SOCK_PERMS). Construction of
an artificial test case that tries to open a socket via /proc/pid/fd will
generate a recvfrom avc denial because recvfrom and open happen to map to
the same permission bit in socket vs file classes.
open of a socket via /proc/pid/fd is not supported by the kernel regardless
and will ultimately return ENXIO. But we hit the permission check first and
can thus produce these odd/misleading denials. Omit the open check when
operating on a socket.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Add a map permission check on mmap so that we can distinguish memory mapped
access (since it has different implications for revocation). When a file
is opened and then read or written via syscalls like read(2)/write(2),
we revalidate access on each read/write operation via
selinux_file_permission() and therefore can revoke access if the
process context, the file context, or the policy changes in such a
manner that access is no longer allowed. When a file is opened and then
memory mapped via mmap(2) and then subsequently read or written directly
in memory, we presently have no way to revalidate or revoke access.
The purpose of a separate map permission check on mmap(2) is to permit
policy to prohibit memory mapping of specific files for which we need
to ensure that every access is revalidated, particularly useful for
scenarios where we expect the file to be relabeled at runtime in order
to reflect state changes (e.g. cross-domain solution, assured pipeline
without data copying).
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
SELinux uses CAP_MAC_ADMIN to control the ability to get or set a raw,
uninterpreted security context unknown to the currently loaded security
policy. When performing these checks, we only want to perform a base
capabilities check and a SELinux permission check. If any other
modules that implement a capable hook are stacked with SELinux, we do
not want to require them to also have to authorize CAP_MAC_ADMIN,
since it may have different implications for their security model.
Rework the CAP_MAC_ADMIN checks within SELinux to only invoke the
capabilities module and the SELinux permission checking.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
* Return an error code without storing it in an intermediate variable.
* Delete the local variable "rc" and the jump label "out" which became
unnecessary with this refactoring.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Replace five goto statements (and previous variable assignments) by
direct returns after a memory allocation failure in this function.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
This patch is a preparation for getting rid of task_create hook because
task_alloc hook which can do what task_create hook can do was revived.
Creating a new thread is unlikely prohibited by security policy, for
fork()/execve()/exit() is fundamental of how processes are managed in
Unix. If a program is known to create a new thread, it is likely that
permission to create a new thread is given to that program. Therefore,
a situation where security_task_create() returns an error is likely that
the program was exploited and lost control. Even if SELinux failed to
check permission to create a thread at security_task_create(), SELinux
can later check it at security_task_alloc(). Since the new thread is not
yet visible from the rest of the system, nobody can do bad things using
the new thread. What we waste will be limited to some initialization
steps such as dup_task_struct(), copy_creds() and audit_alloc() in
copy_process(). We can tolerate these overhead for unlikely situation.
Therefore, this patch changes SELinux to use task_alloc hook rather than
task_create hook so that we can remove task_create hook.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Adjusts for ReST markup and moves under keys security devel index.
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Adjusts for ReST markup and moves under LSM admin guide.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Adjusts for ReST markup and moves under LSM admin guide.
Acked-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
The commit d69dece5f5 ("LSM: Add /sys/kernel/security/lsm") extend
security_add_hooks() with a new parameter to register the LSM name,
which may be useful to make the list of currently loaded LSM available
to userspace. However, there is no clean way for an LSM to split its
hook declarations into multiple files, which may reduce the mess with
all the included files (needed for LSM hook argument types) and make the
source code easier to review and maintain.
This change allows an LSM to register multiple times its hook while
keeping a consistent list of LSM names as described in
Documentation/security/LSM.txt . The list reflects the order in which
checks are made. This patch only check for the last registered LSM. If
an LSM register multiple times its hooks, interleaved with other LSM
registrations (which should not happen), its name will still appear in
the same order that the hooks are called, hence multiple times.
To sum up, "capability,selinux,foo,foo" will be replaced with
"capability,selinux,foo", however "capability,foo,selinux,foo" will
remain as is.
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Pull misc vfs updates from Al Viro:
"Assorted bits and pieces from various people. No common topic in this
pile, sorry"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs/affs: add rename exchange
fs/affs: add rename2 to prepare multiple methods
Make stat/lstat/fstatat pass AT_NO_AUTOMOUNT to vfs_statx()
fs: don't set *REFERENCED on single use objects
fs: compat: Remove warning from COMPATIBLE_IOCTL
remove pointless extern of atime_need_update_rcu()
fs: completely ignore unknown open flags
fs: add a VALID_OPEN_FLAGS
fs: remove _submit_bh()
fs: constify tree_descr arrays passed to simple_fill_super()
fs: drop duplicate header percpu-rwsem.h
fs/affs: bugfix: Write files greater than page size on OFS
fs/affs: bugfix: enable writes on OFS disks
fs/affs: remove node generation check
fs/affs: import amigaffs.h
fs/affs: bugfix: make symbolic links work again
CURRENT_TIME macro is not y2038 safe on 32 bit systems.
The patch replaces all the uses of CURRENT_TIME by current_time().
This is also in preparation for the patch that transitions vfs
timestamps to use 64 bit time and hence make them y2038 safe.
current_time() is also planned to be transitioned to y2038 safe behavior
along with this change.
CURRENT_TIME macro will be deleted before merging the aforementioned
change.
Link: http://lkml.kernel.org/r/1491613030-11599-11-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are many code paths opencoding kvmalloc. Let's use the helper
instead. The main difference to kvmalloc is that those users are
usually not considering all the aspects of the memory allocator. E.g.
allocation requests <= 32kB (with 4kB pages) are basically never failing
and invoke OOM killer to satisfy the allocation. This sounds too
disruptive for something that has a reasonable fallback - the vmalloc.
On the other hand those requests might fallback to vmalloc even when the
memory allocator would succeed after several more reclaim/compaction
attempts previously. There is no guarantee something like that happens
though.
This patch converts many of those places to kv[mz]alloc* helpers because
they are more conservative.
Link: http://lkml.kernel.org/r/20170306103327.2766-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> # Xen bits
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Andreas Dilger <andreas.dilger@intel.com> # Lustre
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> # KVM/s390
Acked-by: Dan Williams <dan.j.williams@intel.com> # nvdim
Acked-by: David Sterba <dsterba@suse.com> # btrfs
Acked-by: Ilya Dryomov <idryomov@gmail.com> # Ceph
Acked-by: Tariq Toukan <tariqt@mellanox.com> # mlx4
Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx5
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Colin Cross <ccross@android.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Santosh Raspatur <santosh@chelsio.com>
Cc: Hariprasad S <hariprasad@chelsio.com>
Cc: Yishai Hadas <yishaih@mellanox.com>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: "Yan, Zheng" <zyan@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "kvmalloc", v5.
There are many open coded kmalloc with vmalloc fallback instances in the
tree. Most of them are not careful enough or simply do not care about
the underlying semantic of the kmalloc/page allocator which means that
a) some vmalloc fallbacks are basically unreachable because the kmalloc
part will keep retrying until it succeeds b) the page allocator can
invoke a really disruptive steps like the OOM killer to move forward
which doesn't sound appropriate when we consider that the vmalloc
fallback is available.
As it can be seen implementing kvmalloc requires quite an intimate
knowledge if the page allocator and the memory reclaim internals which
strongly suggests that a helper should be implemented in the memory
subsystem proper.
Most callers, I could find, have been converted to use the helper
instead. This is patch 6. There are some more relying on __GFP_REPEAT
in the networking stack which I have converted as well and Eric Dumazet
was not opposed [2] to convert them as well.
[1] http://lkml.kernel.org/r/20170130094940.13546-1-mhocko@kernel.org
[2] http://lkml.kernel.org/r/1485273626.16328.301.camel@edumazet-glaptop3.roam.corp.google.com
This patch (of 9):
Using kmalloc with the vmalloc fallback for larger allocations is a
common pattern in the kernel code. Yet we do not have any common helper
for that and so users have invented their own helpers. Some of them are
really creative when doing so. Let's just add kv[mz]alloc and make sure
it is implemented properly. This implementation makes sure to not make
a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also
to not warn about allocation failures. This also rules out the OOM
killer as the vmalloc is a more approapriate fallback than a disruptive
user visible action.
This patch also changes some existing users and removes helpers which
are specific for them. In some cases this is not possible (e.g.
ext4_kvmalloc, libcfs_kvzalloc) because those seems to be broken and
require GFP_NO{FS,IO} context which is not vmalloc compatible in general
(note that the page table allocation is GFP_KERNEL). Those need to be
fixed separately.
While we are at it, document that __vmalloc{_node} about unsupported gfp
mask because there seems to be a lot of confusion out there.
kvmalloc_node will warn about GFP_KERNEL incompatible (which are not
superset) flags to catch new abusers. Existing ones would have to die
slowly.
[sfr@canb.auug.org.au: f2fs fixup]
Link: http://lkml.kernel.org/r/20170320163735.332e64b7@canb.auug.org.au
Link: http://lkml.kernel.org/r/20170306103032.2540-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Andreas Dilger <adilger@dilger.ca> [ext4 part]
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull security subsystem updates from James Morris:
"Highlights:
IMA:
- provide ">" and "<" operators for fowner/uid/euid rules
KEYS:
- add a system blacklist keyring
- add KEYCTL_RESTRICT_KEYRING, exposes keyring link restriction
functionality to userland via keyctl()
LSM:
- harden LSM API with __ro_after_init
- add prlmit security hook, implement for SELinux
- revive security_task_alloc hook
TPM:
- implement contextual TPM command 'spaces'"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (98 commits)
tpm: Fix reference count to main device
tpm_tis: convert to using locality callbacks
tpm: fix handling of the TPM 2.0 event logs
tpm_crb: remove a cruft constant
keys: select CONFIG_CRYPTO when selecting DH / KDF
apparmor: Make path_max parameter readonly
apparmor: fix parameters so that the permission test is bypassed at boot
apparmor: fix invalid reference to index variable of iterator line 836
apparmor: use SHASH_DESC_ON_STACK
security/apparmor/lsm.c: set debug messages
apparmor: fix boolreturn.cocci warnings
Smack: Use GFP_KERNEL for smk_netlbl_mls().
smack: fix double free in smack_parse_opts_str()
KEYS: add SP800-56A KDF support for DH
KEYS: Keyring asymmetric key restrict method with chaining
KEYS: Restrict asymmetric key linkage using a specific keychain
KEYS: Add a lookup_restriction function for the asymmetric key type
KEYS: Add KEYCTL_RESTRICT_KEYRING
KEYS: Consistent ordering for __key_link_begin and restrict check
KEYS: Add an optional lookup_restriction hook to key_type
...
Pull networking updates from David Millar:
"Here are some highlights from the 2065 networking commits that
happened this development cycle:
1) XDP support for IXGBE (John Fastabend) and thunderx (Sunil Kowuri)
2) Add a generic XDP driver, so that anyone can test XDP even if they
lack a networking device whose driver has explicit XDP support
(me).
3) Sparc64 now has an eBPF JIT too (me)
4) Add a BPF program testing framework via BPF_PROG_TEST_RUN (Alexei
Starovoitov)
5) Make netfitler network namespace teardown less expensive (Florian
Westphal)
6) Add symmetric hashing support to nft_hash (Laura Garcia Liebana)
7) Implement NAPI and GRO in netvsc driver (Stephen Hemminger)
8) Support TC flower offload statistics in mlxsw (Arkadi Sharshevsky)
9) Multiqueue support in stmmac driver (Joao Pinto)
10) Remove TCP timewait recycling, it never really could possibly work
well in the real world and timestamp randomization really zaps any
hint of usability this feature had (Soheil Hassas Yeganeh)
11) Support level3 vs level4 ECMP route hashing in ipv4 (Nikolay
Aleksandrov)
12) Add socket busy poll support to epoll (Sridhar Samudrala)
13) Netlink extended ACK support (Johannes Berg, Pablo Neira Ayuso,
and several others)
14) IPSEC hw offload infrastructure (Steffen Klassert)"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2065 commits)
tipc: refactor function tipc_sk_recv_stream()
tipc: refactor function tipc_sk_recvmsg()
net: thunderx: Optimize page recycling for XDP
net: thunderx: Support for XDP header adjustment
net: thunderx: Add support for XDP_TX
net: thunderx: Add support for XDP_DROP
net: thunderx: Add basic XDP support
net: thunderx: Cleanup receive buffer allocation
net: thunderx: Optimize CQE_TX handling
net: thunderx: Optimize RBDR descriptor handling
net: thunderx: Support for page recycling
ipx: call ipxitf_put() in ioctl error path
net: sched: add helpers to handle extended actions
qed*: Fix issues in the ptp filter config implementation.
qede: Fix concurrency issue in PTP Tx path processing.
stmmac: Add support for SIMATIC IOT2000 platform
net: hns: fix ethtool_get_strings overflow in hns driver
tcp: fix wraparound issue in tcp_lp
bpf, arm64: fix jit branch offset related to ldimm64
bpf, arm64: implement jiting of BPF_XADD
...
guide for user-space API documents, rather sparsely populated at the
moment, but it's a start. Markus improved the infrastructure for
converting diagrams. Mauro has converted much of the USB documentation
over to RST. Plus the usual set of fixes, improvements, and tweaks.
There's a bit more than the usual amount of reaching out of Documentation/
to fix comments elsewhere in the tree; I have acks for those where I could
get them.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJZB1elAAoJEI3ONVYwIuV6wUIQAJSM/4rNdj6z+GXeWhRfbsOo
vqqVYluvXQIJaaqdsy9dgcfThhOXWYsPyVF6Xd+bDJpwF3BMZYbX1CI1Mo3kRD+7
9+Pf68cYSHRoU3l/sFI8q0zfKbHtmFteIvnRQoFtRaExqgTR8glUfxNDyN9XuNAZ
3naS4qMZivM4gjMcSpIB/wFOQpV+6qVIs6VTFLdCC8wodT3W/Wmb+bqrCVJ0twbB
t8jJeYHt2wsiTdqrKU+VilAUAZ1Lby+DNfeWrO18rC1ohktPyUzOGg8JmTKUBpVO
qj1OJwD6abuaNh/J9bXsh8u0OrVrBKWjVrhq9IFYDlm92fu3Bgr6YeoaVPEpcklt
jdlgZnWs9/oXa6d32aMc9F7mP9a0Q1qikFTYINhaHQZCb4VDRuQ9hCSuqWm5jlVy
lmVAoxLa0zSdOoXaYuO3HC99ku1cIn814CXMDz/IwKXkqUCV+zl+H3AGkvxGyQ5M
eblw2TnQnc6e1LRcxt5bgpFR1JYMbCJhu0U5XrNFueQV8ReB15dvL7h4y21dWJKF
2Sr83rwfG1rpZQiSqCjOXxIzuXbEGH3+a+zCDV5IHhQRt/VNDOt2hgmcyucSSJ5h
5GRFYgTlGvoT/6LdIT39QooHB+4tSDRtEQ6lh0q2ZtVV2rfG/I6/PR5sUbWM65SN
vAfctRm2afHLhdonSX5O
=41m+
-----END PGP SIGNATURE-----
Merge tag 'docs-4.12' of git://git.lwn.net/linux
Pull documentation update from Jonathan Corbet:
"A reasonably busy cycle for documentation this time around. There is a
new guide for user-space API documents, rather sparsely populated at
the moment, but it's a start. Markus improved the infrastructure for
converting diagrams. Mauro has converted much of the USB documentation
over to RST. Plus the usual set of fixes, improvements, and tweaks.
There's a bit more than the usual amount of reaching out of
Documentation/ to fix comments elsewhere in the tree; I have acks for
those where I could get them"
* tag 'docs-4.12' of git://git.lwn.net/linux: (74 commits)
docs: Fix a couple typos
docs: Fix a spelling error in vfio-mediated-device.txt
docs: Fix a spelling error in ioctl-number.txt
MAINTAINERS: update file entry for HSI subsystem
Documentation: allow installing man pages to a user defined directory
Doc/PM: Sync with intel_powerclamp code behavior
zr364xx.rst: usb/devices is now at /sys/kernel/debug/
usb.rst: move documentation from proc_usb_info.txt to USB ReST book
convert philips.txt to ReST and add to media docs
docs-rst: usb: update old usbfs-related documentation
arm: Documentation: update a path name
docs: process/4.Coding.rst: Fix a couple of document refs
docs-rst: fix usb cross-references
usb: gadget.h: be consistent at kernel doc macros
usb: composite.h: fix two warnings when building docs
usb: get rid of some ReST doc build errors
usb.rst: get rid of some Sphinx errors
usb/URB.txt: convert to ReST and update it
usb/persist.txt: convert to ReST and add to driver-api book
usb/hotplug.txt: convert to ReST and add to driver-api book
...
Pull uaccess unification updates from Al Viro:
"This is the uaccess unification pile. It's _not_ the end of uaccess
work, but the next batch of that will go into the next cycle. This one
mostly takes copy_from_user() and friends out of arch/* and gets the
zero-padding behaviour in sync for all architectures.
Dealing with the nocache/writethrough mess is for the next cycle;
fortunately, that's x86-only. Same for cleanups in iov_iter.c (I am
sold on access_ok() in there, BTW; just not in this pile), same for
reducing __copy_... callsites, strn*... stuff, etc. - there will be a
pile about as large as this one in the next merge window.
This one sat in -next for weeks. -3KLoC"
* 'work.uaccess' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (96 commits)
HAVE_ARCH_HARDENED_USERCOPY is unconditional now
CONFIG_ARCH_HAS_RAW_COPY_USER is unconditional now
m32r: switch to RAW_COPY_USER
hexagon: switch to RAW_COPY_USER
microblaze: switch to RAW_COPY_USER
get rid of padding, switch to RAW_COPY_USER
ia64: get rid of copy_in_user()
ia64: sanitize __access_ok()
ia64: get rid of 'segment' argument of __do_{get,put}_user()
ia64: get rid of 'segment' argument of __{get,put}_user_check()
ia64: add extable.h
powerpc: get rid of zeroing, switch to RAW_COPY_USER
esas2r: don't open-code memdup_user()
alpha: fix stack smashing in old_adjtimex(2)
don't open-code kernel_setsockopt()
mips: switch to RAW_COPY_USER
mips: get rid of tail-zeroing in primitives
mips: make copy_from_user() zero tail explicitly
mips: clean and reorder the forest of macros...
mips: consolidate __invoke_... wrappers
...
simple_fill_super() is passed an array of tree_descr structures which
describe the files to create in the filesystem's root directory. Since
these arrays are never modified intentionally, they should be 'const' so
that they are placed in .rodata and benefit from memory protection.
This patch updates the function signature and all users, and also
constifies tree_descr.name.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Both conflict were simple overlapping changes.
In the kaweth case, Eric Dumazet's skb_cow() bug fix overlapped the
conversion of the driver in net-next to use in-netdev stats.
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes CVE-2017-7472.
Running the following program as an unprivileged user exhausts kernel
memory by leaking thread keyrings:
#include <keyutils.h>
int main()
{
for (;;)
keyctl_set_reqkey_keyring(KEY_REQKEY_DEFL_THREAD_KEYRING);
}
Fix it by only creating a new thread keyring if there wasn't one before.
To make things more consistent, make install_thread_keyring_to_cred()
and install_process_keyring_to_cred() both return 0 if the corresponding
keyring is already present.
Fixes: d84f4f992c ("CRED: Inaugurate COW credentials")
Cc: stable@vger.kernel.org # 2.6.29+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
This fixes CVE-2017-6951.
Userspace should not be able to do things with the "dead" key type as it
doesn't have some of the helper functions set upon it that the kernel
needs. Attempting to use it may cause the kernel to crash.
Fix this by changing the name of the type to ".dead" so that it's rejected
up front on userspace syscalls by key_get_type_from_user().
Though this doesn't seem to affect recent kernels, it does affect older
ones, certainly those prior to:
commit c06cfb08b8
Author: David Howells <dhowells@redhat.com>
Date: Tue Sep 16 17:36:06 2014 +0100
KEYS: Remove key_type::match in favour of overriding default by match_preparse
which went in before 3.18-rc1.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: stable@vger.kernel.org
This fixes CVE-2016-9604.
Keyrings whose name begin with a '.' are special internal keyrings and so
userspace isn't allowed to create keyrings by this name to prevent
shadowing. However, the patch that added the guard didn't fix
KEYCTL_JOIN_SESSION_KEYRING. Not only can that create dot-named keyrings,
it can also subscribe to them as a session keyring if they grant SEARCH
permission to the user.
This, for example, allows a root process to set .builtin_trusted_keys as
its session keyring, at which point it has full access because now the
possessor permissions are added. This permits root to add extra public
keys, thereby bypassing module verification.
This also affects kexec and IMA.
This can be tested by (as root):
keyctl session .builtin_trusted_keys
keyctl add user a a @s
keyctl list @s
which on my test box gives me:
2 keys in keyring:
180010936: ---lswrv 0 0 asymmetric: Build time autogenerated kernel key: ae3d4a31b82daa8e1a75b49dc2bba949fd992a05
801382539: --alswrv 0 0 user: a
Fix this by rejecting names beginning with a '.' in the keyctl.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
cc: linux-ima-devel@lists.sourceforge.net
cc: stable@vger.kernel.org
Select CONFIG_CRYPTO in addition to CONFIG_HASH to ensure that
also CONFIG_HASH2 is selected. Both are needed for the shash
cipher support required for the KDF operation.
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Boot parameters are written before apparmor is ready to answer whether
the user is policy_view_capable(). Setting the parameters at boot results
in an oops and failure to boot. Setting the parameters at boot is
obviously allowed so skip the permission check when apparmor is not
initialized.
While we are at it move the more complicated check to last.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Once the loop on lines 836-853 is complete and exits normally, ent is a
pointer to the dummy list head value. The derefernces accessible from eg
the goto fail on line 860 or the various goto fail_lock's afterwards thus
seem incorrect.
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
When building the kernel with clang, the compiler fails to build
security/apparmor/crypto.c with the following error:
security/apparmor/crypto.c:36:8: error: fields must have a constant
size: 'variable length array in structure' extension will never be
supported
char ctx[crypto_shash_descsize(apparmor_tfm)];
^
Since commit a0a77af141 ("crypto: LLVMLinux: Add macro to remove use
of VLAIS in crypto code"), include/crypto/hash.h defines
SHASH_DESC_ON_STACK to work around this issue. Use it in aa_calc_hash()
and aa_calc_profile_hash().
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Add the _APPARMOR substring to reference the intended Kconfig option.
Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
security/apparmor/lib.c:132:9-10: WARNING: return of 0/1 in function 'aa_policy_init' with return type bool
Return statements in functions returning bool should use
true/false instead of 1/0.
Generated by: scripts/coccinelle/misc/boolreturn.cocci
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Since all callers of smk_netlbl_mls() are GFP_KERNEL context
(smk_set_cipso() calls memdup_user_nul(), init_smk_fs() calls
__kernfs_new_node(), smk_import_entry() calls kzalloc(GFP_KERNEL)),
it is safe to use GFP_KERNEL from netlbl_catmap_setbit().
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
smack_parse_opts_str() calls kfree(opts->mnt_opts) when kcalloc() for
opts->mnt_opts_flags failed. But it should not have called it because
security_free_mnt_opts() will call kfree(opts->mnt_opts).
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
fixes: 3bf2789cad ("smack: allow mount opts setting over filesystems with binary mount data")
Cc: Vivek Trivedi <t.vivek@samsung.com>
Cc: Amit Sahrawat <a.sahrawat@samsung.com>
Cc: Casey Schaufler <casey@schaufler-ca.com>
SP800-56A defines the use of DH with key derivation function based on a
counter. The input to the KDF is defined as (DH shared secret || other
information). The value for the "other information" is to be provided by
the caller.
The KDF is implemented using the hash support from the kernel crypto API.
The implementation uses the symmetric hash support as the input to the
hash operation is usually very small. The caller is allowed to specify
the hash name that he wants to use to derive the key material allowing
the use of all supported hashes provided with the kernel crypto API.
As the KDF implements the proper truncation of the DH shared secret to
the requested size, this patch fills the caller buffer up to its size.
The patch is tested with a new test added to the keyutils user space
code which uses a CAVS test vector testing the compliance with
SP800-56A.
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Keyrings recently gained restrict_link capabilities that allow
individual keys to be validated prior to linking. This functionality
was only available using internal kernel APIs.
With the KEYCTL_RESTRICT_KEYRING command existing keyrings can be
configured to check the content of keys before they are linked, and
then allow or disallow linkage of that key to the keyring.
To restrict a keyring, call:
keyctl(KEYCTL_RESTRICT_KEYRING, key_serial_t keyring, const char *type,
const char *restriction)
where 'type' is the name of a registered key type and 'restriction' is a
string describing how key linkage is to be restricted. The restriction
option syntax is specific to each key type.
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
The keyring restrict callback was sometimes called before
__key_link_begin and sometimes after, which meant that the keyring
semaphores were not always held during the restrict callback.
If the semaphores are consistently acquired before checking link
restrictions, keyring contents cannot be changed after the restrict
check is complete but before the evaluated key is linked to the keyring.
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Replace struct key's restrict_link function pointer with a pointer to
the new struct key_restriction. The structure contains pointers to the
restriction function as well as relevant data for evaluating the
restriction.
The garbage collector checks restrict_link->keytype when key types are
unregistered. Restrictions involving a removed key type are converted
to use restrict_link_reject so that restrictions cannot be removed by
unregistering key types.
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
The first argument to the restrict_link_func_t functions was a keyring
pointer. These functions are called by the key subsystem with this
argument set to the destination keyring, but restrict_link_by_signature
expects a pointer to the relevant trusted keyring.
Restrict functions may need something other than a single struct key
pointer to allow or reject key linkage, so the data used to make that
decision (such as the trust keyring) is moved to a new, fourth
argument. The first argument is now always the destination keyring.
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
This pointer type needs to be returned from a lookup function, and
without a typedef the syntax gets cumbersome.
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
We removed this initialization as a cleanup but it is probably required.
The concern is that "nel" can be zero. I'm not an expert on SELinux
code but I think it looks possible to write an SELinux policy which
triggers this bug. GCC doesn't catch this, but my static checker does.
Fixes: 9c312e79d6 ("selinux: Delete an unnecessary variable initialisation in range_read()")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: James Morris <james.l.morris@oracle.com>
'perms' will never be NULL since it isn't a plain pointer but an array
of u32 values.
This fixes the following warning when building with clang:
security/selinux/ss/services.c:158:16: error: address of array
'p_in->perms' will always evaluate to 'true'
[-Werror,-Wpointer-bool-conversion]
while (p_in->perms && p_in->perms[k]) {
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
A string which did not contain data format specifications should be put
into a sequence. Thus use the corresponding function "seq_puts".
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The script "checkpatch.pl" pointed information out like the following.
Comparison to NULL could be written !…
Thus fix affected source code places.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
A multiplication for the size determination of a memory allocation
indicated that an array data structure should be processed.
Thus use the corresponding function "kmalloc_array".
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Return directly after a call of the function "kzalloc" failed
at the beginning.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Return directly after a call of the function "kzalloc" failed
at the beginning.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Return directly after a call of the function "kzalloc" failed
at the beginning.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Return directly after a call of the function "kzalloc" failed
at the beginning.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Return directly after a call of the function "kzalloc" failed
at the beginning.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Return directly after a call of the function "kzalloc" failed
at the beginning.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Return directly after a call of the function "kzalloc" failed
at the beginning.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Replace the specification of a data type by a pointer dereference
as the parameter for the operator "sizeof" to make the corresponding size
determination a bit safer according to the Linux coding style convention.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Return directly after a call of the function "kzalloc" failed
at the beginning.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Return directly after a call of the function "kzalloc" failed
at the beginning.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
We switched from "struct task_struct"->security to "struct cred"->security
in Linux 2.6.29. But not all LSM modules were happy with that change.
TOMOYO LSM module is an example which want to use per "struct task_struct"
security blob, for TOMOYO's security context is defined based on "struct
task_struct" rather than "struct cred". AppArmor LSM module is another
example which want to use it, for AppArmor is currently abusing the cred
a little bit to store the change_hat and setexeccon info. Although
security_task_free() hook was revived in Linux 3.4 because Yama LSM module
wanted to release per "struct task_struct" security blob,
security_task_alloc() hook and "struct task_struct"->security field were
not revived. Nowadays, we are getting proposals of lightweight LSM modules
which want to use per "struct task_struct" security blob.
We are already allowing multiple concurrent LSM modules (up to one fully
armored module which uses "struct cred"->security field or exclusive hooks
like security_xfrm_state_pol_flow_match(), plus unlimited number of
lightweight modules which do not use "struct cred"->security nor exclusive
hooks) as long as they are built into the kernel. But this patch does not
implement variable length "struct task_struct"->security field which will
become needed when multiple LSM modules want to use "struct task_struct"->
security field. Although it won't be difficult to implement variable length
"struct task_struct"->security field, let's think about it after we merged
this patch.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: John Johansen <john.johansen@canonical.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Tested-by: Djalal Harouni <tixxdz@gmail.com>
Acked-by: José Bollo <jobol@nonadev.net>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: James Morris <james.l.morris@oracle.com>
Cc: José Bollo <jobol@nonadev.net>
Signed-off-by: James Morris <james.l.morris@oracle.com>
"struct security_hook_heads" is an array of "struct list_head"
where elements can be initialized just before registration.
There is no need to waste 350+ lines for initialization. Let's
initialize "struct security_hook_heads" just before registration.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: James Morris <james.l.morris@oracle.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
The local variable "rt" will be set to an appropriate pointer a bit later.
Thus omit the explicit initialisation at the beginning which became
unnecessary with a previous update step.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Return directly after a call of the function "next_entry" failed
at the beginning.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The local variable "ft" was set to a null pointer despite of an
immediate reassignment.
Thus remove this statement from the beginning of a loop.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Call the function "kfree" at the end only after it was determined
that the local variable "newgenfs" contained a non-null pointer.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Return directly after a call of the function "next_entry" failed
at the beginning.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The script "checkpatch.pl" pointed information out like the following.
WARNING: void function return statements are not generally useful
Thus remove such a statement in the affected function.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Multiplications for the size determination of memory allocations
indicated that array data structures should be processed.
Thus use the corresponding function "kcalloc".
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The script "checkpatch.pl" pointed information out like the following.
Comparison to NULL could be written !…
Thus fix affected source code places.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
A multiplication for the size determination of a memory allocation
indicated that an array data structure should be processed.
Thus use the corresponding function "kmalloc_array".
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Replace the specification of data structures by pointer dereferences
as the parameter for the operator "sizeof" to make the corresponding size
determination a bit safer.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The script "checkpatch.pl" pointed information out like the following.
WARNING: void function return statements are not generally useful
Thus remove such a statement in the affected function.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
* A multiplication for the size determination of a memory allocation
indicated that an array data structure should be processed.
Thus use the corresponding function "kmalloc_array".
This issue was detected by using the Coccinelle software.
* Replace the specification of a data type by a pointer dereference
to make the corresponding size determination a bit safer according to
the Linux coding style convention.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
For now we have only "=" operator for fowner/uid/euid rules. This
patch provide two more operators - ">" and "<" in order to make
fowner/uid/euid rules more flexible.
Examples of usage.
Appraise all files owned by special and system users (SYS_UID_MAX 999):
appraise fowner<1000
Don't appraise files owned by normal users (UID_MIN 1000):
dont_appraise fowner>999
Appraise all files owned by users with UID 1000-1010:
dont_appraise fowner>1010
appraise fowner>999
Changelog v3:
- Removed code duplication in ima_parse_rule().
- Fix ima_policy_show() - (Mimi)
Changelog v2:
- Fixed default policy rules.
Signed-off-by: Mikhail Kurinnoi <viewizard@viewizard.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
security/integrity/ima/ima_policy.c | 115 +++++++++++++++++++++++++++---------
1 file changed, 87 insertions(+), 28 deletions(-)
KMSAN (KernelMemorySanitizer, a new error detection tool) reports use of
uninitialized memory in selinux_socket_bind():
==================================================================
BUG: KMSAN: use of unitialized memory
inter: 0
CPU: 3 PID: 1074 Comm: packet2 Tainted: G B 4.8.0-rc6+ #1916
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
0000000000000000 ffff8800882ffb08 ffffffff825759c8 ffff8800882ffa48
ffffffff818bf551 ffffffff85bab870 0000000000000092 ffffffff85bab550
0000000000000000 0000000000000092 00000000bb0009bb 0000000000000002
Call Trace:
[< inline >] __dump_stack lib/dump_stack.c:15
[<ffffffff825759c8>] dump_stack+0x238/0x290 lib/dump_stack.c:51
[<ffffffff818bdee6>] kmsan_report+0x276/0x2e0 mm/kmsan/kmsan.c:1008
[<ffffffff818bf0fb>] __msan_warning+0x5b/0xb0 mm/kmsan/kmsan_instr.c:424
[<ffffffff822dae71>] selinux_socket_bind+0xf41/0x1080 security/selinux/hooks.c:4288
[<ffffffff8229357c>] security_socket_bind+0x1ec/0x240 security/security.c:1240
[<ffffffff84265d98>] SYSC_bind+0x358/0x5f0 net/socket.c:1366
[<ffffffff84265a22>] SyS_bind+0x82/0xa0 net/socket.c:1356
[<ffffffff81005678>] do_syscall_64+0x58/0x70 arch/x86/entry/common.c:292
[<ffffffff8518217c>] entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.o:?
chained origin: 00000000ba6009bb
[<ffffffff810bb7a7>] save_stack_trace+0x27/0x50 arch/x86/kernel/stacktrace.c:67
[< inline >] kmsan_save_stack_with_flags mm/kmsan/kmsan.c:322
[< inline >] kmsan_save_stack mm/kmsan/kmsan.c:337
[<ffffffff818bd2b8>] kmsan_internal_chain_origin+0x118/0x1e0 mm/kmsan/kmsan.c:530
[<ffffffff818bf033>] __msan_set_alloca_origin4+0xc3/0x130 mm/kmsan/kmsan_instr.c:380
[<ffffffff84265b69>] SYSC_bind+0x129/0x5f0 net/socket.c:1356
[<ffffffff84265a22>] SyS_bind+0x82/0xa0 net/socket.c:1356
[<ffffffff81005678>] do_syscall_64+0x58/0x70 arch/x86/entry/common.c:292
[<ffffffff8518217c>] return_from_SYSCALL_64+0x0/0x6a arch/x86/entry/entry_64.o:?
origin description: ----address@SYSC_bind (origin=00000000b8c00900)
==================================================================
(the line numbers are relative to 4.8-rc6, but the bug persists upstream)
, when I run the following program as root:
=======================================================
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
int main(int argc, char *argv[]) {
struct sockaddr addr;
int size = 0;
if (argc > 1) {
size = atoi(argv[1]);
}
memset(&addr, 0, sizeof(addr));
int fd = socket(PF_INET6, SOCK_DGRAM, IPPROTO_IP);
bind(fd, &addr, size);
return 0;
}
=======================================================
(for different values of |size| other error reports are printed).
This happens because bind() unconditionally copies |size| bytes of
|addr| to the kernel, leaving the rest uninitialized. Then
security_socket_bind() reads the IP address bytes, including the
uninitialized ones, to determine the port, or e.g. pass them further to
sel_netnode_find(), which uses them to calculate a hash.
Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
[PM: fixed some whitespace damage]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Modifying the attributes of a file makes ima_inode_post_setattr reset
the IMA cache flags. So if the file, which has just been created,
is opened a second time before the first file descriptor is closed,
verification fails since the security.ima xattr has not been written
yet. We therefore have to look at the IMA_NEW_FILE even if the file
already existed.
With this patch there should no longer be an error when cat tries to
open testfile:
$ rm -f testfile
$ ( echo test >&3 ; touch testfile ; cat testfile ) 3>testfile
A file being new is no reason to accept that it is missing a digital
signature demanded by the policy.
Signed-off-by: Daniel Glöckner <dg@emlix.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
The default IMA rules are loaded during init and then do not
change, so mark them as __ro_after_init.
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Constify nlmsg permission tables, which are initialized once
and then do not change.
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Mark all of the registration hooks as __ro_after_init (via the
__lsm_ro_after_init macro).
Signed-off-by: James Morris <james.l.morris@oracle.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Kees Cook <keescook@chromium.org>
Subsequent patches will add RO hardening to LSM hooks, however, SELinux
still needs to be able to perform runtime disablement after init to handle
architectures where init-time disablement via boot parameters is not feasible.
Introduce a new kernel configuration parameter CONFIG_SECURITY_WRITABLE_HOOKS,
and a helper macro __lsm_ro_after_init, to handle this case.
Signed-off-by: James Morris <james.l.morris@oracle.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: Kees Cook <keescook@chromium.org>
commit 79bcf325e6b32b3c ("prlimit,security,selinux: add a security hook
for prlimit") introduced a security hook for prlimit() and implemented it
for SELinux. However, if prlimit() is called with NULL arguments for both
the new limit and the old limit, then the hook is called with 0 for the
read/write flags, since the prlimit() will neither read nor write the
process' limits. This would in turn lead to calling avc_has_perm() with 0
for the requested permissions, which triggers a BUG_ON() in
avc_has_perm_noaudit() since the kernel should never be invoking
avc_has_perm() with no permissions. Fix this in the SELinux hook by
returning immediately if the flags are 0. Arguably prlimit64() itself
ought to return immediately if both old_rlim and new_rlim are NULL since
it is effectively a no-op in that case.
Reported by the lkp-robot based on trinity testing.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
When SELinux was first added to the kernel, a process could only get
and set its own resource limits via getrlimit(2) and setrlimit(2), so no
MAC checks were required for those operations, and thus no security hooks
were defined for them. Later, SELinux introduced a hook for setlimit(2)
with a check if the hard limit was being changed in order to be able to
rely on the hard limit value as a safe reset point upon context
transitions.
Later on, when prlimit(2) was added to the kernel with the ability to get
or set resource limits (hard or soft) of another process, LSM/SELinux was
not updated other than to pass the target process to the setrlimit hook.
This resulted in incomplete control over both getting and setting the
resource limits of another process.
Add a new security_task_prlimit() hook to the check_prlimit_permission()
function to provide complete mediation. The hook is only called when
acting on another task, and only if the existing DAC/capability checks
would allow access. Pass flags down to the hook to indicate whether the
prlimit(2) call will read, write, or both read and write the resource
limits of the target process.
The existing security_task_setrlimit() hook is left alone; it continues
to serve a purpose in supporting the ability to make decisions based on
the old and/or new resource limit values when setting limits. This
is consistent with the DAC/capability logic, where
check_prlimit_permission() performs generic DAC/capability checks for
acting on another task, while do_prlimit() performs a capability check
based on a comparison of the old and new resource limits. Fix the
inline documentation for the hook to match the code.
Implement the new hook for SELinux. For setting resource limits, we
reuse the existing setrlimit permission. Note that this does overload
the setrlimit permission to mean the ability to set the resource limit
(soft or hard) of another process or the ability to change one's own
hard limit. For getting resource limits, a new getrlimit permission
is defined. This was not originally defined since getrlimit(2) could
only be used to obtain a process' own limits.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Pull sched.h split-up from Ingo Molnar:
"The point of these changes is to significantly reduce the
<linux/sched.h> header footprint, to speed up the kernel build and to
have a cleaner header structure.
After these changes the new <linux/sched.h>'s typical preprocessed
size goes down from a previous ~0.68 MB (~22K lines) to ~0.45 MB (~15K
lines), which is around 40% faster to build on typical configs.
Not much changed from the last version (-v2) posted three weeks ago: I
eliminated quirks, backmerged fixes plus I rebased it to an upstream
SHA1 from yesterday that includes most changes queued up in -next plus
all sched.h changes that were pending from Andrew.
I've re-tested the series both on x86 and on cross-arch defconfigs,
and did a bisectability test at a number of random points.
I tried to test as many build configurations as possible, but some
build breakage is probably still left - but it should be mostly
limited to architectures that have no cross-compiler binaries
available on kernel.org, and non-default configurations"
* 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (146 commits)
sched/headers: Clean up <linux/sched.h>
sched/headers: Remove #ifdefs from <linux/sched.h>
sched/headers: Remove the <linux/topology.h> include from <linux/sched.h>
sched/headers, hrtimer: Remove the <linux/wait.h> include from <linux/hrtimer.h>
sched/headers, x86/apic: Remove the <linux/pm.h> header inclusion from <asm/apic.h>
sched/headers, timers: Remove the <linux/sysctl.h> include from <linux/timer.h>
sched/headers: Remove <linux/magic.h> from <linux/sched/task_stack.h>
sched/headers: Remove <linux/sched.h> from <linux/sched/init.h>
sched/core: Remove unused prefetch_stack()
sched/headers: Remove <linux/rculist.h> from <linux/sched.h>
sched/headers: Remove the 'init_pid_ns' prototype from <linux/sched.h>
sched/headers: Remove <linux/signal.h> from <linux/sched.h>
sched/headers: Remove <linux/rwsem.h> from <linux/sched.h>
sched/headers: Remove the runqueue_is_locked() prototype
sched/headers: Remove <linux/sched.h> from <linux/sched/hotplug.h>
sched/headers: Remove <linux/sched.h> from <linux/sched/debug.h>
sched/headers: Remove <linux/sched.h> from <linux/sched/nohz.h>
sched/headers: Remove <linux/sched.h> from <linux/sched/stat.h>
sched/headers: Remove the <linux/gfp.h> include from <linux/sched.h>
sched/headers: Remove <linux/rtmutex.h> from <linux/sched.h>
...
Update files that depend on the magic.h inclusion.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We don't actually need the full rculist.h header in sched.h anymore,
we will be able to include the smaller rcupdate.h header instead.
But first update code that relied on the implicit header inclusion.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We are going to split <linux/sched/task.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.
Create a trivial placeholder <linux/sched/task.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add #include <linux/cred.h> dependencies to all .c files rely on sched.h
doing that for them.
Note that even if the count where we need to add extra headers seems high,
it's still a net win, because <linux/sched.h> is included in over
2,200 files ...
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We are going to split <linux/sched/user.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.
Create a trivial placeholder <linux/sched/user.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We are going to split <linux/sched/signal.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.
Create a trivial placeholder <linux/sched/signal.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
commit 1ea0ce4069 ("selinux: allow
changing labels for cgroupfs") broke the Android init program,
which looks up security contexts whenever creating directories
and attempts to assign them via setfscreatecon().
When creating subdirectories in cgroup mounts, this would previously
be ignored since cgroup did not support userspace setting of security
contexts. However, after the commit, SELinux would attempt to honor
the requested context on cgroup directories and fail due to permission
denial. Avoid breaking existing userspace/policy by wrapping this change
with a conditional on a new cgroup_seclabel policy capability. This
preserves existing behavior until/unless a new policy explicitly enables
this capability.
Reported-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
rcu_dereference_key() and user_key_payload() are currently being used in
two different, incompatible ways:
(1) As a wrapper to rcu_dereference() - when only the RCU read lock used
to protect the key.
(2) As a wrapper to rcu_dereference_protected() - when the key semaphor is
used to protect the key and the may be being modified.
Fix this by splitting both of the key wrappers to produce:
(1) RCU accessors for keys when caller has the key semaphore locked:
dereference_key_locked()
user_key_payload_locked()
(2) RCU accessors for keys when caller holds the RCU read lock:
dereference_key_rcu()
user_key_payload_rcu()
This should fix following warning in the NFS idmapper
===============================
[ INFO: suspicious RCU usage. ]
4.10.0 #1 Tainted: G W
-------------------------------
./include/keys/user-type.h:53 suspicious rcu_dereference_protected() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 0
1 lock held by mount.nfs/5987:
#0: (rcu_read_lock){......}, at: [<d000000002527abc>] nfs_idmap_get_key+0x15c/0x420 [nfsv4]
stack backtrace:
CPU: 1 PID: 5987 Comm: mount.nfs Tainted: G W 4.10.0 #1
Call Trace:
dump_stack+0xe8/0x154 (unreliable)
lockdep_rcu_suspicious+0x140/0x190
nfs_idmap_get_key+0x380/0x420 [nfsv4]
nfs_map_name_to_uid+0x2a0/0x3b0 [nfsv4]
decode_getfattr_attrs+0xfac/0x16b0 [nfsv4]
decode_getfattr_generic.constprop.106+0xbc/0x150 [nfsv4]
nfs4_xdr_dec_lookup_root+0xac/0xb0 [nfsv4]
rpcauth_unwrap_resp+0xe8/0x140 [sunrpc]
call_decode+0x29c/0x910 [sunrpc]
__rpc_execute+0x140/0x8f0 [sunrpc]
rpc_run_task+0x170/0x200 [sunrpc]
nfs4_call_sync_sequence+0x68/0xa0 [nfsv4]
_nfs4_lookup_root.isra.44+0xd0/0xf0 [nfsv4]
nfs4_lookup_root+0xe0/0x350 [nfsv4]
nfs4_lookup_root_sec+0x70/0xa0 [nfsv4]
nfs4_find_root_sec+0xc4/0x100 [nfsv4]
nfs4_proc_get_rootfh+0x5c/0xf0 [nfsv4]
nfs4_get_rootfh+0x6c/0x190 [nfsv4]
nfs4_server_common_setup+0xc4/0x260 [nfsv4]
nfs4_create_server+0x278/0x3c0 [nfsv4]
nfs4_remote_mount+0x50/0xb0 [nfsv4]
mount_fs+0x74/0x210
vfs_kern_mount+0x78/0x220
nfs_do_root_mount+0xb0/0x140 [nfsv4]
nfs4_try_mount+0x60/0x100 [nfsv4]
nfs_fs_mount+0x5ec/0xda0 [nfs]
mount_fs+0x74/0x210
vfs_kern_mount+0x78/0x220
do_mount+0x254/0xf70
SyS_mount+0x94/0x100
system_call+0x38/0xe0
Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Now that %z is standartised in C99 there is no reason to support %Z.
Unlike %L it doesn't even make format strings smaller.
Use BUILD_BUG_ON in a couple ATM drivers.
In case anyone didn't notice lib/vsprintf.o is about half of SLUB which
is in my opinion is quite an achievement. Hopefully this patch inspires
someone else to trim vsprintf.c more.
Link: http://lkml.kernel.org/r/20170103230126.GA30170@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to
take a vma and vmf parameter when the vma already resides in vmf.
Remove the vma parameter to simplify things.
[arnd@arndb.de: fix ARM build]
Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull namespace updates from Eric Biederman:
"There is a lot here. A lot of these changes result in subtle user
visible differences in kernel behavior. I don't expect anything will
care but I will revert/fix things immediately if any regressions show
up.
From Seth Forshee there is a continuation of the work to make the vfs
ready for unpriviled mounts. We had thought the previous changes
prevented the creation of files outside of s_user_ns of a filesystem,
but it turns we missed the O_CREAT path. Ooops.
Pavel Tikhomirov and Oleg Nesterov worked together to fix a long
standing bug in the implemenation of PR_SET_CHILD_SUBREAPER where only
children that are forked after the prctl are considered and not
children forked before the prctl. The only known user of this prctl
systemd forks all children after the prctl. So no userspace
regressions will occur. Holding earlier forked children to the same
rules as later forked children creates a semantic that is sane enough
to allow checkpoing of processes that use this feature.
There is a long delayed change by Nikolay Borisov to limit inotify
instances inside a user namespace.
Michael Kerrisk extends the API for files used to maniuplate
namespaces with two new trivial ioctls to allow discovery of the
hierachy and properties of namespaces.
Konstantin Khlebnikov with the help of Al Viro adds code that when a
network namespace exits purges it's sysctl entries from the dcache. As
in some circumstances this could use a lot of memory.
Vivek Goyal fixed a bug with stacked filesystems where the permissions
on the wrong inode were being checked.
I continue previous work on ptracing across exec. Allowing a file to
be setuid across exec while being ptraced if the tracer has enough
credentials in the user namespace, and if the process has CAP_SETUID
in it's own namespace. Proc files for setuid or otherwise undumpable
executables are now owned by the root in the user namespace of their
mm. Allowing debugging of setuid applications in containers to work
better.
A bug I introduced with permission checking and automount is now
fixed. The big change is to mark the mounts that the kernel initiates
as a result of an automount. This allows the permission checks in sget
to be safely suppressed for this kind of mount. As the permission
check happened when the original filesystem was mounted.
Finally a special case in the mount namespace is removed preventing
unbounded chains in the mount hash table, and making the semantics
simpler which benefits CRIU.
The vfs fix along with related work in ima and evm I believe makes us
ready to finish developing and merge fully unprivileged mounts of the
fuse filesystem. The cleanups of the mount namespace makes discussing
how to fix the worst case complexity of umount. The stacked filesystem
fixes pave the way for adding multiple mappings for the filesystem
uids so that efficient and safer containers can be implemented"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
proc/sysctl: Don't grab i_lock under sysctl_lock.
vfs: Use upper filesystem inode in bprm_fill_uid()
proc/sysctl: prune stale dentries during unregistering
mnt: Tuck mounts under others instead of creating shadow/side mounts.
prctl: propagate has_child_subreaper flag to every descendant
introduce the walk_process_tree() helper
nsfs: Add an ioctl() to return owner UID of a userns
fs: Better permission checking for submounts
exit: fix the setns() && PR_SET_CHILD_SUBREAPER interaction
vfs: open() with O_CREAT should not create inodes with unknown ids
nsfs: Add an ioctl() to return the namespace type
proc: Better ownership of files for non-dumpable tasks in user namespaces
exec: Remove LSM_UNSAFE_PTRACE_CAP
exec: Test the ptracer's saved cred to see if the tracee can gain caps
exec: Don't reset euid and egid when the tracee has CAP_SETUID
inotify: Convert to using per-namespace limits
Here is the "small" driver core patches for 4.11-rc1.
Not much here, some firmware documentation and self-test updates, a
debugfs code formatting issue, and a new feature for call_usermodehelper
to make it more robust on systems that want to lock it down in a more
secure way.
All of these have been linux-next for a while now with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWK2jKg8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymCEACgozYuqZZ/TUGW0P3xVNi7fbfUWCEAn3nYExrc
XgevqeYOSKp2We6X/2JX
=aZ+5
-----END PGP SIGNATURE-----
Merge tag 'driver-core-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the "small" driver core patches for 4.11-rc1.
Not much here, some firmware documentation and self-test updates, a
debugfs code formatting issue, and a new feature for call_usermodehelper
to make it more robust on systems that want to lock it down in a more
secure way.
All of these have been linux-next for a while now with no reported
issues"
* tag 'driver-core-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
kernfs: handle null pointers while printing node name and path
Introduce STATIC_USERMODEHELPER to mediate call_usermodehelper()
Make static usermode helper binaries constant
kmod: make usermodehelper path a const string
firmware: revamp firmware documentation
selftests: firmware: send expected errors to /dev/null
selftests: firmware: only modprobe if driver is missing
platform: Print the resource range if device failed to claim
kref: prefer atomic_inc_not_zero to atomic_add_unless
debugfs: improve formatting of debugfs_real_fops()
Pull networking updates from David Miller:
"Highlights:
1) Support TX_RING in AF_PACKET TPACKET_V3 mode, from Sowmini
Varadhan.
2) Simplify classifier state on sk_buff in order to shrink it a bit.
From Willem de Bruijn.
3) Introduce SIPHASH and it's usage for secure sequence numbers and
syncookies. From Jason A. Donenfeld.
4) Reduce CPU usage for ICMP replies we are going to limit or
suppress, from Jesper Dangaard Brouer.
5) Introduce Shared Memory Communications socket layer, from Ursula
Braun.
6) Add RACK loss detection and allow it to actually trigger fast
recovery instead of just assisting after other algorithms have
triggered it. From Yuchung Cheng.
7) Add xmit_more and BQL support to mvneta driver, from Simon Guinot.
8) skb_cow_data avoidance in esp4 and esp6, from Steffen Klassert.
9) Export MPLS packet stats via netlink, from Robert Shearman.
10) Significantly improve inet port bind conflict handling, especially
when an application is restarted and changes it's setting of
reuseport. From Josef Bacik.
11) Implement TX batching in vhost_net, from Jason Wang.
12) Extend the dummy device so that VF (virtual function) features,
such as configuration, can be more easily tested. From Phil
Sutter.
13) Avoid two atomic ops per page on x86 in bnx2x driver, from Eric
Dumazet.
14) Add new bpf MAP, implementing a longest prefix match trie. From
Daniel Mack.
15) Packet sample offloading support in mlxsw driver, from Yotam Gigi.
16) Add new aquantia driver, from David VomLehn.
17) Add bpf tracepoints, from Daniel Borkmann.
18) Add support for port mirroring to b53 and bcm_sf2 drivers, from
Florian Fainelli.
19) Remove custom busy polling in many drivers, it is done in the core
networking since 4.5 times. From Eric Dumazet.
20) Support XDP adjust_head in virtio_net, from John Fastabend.
21) Fix several major holes in neighbour entry confirmation, from
Julian Anastasov.
22) Add XDP support to bnxt_en driver, from Michael Chan.
23) VXLAN offloads for enic driver, from Govindarajulu Varadarajan.
24) Add IPVTAP driver (IP-VLAN based tap driver) from Sainath Grandhi.
25) Support GRO in IPSEC protocols, from Steffen Klassert"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1764 commits)
Revert "ath10k: Search SMBIOS for OEM board file extension"
net: socket: fix recvmmsg not returning error from sock_error
bnxt_en: use eth_hw_addr_random()
bpf: fix unlocking of jited image when module ronx not set
arch: add ARCH_HAS_SET_MEMORY config
net: napi_watchdog() can use napi_schedule_irqoff()
tcp: Revert "tcp: tcp_probe: use spin_lock_bh()"
net/hsr: use eth_hw_addr_random()
net: mvpp2: enable building on 64-bit platforms
net: mvpp2: switch to build_skb() in the RX path
net: mvpp2: simplify MVPP2_PRS_RI_* definitions
net: mvpp2: fix indentation of MVPP2_EXT_GLOBAL_CTRL_DEFAULT
net: mvpp2: remove unused register definitions
net: mvpp2: simplify mvpp2_bm_bufs_add()
net: mvpp2: drop useless fields in mvpp2_bm_pool and related code
net: mvpp2: remove unused 'tx_skb' field of 'struct mvpp2_tx_queue'
net: mvpp2: release reference to txq_cpu[] entry after unmapping
net: mvpp2: handle too large value in mvpp2_rx_time_coal_set()
net: mvpp2: handle too large value handling in mvpp2_rx_pkts_coal_set()
net: mvpp2: remove useless arguments in mvpp2_rx_{pkts, time}_coal_set
...
Pull security layer updates from James Morris:
"Highlights:
- major AppArmor update: policy namespaces & lots of fixes
- add /sys/kernel/security/lsm node for easy detection of loaded LSMs
- SELinux cgroupfs labeling support
- SELinux context mounts on tmpfs, ramfs, devpts within user
namespaces
- improved TPM 2.0 support"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (117 commits)
tpm: declare tpm2_get_pcr_allocation() as static
tpm: Fix expected number of response bytes of TPM1.2 PCR Extend
tpm xen: drop unneeded chip variable
tpm: fix misspelled "facilitate" in module parameter description
tpm_tis: fix the error handling of init_tis()
KEYS: Use memzero_explicit() for secret data
KEYS: Fix an error code in request_master_key()
sign-file: fix build error in sign-file.c with libressl
selinux: allow changing labels for cgroupfs
selinux: fix off-by-one in setprocattr
tpm: silence an array overflow warning
tpm: fix the type of owned field in cap_t
tpm: add securityfs support for TPM 2.0 firmware event log
tpm: enhance read_log_of() to support Physical TPM event log
tpm: enhance TPM 2.0 PCR extend to support multiple banks
tpm: implement TPM 2.0 capability to get active PCR banks
tpm: fix RC value check in tpm2_seal_trusted
tpm_tis: fix iTPM probe via probe_itpm() function
tpm: Begin the process to deprecate user_read_timer
tpm: remove tpm_read_index and tpm_write_index from tpm.h
...
Pull locking updates from Ingo Molnar:
"The main changes in this cycle were:
- Implement wraparound-safe refcount_t and kref_t types based on
generic atomic primitives (Peter Zijlstra)
- Improve and fix the ww_mutex code (Nicolai Hähnle)
- Add self-tests to the ww_mutex code (Chris Wilson)
- Optimize percpu-rwsems with the 'rcuwait' mechanism (Davidlohr
Bueso)
- Micro-optimize the current-task logic all around the core kernel
(Davidlohr Bueso)
- Tidy up after recent optimizations: remove stale code and APIs,
clean up the code (Waiman Long)
- ... plus misc fixes, updates and cleanups"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
fork: Fix task_struct alignment
locking/spinlock/debug: Remove spinlock lockup detection code
lockdep: Fix incorrect condition to print bug msgs for MAX_LOCKDEP_CHAIN_HLOCKS
lkdtm: Convert to refcount_t testing
kref: Implement 'struct kref' using refcount_t
refcount_t: Introduce a special purpose refcount type
sched/wake_q: Clarify queue reinit comment
sched/wait, rcuwait: Fix typo in comment
locking/mutex: Fix lockdep_assert_held() fail
locking/rtmutex: Flip unlikely() branch to likely() in __rt_mutex_slowlock()
locking/rwsem: Reinit wake_q after use
locking/rwsem: Remove unnecessary atomic_long_t casts
jump_labels: Move header guard #endif down where it belongs
locking/atomic, kref: Implement kref_put_lock()
locking/ww_mutex: Turn off __must_check for now
locking/atomic, kref: Avoid more abuse
locking/atomic, kref: Use kref_get_unless_zero() more
locking/atomic, kref: Kill kref_sub()
locking/atomic, kref: Add kref_read()
locking/atomic, kref: Add KREF_INIT()
...
I don't think GCC has figured out how to optimize the memset() away, but
they might eventually so let's future proof this code a bit.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
This function has two callers and neither are able to handle a NULL
return. Really, -EINVAL is the correct thing return here anyway. This
fixes some static checker warnings like:
security/keys/encrypted-keys/encrypted.c:709 encrypted_key_decrypt()
error: uninitialized symbol 'master_key'.
Fixes: 7e70cb4978 ("keys: add new key-type encrypted")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
SELinux tries to support setting/clearing of /proc/pid/attr attributes
from the shell by ignoring terminating newlines and treating an
attribute value that begins with a NUL or newline as an attempt to
clear the attribute. However, the test for clearing attributes has
always been wrong; it has an off-by-one error, and this could further
lead to reading past the end of the allocated buffer since commit
bb646cdb12 ("proc_pid_attr_write():
switch to memdup_user()"). Fix the off-by-one error.
Even with this fix, setting and clearing /proc/pid/attr attributes
from the shell is not straightforward since the interface does not
support multiple write() calls (so shells that write the value and
newline separately will set and then immediately clear the attribute,
requiring use of echo -n to set the attribute), whereas trying to use
echo -n "" to clear the attribute causes the shell to skip the
write() call altogether since POSIX says that a zero-length write
causes no side effects. Thus, one must use echo -n to set and echo
without -n to clear, as in the following example:
$ echo -n unconfined_u:object_r:user_home_t:s0 > /proc/$$/attr/fscreate
$ cat /proc/$$/attr/fscreate
unconfined_u:object_r:user_home_t:s0
$ echo "" > /proc/$$/attr/fscreate
$ cat /proc/$$/attr/fscreate
Note the use of /proc/$$ rather than /proc/self, as otherwise
the cat command will read its own attribute value, not that of the shell.
There are no users of this facility to my knowledge; possibly we
should just get rid of it.
UPDATE: Upon further investigation it appears that a local process
with the process:setfscreate permission can cause a kernel panic as a
result of this bug. This patch fixes CVE-2017-2618.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: added the update about CVE-2017-2618 to the commit description]
Cc: stable@vger.kernel.org # 3.5: d6ea83ec68
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
This patch allows changing labels for cgroup mounts. Previously, running
chcon on cgroupfs would throw an "Operation not supported". This patch
specifically whitelist cgroupfs.
The patch could also allow containers to write only to the systemd cgroup
for instance, while the other cgroups are kept with cgroup_t label.
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>