Recognizing that the audit context is an internal audit value, use an
access function to retrieve the audit context pointer for the task
rather than reaching directly into the task struct to get it.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
[PM: merge fuzz in auditsc.c and selinuxfs.c, checkpatch.pl fixes]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Wrap the AVC state within the selinux_state structure and
pass it explicitly to all AVC functions. The AVC private state
is encapsulated in a selinux_avc structure that is referenced
from the selinux_state.
This change should have no effect on SELinux behavior or
APIs (userspace or LSM).
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Reviewed-by: James Morris <james.morris@microsoft.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
If security_get_bools/classes are called before the selinux state is
initialized (i.e. before first policy load), then they should just
return immediately with no booleans/classes.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Define a selinux state structure (struct selinux_state) for
global SELinux state and pass it explicitly to all security server
functions. The public portion of the structure contains state
that is used throughout the SELinux code, such as the enforcing mode.
The structure also contains a pointer to a selinux_ss structure whose
definition is private to the security server and contains security
server specific state such as the policy database and SID table.
This change should have no effect on SELinux behavior or APIs
(userspace or LSM). It merely wraps SELinux state and passes it
explicitly as needed.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: minor fixups needed due to collisions with the SCTP patches]
Signed-off-by: Paul Moore <paul@paul-moore.com>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEEcQCq365ubpQNLgrWVeRaWujKfIoFAlpwp5QUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQVeRaWujKfIoWAxAAj4Ne8MkQj7AvwlXN/wc4F1jlLGLq
VfIN1CBPqzjOf8foHY05lYOyqO/npT8cxFtjaI/zBt44Mw9DnCpUG1/XiMZqNK31
Zg+cOKHNyshsP9g8muH4r3NylzXLA0k/K+GvYXqibvSMlkQhygIjMgktWquM/Nkk
rKOwB9T20XhGOuusfNuNMWFef15Jda6BjPlntoGvx/EebSizvy8f7M/AC+BNfcgO
1S26WEEinxc7EaihRqU6epCYZFK10M/WrDq5DGPq5Gw2JLQW4ZzLgjpRr/Yh9gJ5
sBc6Kok2qPeIh206OQeq/KqCAwAKn8+9PDiAoemYIGJ5dD63YjY8RyTlGoMchedp
geEuzz8b84qcrylziUd4TG0TkJ4Rdj14FLRLrNv50iw/+Hl3NzQiJBY+9SKULJAb
d05BHCriomJV/uT9N7OusAE9GTd8jBaz6fL8h0dOSbPrXkzjEXH6G8qsziw26M7s
jdtRoGmsnQ/h5RiaYEoMBC8C8jWn1MozEfW2K+P8Nzgp/JTUodFdzz+ZRSPQZNZ6
4qd8vYaxl7x3UeMBbcPTGeDvFGBOGr98RdWQfjyT6KEF4KLRtIQ36PeQEwXc2Vq5
W9x5STQ7RycyKp69cnz3qoEOdhn7XMzYHPz6jld6b3lcHP+VysDmWA7/Din50RrY
hgUzstNb5Kr3np4=
=ch+z
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20180130' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
"A small pull request this time, just three patches, and one of these
is just a comment update (swap the FSF physical address for a URL).
The other two patches are small bug fixes found by szybot/syzkaller;
they individual patch descriptions should tell you all you ever wanted
to know"
* tag 'selinux-pr-20180130' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: skip bounded transition processing if the policy isn't loaded
selinux: ensure the context is NUL terminated in security_context_to_sid_core()
security: replace FSF address with web source in license notices
We can't do anything reasonable in security_bounded_transition() if we
don't have a policy loaded, and in fact we could run into problems
with some of the code inside expecting a policy. Fix these problems
like we do many others in security/selinux/ss/services.c by checking
to see if the policy is loaded (ss_initialized) and returning quickly
if it isn't.
Reported-by: syzbot <syzkaller-bugs@googlegroups.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Reviewed-by: James Morris <james.l.morris@oracle.com>
The syzbot/syzkaller automated tests found a problem in
security_context_to_sid_core() during early boot (before we load the
SELinux policy) where we could potentially feed context strings without
NUL terminators into the strcmp() function.
We already guard against this during normal operation (after the SELinux
policy has been loaded) by making a copy of the context strings and
explicitly adding a NUL terminator to the end. The patch extends this
protection to the early boot case (no loaded policy) by moving the context
copy earlier in security_context_to_sid_core().
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Reviewed-By: William Roberts <william.c.roberts@intel.com>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEEcQCq365ubpQNLgrWVeRaWujKfIoFAloJ+XwUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQVeRaWujKfIqv3w//aNbHxEvf59yf9TjdrmJE6ivFlTAL
RmYCMsFn7uEisolTX1LPnz3cVNqN2/GQ5cnfcnrMiw7d2E/k85jq6Ket6NysX0Wi
LCj6V/JTDeibB41GPDbiC9pSbIER5kUaqXI+X2aR4PgGumxxjdIqmammd1Sf1Hy9
470OJjDnhH68KFLG4bxVPY8Y4j4i/xgIKHU9z9EUA5LErhDAajWADSbvLcTZcp4b
eja/DeSb8zbvHloB/maoqI9mnSKnwuGd91nz6cJBb92Lhy2A6xXNMCIVY/9tsb9n
ZOw8NvFbfpOZ6WUZgwQCjdyn4nV0TuJQPpXNcjVoD9djczTnBq5EXz0FpHcM7d7n
d44DeHOMJmJ2vGIbUz9MpelAxqckhY9wh/XTi1Kszr6qR8kSfzSnDsk1/bOWHWdk
2dKz6MAr4GbN6vgWRTtuuZ7db8TqFa1KdLVicKC0okaYlc0dH5PWC3KahKHbjgGi
5COdBhFexkeL82kJtMrFbusMNetBrYoLO4qOoSThuEOoCEGh0Fgx5zXlLFlEm3uv
hLtEdxT+FLO3jFKCVejLoIHwl/YJ+Pd8C2rAkaXV8AEvs2Cn1gni4lb150nXtq5+
BhkrkjvthYTXuQZH+yMsoTVFVa0QVm2U+QgLmf19MT1EUqc46xIczYUAInV2VxlY
2WQ4d6mcD+PMzoQ=
=VtxX
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20171113' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull SELinux updates from Paul Moore:
"Seven SELinux patches for v4.15, although five of the seven are small
build fixes and cleanups.
Of the remaining two patches, the only one worth really calling out is
Eric's fix for the SELinux filesystem xattr set/remove code; the other
patch simply converts the SELinux hash table implementation to use
kmem_cache.
Eric's setxattr/removexattr tweak converts SELinux back to calling the
commoncap implementations when the xattr is not SELinux related. The
immediate win is to fixup filesystem capabilities in user namespaces,
but it makes things a bit saner overall; more information in the
commit description"
* tag 'selinux-pr-20171113' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: remove extraneous initialization of slots_used and max_chain_len
selinux: remove redundant assignment to len
selinux: remove redundant assignment to str
selinux: fix build warning
selinux: fix build warning by removing the unused sid variable
selinux: Perform both commoncap and selinux xattr checks
selinux: Use kmem_cache for hashtab_node
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Variables slots_used and max_chain_len are being initialized to zero
twice. Remove the second set of initializations in the for loop.
Cleans up the clang warnings:
Value stored to 'slots_used' is never read
Value stored to 'max_chain_len' is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The variable len is being set to zero and this value is never
being read since len is being set to a different value just
a few lines later. Remove this redundant assignment. Cleans
up clang warning: Value stored to 'len' is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
During random test as own device to check slub account,
we found some slack memory from hashtab_node(kmalloc-64).
By using kzalloc(), middle of test result like below:
allocated size 240768
request size 45144
slack size 195624
allocation count 3762
So, we want to use kmem_cache_zalloc() and that
can reduce memory size 52byte(slack size/alloc count) per each struct.
Signed-off-by: Kyeongdon Kim <kyeongdon.kim@lge.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Update my email address since epoch.ncsc.mil no longer exists.
MAINTAINERS and CREDITS are already correct.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
As systemd ramps up enabling NNP (NoNewPrivileges) for system services,
it is increasingly breaking SELinux domain transitions for those services
and their descendants. systemd enables NNP not only for services whose
unit files explicitly specify NoNewPrivileges=yes but also for services
whose unit files specify any of the following options in combination with
running without CAP_SYS_ADMIN (e.g. specifying User= or a
CapabilityBoundingSet= without CAP_SYS_ADMIN): SystemCallFilter=,
SystemCallArchitectures=, RestrictAddressFamilies=, RestrictNamespaces=,
PrivateDevices=, ProtectKernelTunables=, ProtectKernelModules=,
MemoryDenyWriteExecute=, or RestrictRealtime= as per the systemd.exec(5)
man page.
The end result is bad for the security of both SELinux-disabled and
SELinux-enabled systems. Packagers have to turn off these
options in the unit files to preserve SELinux domain transitions. For
users who choose to disable SELinux, this means that they miss out on
at least having the systemd-supported protections. For users who keep
SELinux enabled, they may still be missing out on some protections
because it isn't necessarily guaranteed that the SELinux policy for
that service provides the same protections in all cases.
commit 7b0d0b40cd ("selinux: Permit bounded transitions under
NO_NEW_PRIVS or NOSUID.") allowed bounded transitions under NNP in
order to support limited usage for sandboxing programs. However,
defining typebounds for all of the affected service domains
is impractical to implement in policy, since typebounds requires us
to ensure that each domain is allowed everything all of its descendant
domains are allowed, and this has to be repeated for the entire chain
of domain transitions. There is no way to clone all allow rules from
descendants to their ancestors in policy currently, and doing so would
be undesirable even if it were practical, as it requires leaking
permissions to objects and operations into ancestor domains that could
weaken their own security in order to allow them to the descendants
(e.g. if a descendant requires execmem permission, then so do all of
its ancestors; if a descendant requires execute permission to a file,
then so do all of its ancestors; if a descendant requires read to a
symbolic link or temporary file, then so do all of its ancestors...).
SELinux domains are intentionally not hierarchical / bounded in this
manner normally, and making them so would undermine their protections
and least privilege.
We have long had a similar tension with SELinux transitions and nosuid
mounts, albeit not as severe. Users often have had to choose between
retaining nosuid on a mount and allowing SELinux domain transitions on
files within those mounts. This likewise leads to unfortunate tradeoffs
in security.
Decouple NNP/nosuid from SELinux transitions, so that we don't have to
make a choice between them. Introduce a nnp_nosuid_transition policy
capability that enables transitions under NNP/nosuid to be based on
a permission (nnp_transition for NNP; nosuid_transition for nosuid)
between the old and new contexts in addition to the current support
for bounded transitions. Domain transitions can then be allowed in
policy without requiring the parent to be a strict superset of all of
its children.
With this change, systemd unit files can be left unmodified from upstream.
SELinux-disabled and SELinux-enabled users will benefit from retaining any
of the systemd-provided protections. SELinux policy will only need to
be adapted to enable the new policy capability and to allow the
new permissions between domain pairs as appropriate.
NB: Allowing nnp_transition between two contexts opens up the potential
for the old context to subvert the new context by installing seccomp
filters before the execve. Allowing nosuid_transition between two contexts
opens up the potential for a context transition to occur on a file from
an untrusted filesystem (e.g. removable media or remote filesystem). Use
with care.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The allocated size for each ebitmap_node is 192byte by kzalloc().
Then, ebitmap_node size is fixed, so it's possible to use only 144byte
for each object by kmem_cache_zalloc().
It can reduce some dynamic allocation size.
Signed-off-by: Junil Lee <junil0814.lee@lge.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Add a type for Infiniband ports and an access vector for subnet
management packets. Implement the ib_port_smp hook to check that the
caller has permission to send and receive SMPs on the end port specified
by the device name and port. Add interface to query the SID for a IB
port, which walks the IB_PORT ocontexts to find an entry for the
given name and port.
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Add a type and access vector for PKeys. Implement the ib_pkey_access
hook to check that the caller has permission to access the PKey on the
given subnet prefix. Add an interface to get the PKey SID. Walk the PKey
ocontexts to find an entry for the given subnet prefix and pkey.
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Support for Infiniband requires the addition of two new object contexts,
one for infiniband PKeys and another IB Ports. Added handlers to read
and write the new ocontext types when reading or writing a binary policy
representation.
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Log the state of SELinux policy capabilities when a policy is loaded.
For each policy capability known to the kernel, log the policy capability
name and the value set in the policy. For policy capabilities that are
set in the loaded policy but unknown to the kernel, log the policy
capability index, since this is the only information presently available
in the policy.
Sample output with a policy created with a new capability defined
that is not known to the kernel:
SELinux: policy capability network_peer_controls=1
SELinux: policy capability open_perms=1
SELinux: policy capability extended_socket_class=1
SELinux: policy capability always_check_network=0
SELinux: policy capability cgroup_seclabel=0
SELinux: unknown policy capability 5
Resolves: https://github.com/SELinuxProject/selinux-kernel/issues/32
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
* Return an error code without storing it in an intermediate variable.
* Delete the local variable "rc" and the jump label "out" which became
unnecessary with this refactoring.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Replace five goto statements (and previous variable assignments) by
direct returns after a memory allocation failure in this function.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
We removed this initialization as a cleanup but it is probably required.
The concern is that "nel" can be zero. I'm not an expert on SELinux
code but I think it looks possible to write an SELinux policy which
triggers this bug. GCC doesn't catch this, but my static checker does.
Fixes: 9c312e79d6 ("selinux: Delete an unnecessary variable initialisation in range_read()")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
'perms' will never be NULL since it isn't a plain pointer but an array
of u32 values.
This fixes the following warning when building with clang:
security/selinux/ss/services.c:158:16: error: address of array
'p_in->perms' will always evaluate to 'true'
[-Werror,-Wpointer-bool-conversion]
while (p_in->perms && p_in->perms[k]) {
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The script "checkpatch.pl" pointed information out like the following.
Comparison to NULL could be written !…
Thus fix affected source code places.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
A multiplication for the size determination of a memory allocation
indicated that an array data structure should be processed.
Thus use the corresponding function "kmalloc_array".
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Return directly after a call of the function "kzalloc" failed
at the beginning.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Return directly after a call of the function "kzalloc" failed
at the beginning.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Return directly after a call of the function "kzalloc" failed
at the beginning.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Return directly after a call of the function "kzalloc" failed
at the beginning.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Return directly after a call of the function "kzalloc" failed
at the beginning.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Return directly after a call of the function "kzalloc" failed
at the beginning.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Return directly after a call of the function "kzalloc" failed
at the beginning.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Replace the specification of a data type by a pointer dereference
as the parameter for the operator "sizeof" to make the corresponding size
determination a bit safer according to the Linux coding style convention.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Return directly after a call of the function "kzalloc" failed
at the beginning.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Return directly after a call of the function "kzalloc" failed
at the beginning.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The local variable "rt" will be set to an appropriate pointer a bit later.
Thus omit the explicit initialisation at the beginning which became
unnecessary with a previous update step.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Return directly after a call of the function "next_entry" failed
at the beginning.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The local variable "ft" was set to a null pointer despite of an
immediate reassignment.
Thus remove this statement from the beginning of a loop.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Call the function "kfree" at the end only after it was determined
that the local variable "newgenfs" contained a non-null pointer.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Return directly after a call of the function "next_entry" failed
at the beginning.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The script "checkpatch.pl" pointed information out like the following.
WARNING: void function return statements are not generally useful
Thus remove such a statement in the affected function.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Multiplications for the size determination of memory allocations
indicated that array data structures should be processed.
Thus use the corresponding function "kcalloc".
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The script "checkpatch.pl" pointed information out like the following.
Comparison to NULL could be written !…
Thus fix affected source code places.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
A multiplication for the size determination of a memory allocation
indicated that an array data structure should be processed.
Thus use the corresponding function "kmalloc_array".
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Replace the specification of data structures by pointer dereferences
as the parameter for the operator "sizeof" to make the corresponding size
determination a bit safer.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The script "checkpatch.pl" pointed information out like the following.
WARNING: void function return statements are not generally useful
Thus remove such a statement in the affected function.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
* A multiplication for the size determination of a memory allocation
indicated that an array data structure should be processed.
Thus use the corresponding function "kmalloc_array".
This issue was detected by using the Coccinelle software.
* Replace the specification of a data type by a pointer dereference
to make the corresponding size determination a bit safer according to
the Linux coding style convention.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
commit 1ea0ce4069 ("selinux: allow
changing labels for cgroupfs") broke the Android init program,
which looks up security contexts whenever creating directories
and attempts to assign them via setfscreatecon().
When creating subdirectories in cgroup mounts, this would previously
be ignored since cgroup did not support userspace setting of security
contexts. However, after the commit, SELinux would attempt to honor
the requested context on cgroup directories and fail due to permission
denial. Avoid breaking existing userspace/policy by wrapping this change
with a conditional on a new cgroup_seclabel policy capability. This
preserves existing behavior until/unless a new policy explicitly enables
this capability.
Reported-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Now that %z is standartised in C99 there is no reason to support %Z.
Unlike %L it doesn't even make format strings smaller.
Use BUILD_BUG_ON in a couple ATM drivers.
In case anyone didn't notice lib/vsprintf.o is about half of SLUB which
is in my opinion is quite an achievement. Hopefully this patch inspires
someone else to trim vsprintf.c more.
Link: http://lkml.kernel.org/r/20170103230126.GA30170@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Extend SELinux to support distinctions among all network address families
implemented by the kernel by defining new socket security classes
and mapping to them. Otherwise, many sockets are mapped to the generic
socket class and are indistinguishable in policy. This has come up
previously with regard to selectively allowing access to bluetooth sockets,
and more recently with regard to selectively allowing access to AF_ALG
sockets. Guido Trentalancia submitted a patch that took a similar approach
to add only support for distinguishing AF_ALG sockets, but this generalizes
his approach to handle all address families implemented by the kernel.
Socket security classes are also added for ICMP and SCTP sockets.
Socket security classes were not defined for AF_* values that are reserved
but unimplemented in the kernel, e.g. AF_NETBEUI, AF_SECURITY, AF_ASH,
AF_ECONET, AF_SNA, AF_WANPIPE.
Backward compatibility is provided by only enabling the finer-grained
socket classes if a new policy capability is set in the policy; older
policies will behave as before. The legacy redhat1 policy capability
that was only ever used in testing within Fedora for ptrace_child
is reclaimed for this purpose; as far as I can tell, this policy
capability is not enabled in any supported distro policy.
Add a pair of conditional compilation guards to detect when new AF_* values
are added so that we can update SELinux accordingly rather than having to
belatedly update it long after new address families are introduced.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Merge my system logging cleanups, triggered by the broken '\n' patches.
The line continuation handling has been broken basically forever, and
the code to handle the system log records was both confusing and
dubious. And it would do entirely the wrong thing unless you always had
a terminating newline, partly because it couldn't actually see whether a
message was marked KERN_CONT or not (but partly because the LOG_CONT
handling in the recording code was rather confusing too).
This re-introduces a real semantically meaningful KERN_CONT, and fixes
the few places I noticed where it was missing. There are probably more
missing cases, since KERN_CONT hasn't actually had any semantic meaning
for at least four years (other than the checkpatch meaning of "no log
level necessary, this is a continuation line").
This also allows the combination of KERN_CONT and a log level. In that
case the log level will be ignored if the merging with a previous line
is successful, but if a new record is needed, that new record will now
get the right log level.
That also means that you can at least in theory combine KERN_CONT with
the "pr_info()" style helpers, although any use of pr_fmt() prefixing
would make that just result in a mess, of course (the prefix would end
up in the middle of a continuing line).
* printk-cleanups:
printk: make reading the kernel log flush pending lines
printk: re-organize log_output() to be more legible
printk: split out core logging code into helper function
printk: reinstate KERN_CONT for printing continuation lines
Long long ago the kernel log buffer was a buffered stream of bytes, very
much like stdio in user space. It supported log levels by scanning the
stream and noticing the log level markers at the beginning of each line,
but if you wanted to print a partial line in multiple chunks, you just
did multiple printk() calls, and it just automatically worked.
Except when it didn't, and you had very confusing output when different
lines got all mixed up with each other. Then you got fragment lines
mixing with each other, or with non-fragment lines, because it was
traditionally impossible to tell whether a printk() call was a
continuation or not.
To at least help clarify the issue of continuation lines, we added a
KERN_CONT marker back in 2007 to mark continuation lines:
4749252776 ("printk: add KERN_CONT annotation").
That continuation marker was initially an empty string, and didn't
actuall make any semantic difference. But it at least made it possible
to annotate the source code, and have check-patch notice that a printk()
didn't need or want a log level marker, because it was a continuation of
a previous line.
To avoid the ambiguity between a continuation line that had that
KERN_CONT marker, and a printk with no level information at all, we then
in 2009 made KERN_CONT be a real log level marker which meant that we
could now reliably tell the difference between the two cases.
5fd29d6ccb ("printk: clean up handling of log-levels and newlines")
and we could take advantage of that to make sure we didn't mix up
continuation lines with lines that just didn't have any loglevel at all.
Then, in 2012, the kernel log buffer was changed to be a "record" based
log, where each line was a record that has a loglevel and a timestamp.
You can see the beginning of that conversion in commits
e11fea92e1 ("kmsg: export printk records to the /dev/kmsg interface")
7ff9554bb5 ("printk: convert byte-buffer to variable-length record buffer")
with a number of follow-up commits to fix some painful fallout from that
conversion. Over all, it took a couple of months to sort out most of
it. But the upside was that you could have concurrent readers (and
writers) of the kernel log and not have lines with mixed output in them.
And one particular pain-point for the record-based kernel logging was
exactly the fragmentary lines that are generated in smaller chunks. In
order to still log them as one recrod, the continuation lines need to be
attached to the previous record properly.
However the explicit continuation record marker that is actually useful
for this exact case was actually removed in aroundm the same time by commit
61e99ab8e3 ("printk: remove the now unnecessary "C" annotation for KERN_CONT")
due to the incorrect belief that KERN_CONT wasn't meaningful. The
ambiguity between "is this a continuation line" or "is this a plain
printk with no log level information" was reintroduced, and in fact
became an even bigger pain point because there was now the whole
record-level merging of kernel messages going on.
This patch reinstates the KERN_CONT as a real non-empty string marker,
so that the ambiguity is fixed once again.
But it's not a plain revert of that original removal: in the four years
since we made KERN_CONT an empty string again, not only has the format
of the log level markers changed, we've also had some usage changes in
this area.
For example, some ACPI code seems to use KERN_CONT _together_ with a log
level, and now uses both the KERN_CONT marker and (for example) a
KERN_INFO marker to show that it's an informational continuation of a
line.
Which is actually not a bad idea - if the continuation line cannot be
attached to its predecessor, without the log level information we don't
know what log level to assign to it (and we traditionally just assigned
it the default loglevel). So having both a log level and the KERN_CONT
marker is not necessarily a bad idea, but it does mean that we need to
actually iterate over potentially multiple markers, rather than just a
single one.
Also, since KERN_CONT was still conceptually needed, and encouraged, but
didn't actually _do_ anything, we've also had the reverse problem:
rather than having too many annotations it has too few, and there is bit
rot with code that no longer marks the continuation lines with the
KERN_CONT marker.
So this patch not only re-instates the non-empty KERN_CONT marker, it
also fixes up the cases of bit-rot I noticed in my own logs.
There are probably other cases where KERN_CONT will be needed to be
added, either because it is new code that never dealt with the need for
KERN_CONT, or old code that has bitrotted without anybody noticing.
That said, we should strive to avoid the need for KERN_CONT. It does
result in real problems for logging, and should generally not be seen as
a good feature. If we some day can get rid of the feature entirely,
because nobody does any fragmented printk calls, that would be lovely.
But until that point, let's at mark the code that relies on the hacky
multi-fragment kernel printk's. Not only does it avoid the ambiguity,
it also annotates code as "maybe this would be good to fix some day".
(That said, particularly during single-threaded bootup, the downsides of
KERN_CONT are very limited. Things get much hairier when you have
multiple threads going on and user level reading and writing logs too).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix to return error code -EINVAL from the error handling case instead
of 0 (rc is overwrite to 0 when policyvers >=
POLICYDB_VERSION_ROLETRANS), as done elsewhere in this function.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
[PM: normalize "selinux" in patch subject, description line wrap]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Throughout the SELinux LSM, values taken from sepolicy are
used in places where length == 0 or length == <saturated>
matter, find and fix these.
Signed-off-by: William Roberts <william.c.roberts@intel.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
libsepol pointed out an issue where its possible to have
an unitialized jmp and invalid dereference, fix this.
While we're here, zero allocate all the *_val_to_struct
structures.
Signed-off-by: William Roberts <william.c.roberts@intel.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
When count is 0 and the highbit is not zero, the ebitmap is not
valid and the internal node is not allocated. This causes issues
when routines, like mls_context_isvalid() attempt to use the
ebitmap_for_each_bit() and ebitmap_node_get_bit() as they assume
a highbit > 0 will have a node allocated.
Signed-off-by: William Roberts <william.c.roberts@intel.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The existing ebitmap_netlbl_import() code didn't correctly handle the
case where the ebitmap_node was not aligned/sized to a power of two,
this patch fixes this (on x86_64 ebitmap_node contains six bitmaps
making a range of 0..383).
Signed-off-by: Paul Moore <paul@paul-moore.com>
The current bounds checking of both source and target types
requires allowing any domain that has access to the child
domain to also have the same permissions to the parent, which
is undesirable. Drop the target bounds checking.
KaiGai Kohei originally removed all use of target bounds in
commit 7d52a155e3 ("selinux: remove dead code in
type_attribute_bounds_av()") but this was reverted in
commit 2ae3ba3938 ("selinux: libsepol: remove dead code in
check_avtab_hierarchy_callback()") because it would have
required explicitly allowing the parent any permissions
to the child that the child is allowed to itself.
This change in contrast retains the logic for the case where both
source and target types are bounded, thereby allowing access
if the parent of the source is allowed the corresponding
permissions to the parent of the target. Further, this change
reworks the logic such that we only perform a single computation
for each case and there is no ambiguity as to how to resolve
a bounds violation.
Under the new logic, if the source type and target types are both
bounded, then the parent of the source type must be allowed the same
permissions to the parent of the target type. If only the source
type is bounded, then the parent of the source type must be allowed
the same permissions to the target type.
Examples of the new logic and comparisons with the old logic:
1. If we have:
typebounds A B;
then:
allow B self:process <permissions>;
will satisfy the bounds constraint iff:
allow A self:process <permissions>;
is also allowed in policy.
Under the old logic, the allow rule on B satisfies the
bounds constraint if any of the following three are allowed:
allow A B:process <permissions>; or
allow B A:process <permissions>; or
allow A self:process <permissions>;
However, either of the first two ultimately require the third to
satisfy the bounds constraint under the old logic, and therefore
this degenerates to the same result (but is more efficient - we only
need to perform one compute_av call).
2. If we have:
typebounds A B;
typebounds A_exec B_exec;
then:
allow B B_exec:file <permissions>;
will satisfy the bounds constraint iff:
allow A A_exec:file <permissions>;
is also allowed in policy.
This is essentially the same as #1; it is merely included as
an example of dealing with object types related to a bounded domain
in a manner that satisfies the bounds relationship. Note that
this approach is preferable to leaving B_exec unbounded and having:
allow A B_exec:file <permissions>;
in policy because that would allow B's entrypoints to be used to
enter A. Similarly for _tmp or other related types.
3. If we have:
typebounds A B;
and an unbounded type T, then:
allow B T:file <permissions>;
will satisfy the bounds constraint iff:
allow A T:file <permissions>;
is allowed in policy.
The old logic would have been identical for this example.
4. If we have:
typebounds A B;
and an unbounded domain D, then:
allow D B:unix_stream_socket <permissions>;
is not subject to any bounds constraints under the new logic
because D is not bounded. This is desirable so that we can
allow a domain to e.g. connectto a child domain without having
to allow it to do the same to its parent.
The old logic would have required:
allow D A:unix_stream_socket <permissions>;
to also be allowed in policy.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: re-wrapped description to appease checkpatch.pl]
Signed-off-by: Paul Moore <paul@paul-moore.com>
security_get_bool_value(int bool) argument "bool" conflicts with
in-kernel macros such as BUILD_BUG(). This patch changes this to
index which isn't a type.
Cc: Paul Moore <paul@paul-moore.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Eric Paris <eparis@parisplace.org>
Cc: James Morris <james.l.morris@oracle.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Andrew Perepechko <anserper@ya.ru>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: selinux@tycho.nsa.gov
Cc: Eric Paris <eparis@redhat.com>
Cc: Paul Moore <pmoore@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
[PM: wrapped description for checkpatch.pl, use "selinux:..." as subj]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Pull security subsystem updates from James Morris:
- EVM gains support for loading an x509 cert from the kernel
(EVM_LOAD_X509), into the EVM trusted kernel keyring.
- Smack implements 'file receive' process-based permission checking for
sockets, rather than just depending on inode checks.
- Misc enhancments for TPM & TPM2.
- Cleanups and bugfixes for SELinux, Keys, and IMA.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (41 commits)
selinux: Inode label revalidation performance fix
KEYS: refcount bug fix
ima: ima_write_policy() limit locking
IMA: policy can be updated zero times
selinux: rate-limit netlink message warnings in selinux_nlmsg_perm()
selinux: export validatetrans decisions
gfs2: Invalid security labels of inodes when they go invalid
selinux: Revalidate invalid inode security labels
security: Add hook to invalidate inode security labels
selinux: Add accessor functions for inode->i_security
security: Make inode argument of inode_getsecid non-const
security: Make inode argument of inode_getsecurity non-const
selinux: Remove unused variable in selinux_inode_init_security
keys, trusted: seal with a TPM2 authorization policy
keys, trusted: select hash algorithm for TPM2 chips
keys, trusted: fix: *do not* allow duplicate key options
tpm_ibmvtpm: properly handle interrupted packet receptions
tpm_tis: Tighten IRQ auto-probing
tpm_tis: Refactor the interrupt setup
tpm_tis: Get rid of the duplicate IRQ probing code
...
Make validatetrans decisions available through selinuxfs.
"/validatetrans" is added to selinuxfs for this purpose.
This functionality is needed by file system servers
implemented in userspace or kernelspace without the VFS
layer.
Writing "$oldcontext $newcontext $tclass $taskcontext"
to /validatetrans is expected to return 0 if the transition
is allowed and -EPERM otherwise.
Signed-off-by: Andrew Perepechko <anserper@ya.ru>
CC: andrew.perepechko@seagate.com
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
commit fa1aa143ac ("selinux: extended permissions for ioctls")
introduced a bug into the handling of conditional rules, skipping the
processing entirely when the caller does not provide an extended
permissions (xperms) structure. Access checks from userspace using
/sys/fs/selinux/access do not include such a structure since that
interface does not presently expose extended permission information.
As a result, conditional rules were being ignored entirely on userspace
access requests, producing denials when access was allowed by
conditional rules in the policy. Fix the bug by only skipping
computation of extended permissions in this situation, not the entire
conditional rules processing.
Reported-by: Laurent Bigonville <bigon@debian.org>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: fixed long lines in patch description]
Cc: stable@vger.kernel.org # 4.3
Signed-off-by: Paul Moore <pmoore@redhat.com>
sprintf returns the number of characters printed (excluding '\0'), so
we can use that and avoid duplicating the length computation.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
This is much simpler.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
There seems to be a little confusion as to whether the scontext_len
parameter of security_context_to_sid() includes the nul-byte or
not. Reading security_context_to_sid_core(), it seems that the
expectation is that it does not (both the string copying and the test
for scontext_len being zero hint at that).
Introduce the helper security_context_str_to_sid() to do the strlen()
call and fix all callers.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Add extended permissions logic to selinux. Extended permissions
provides additional permissions in 256 bit increments. Extend the
generic ioctl permission check to use the extended permissions for
per-command filtering. Source/target/class sets including the ioctl
permission may additionally include a set of commands. Example:
allowxperm <source> <target>:<class> ioctl unpriv_app_socket_cmds
auditallowxperm <source> <target>:<class> ioctl priv_gpu_cmds
Where unpriv_app_socket_cmds and priv_gpu_cmds are macros
representing commonly granted sets of ioctl commands.
When ioctl commands are omitted only the permissions are checked.
This feature is intended to provide finer granularity for the ioctl
permission that may be too imprecise. For example, the same driver
may use ioctls to provide important and benign functionality such as
driver version or socket type as well as dangerous capabilities such
as debugging features, read/write/execute to physical memory or
access to sensitive data. Per-command filtering provides a mechanism
to reduce the attack surface of the kernel, and limit applications
to the subset of commands required.
The format of the policy binary has been modified to include ioctl
commands, and the policy version number has been incremented to
POLICYDB_VERSION_XPERMS_IOCTL=30 to account for the format
change.
The extended permissions logic is deliberately generic to allow
components to be reused e.g. netlink filters
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Acked-by: Nick Kralevich <nnk@google.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
At present we don't create efficient ebitmaps when importing NetLabel
category bitmaps. This can present a problem when comparing ebitmaps
since ebitmap_cmp() is very strict about these things and considers
these wasteful ebitmaps not equal when compared to their more
efficient counterparts, even if their values are the same. This isn't
likely to cause problems on 64-bit systems due to a bit of luck on
how NetLabel/CIPSO works and the default ebitmap size, but it can be
a problem on 32-bit systems.
This patch fixes this problem by being a bit more intelligent when
importing NetLabel category bitmaps by skipping over empty sections
which should result in a nice, efficient ebitmap.
Cc: stable@vger.kernel.org # 3.17
Signed-off-by: Paul Moore <pmoore@redhat.com>
Now that we can safely increase the avtab max buckets without
triggering high order allocations and have a hash function that
will make better use of the larger number of buckets, increase
the max buckets to 2^16.
Original:
101421 entries and 2048/2048 buckets used, longest chain length 374
With new hash function:
101421 entries and 2048/2048 buckets used, longest chain length 81
With increased max buckets:
101421 entries and 31078/32768 buckets used, longest chain length 12
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
This function, based on murmurhash3, has much better distribution than
the original. Using the current default of 2048 buckets, there are many
fewer collisions:
Before:
101421 entries and 2048/2048 buckets used, longest chain length 374
After:
101421 entries and 2048/2048 buckets used, longest chain length 81
The difference becomes much more significant when buckets are increased.
A naive attempt to expand the current function to larger outputs doesn't
yield any significant improvement; so this function is a prerequisite
for increasing the bucket size.
sds: Adapted from the original patches for libsepol to the kernel.
Signed-off-by: John Brooks <john.brooks@jolla.com>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Previously we shrank the avtab max hash buckets to avoid
high order memory allocations, but this causes avtab lookups to
degenerate to very long linear searches for the Fedora policy. Convert to
using a flex_array instead so that we can increase the buckets
without such limitations.
This change does not alter the max hash buckets; that is left to a
separate follow-on change.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Move the NetLabel secattr MLS category import logic into
mls_import_netlbl_cat() where it belongs, and use the
mls_import_netlbl_cat() function in security_netlbl_secattr_to_sid().
Reported-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: Paul Moore <pmoore@redhat.com>
If hashtab_create() returns a NULL pointer then we should return -ENOMEM
but instead the current code returns success.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Restructure to keyword=value pairs without spaces. Drop superfluous words in
text. Make invalid_context a keyword. Change result= keyword to seresult=.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
[Minor rewrite to the patch subject line]
Signed-off-by: Paul Moore <pmoore@redhat.com>
Historically the NetLabel LSM secattr catmap functions and data
structures have had very long names which makes a mess of the NetLabel
code and anyone who uses NetLabel. This patch renames the catmap
functions and structures from "*_secattr_catmap_*" to just "*_catmap_*"
which improves things greatly.
There are no substantial code or logic changes in this patch.
Signed-off-by: Paul Moore <pmoore@redhat.com>
Tested-by: Casey Schaufler <casey@schaufler-ca.com>
The NetLabel secattr catmap functions, and the SELinux import/export
glue routines, were broken in many horrible ways and the SELinux glue
code fiddled with the NetLabel catmap structures in ways that we
probably shouldn't allow. At some point this "worked", but that was
likely due to a bit of dumb luck and sub-par testing (both inflicted
by yours truly). This patch corrects these problems by basically
gutting the code in favor of something less obtuse and restoring the
NetLabel abstractions in the SELinux catmap glue code.
Everything is working now, and if it decides to break itself in the
future this code will be much easier to debug than the code it
replaces.
One noteworthy side effect of the changes is that it is no longer
necessary to allocate a NetLabel catmap before calling one of the
NetLabel APIs to set a bit in the catmap. NetLabel will automatically
allocate the catmap nodes when needed, resulting in less allocations
when the lowest bit is greater than 255 and less code in the LSMs.
Cc: stable@vger.kernel.org
Reported-by: Christian Evans <frodox@zoho.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Tested-by: Casey Schaufler <casey@schaufler-ca.com>
With the introduction of fair queued rwlock, recursive read_lock()
may hang the offending process if there is a write_lock() somewhere
in between.
With recursive read_lock checking enabled, the following error was
reported:
=============================================
[ INFO: possible recursive locking detected ]
3.16.0-rc1 #2 Tainted: G E
---------------------------------------------
load_policy/708 is trying to acquire lock:
(policy_rwlock){.+.+..}, at: [<ffffffff8125b32a>]
security_genfs_sid+0x3a/0x170
but task is already holding lock:
(policy_rwlock){.+.+..}, at: [<ffffffff8125b48c>]
security_fs_use+0x2c/0x110
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(policy_rwlock);
lock(policy_rwlock);
This patch fixes the occurrence of recursive read_lock() of
policy_rwlock by adding a helper function __security_genfs_sid()
which requires caller to take the lock before calling it. The
security_fs_use() was then modified to call the new helper function.
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
The cond_read_node() should free the given node on error path as it's
not linked to p->cond_list yet. This is done via cond_node_destroy()
but it's not called when next_entry() fails before the expr loop.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Paul Moore <pmoore@redhat.com>
The node->cur_state and len can be read in a single call of next_entry().
And setting len before reading is a dead write so can be eliminated.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
(Minor tweak to the length parameter in the call to next_entry())
Signed-off-by: Paul Moore <pmoore@redhat.com>
There're some code duplication for reading a string value during
policydb_read(). Add str_read() helper to fix it.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Paul Moore <pmoore@redhat.com>
ARRAY_SIZE is more concise to use when the size of an array is divided
by the size of its type or the size of its first element.
The Coccinelle semantic patch that makes this change is as follows:
// <smpl>
@@
type T;
T[] E;
@@
- (sizeof(E)/sizeof(E[...]))
+ ARRAY_SIZE(E)
// </smpl>
Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
After silencing the sleeping warning in mls_convert_context() I started
seeing similar traces from hashtab_insert. Do a cond_resched there too.
Signed-off-by: Dave Jones <davej@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
security_xfrm_policy_alloc can be called in atomic context so the
allocation should be done with GFP_ATOMIC. Add an argument to let the
callers choose the appropriate way. In order to do so a gfp argument
needs to be added to the method xfrm_policy_alloc_security in struct
security_operations and to the internal function
selinux_xfrm_alloc_user. After that switch to GFP_ATOMIC in the atomic
callers and leave GFP_KERNEL as before for the rest.
The path that needed the gfp argument addition is:
security_xfrm_policy_alloc -> security_ops.xfrm_policy_alloc_security ->
all users of xfrm_policy_alloc_security (e.g. selinux_xfrm_policy_alloc) ->
selinux_xfrm_alloc_user (here the allocation used to be GFP_KERNEL only)
Now adding a gfp argument to selinux_xfrm_alloc_user requires us to also
add it to security_context_to_sid which is used inside and prior to this
patch did only GFP_KERNEL allocation. So add gfp argument to
security_context_to_sid and adjust all of its callers as well.
CC: Paul Moore <paul@paul-moore.com>
CC: Dave Jones <davej@redhat.com>
CC: Steffen Klassert <steffen.klassert@secunet.com>
CC: Fan Du <fan.du@windriver.com>
CC: David S. Miller <davem@davemloft.net>
CC: LSM list <linux-security-module@vger.kernel.org>
CC: SELinux list <selinux@tycho.nsa.gov>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
When writing policy via /sys/fs/selinux/policy I wrote the type and class
of filename trans rules in CPU endian instead of little endian. On
x86_64 this works just fine, but it means that on big endian arch's like
ppc64 and s390 userspace reads the policy and converts it from
le32_to_cpu. So the values are all screwed up. Write the values in le
format like it should have been to start.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Setting an empty security context (length=0) on a file will
lead to incorrectly dereferencing the type and other fields
of the security context structure, yielding a kernel BUG.
As a zero-length security context is never valid, just reject
all such security contexts whether coming from userspace
via setxattr or coming from the filesystem upon a getxattr
request by SELinux.
Setting a security context value (empty or otherwise) unknown to
SELinux in the first place is only possible for a root process
(CAP_MAC_ADMIN), and, if running SELinux in enforcing mode, only
if the corresponding SELinux mac_admin permission is also granted
to the domain by policy. In Fedora policies, this is only allowed for
specific domains such as livecd for setting down security contexts
that are not defined in the build host policy.
Reproducer:
su
setenforce 0
touch foo
setfattr -n security.selinux foo
Caveat:
Relabeling or removing foo after doing the above may not be possible
without booting with SELinux disabled. Any subsequent access to foo
after doing the above will also trigger the BUG.
BUG output from Matthew Thode:
[ 473.893141] ------------[ cut here ]------------
[ 473.962110] kernel BUG at security/selinux/ss/services.c:654!
[ 473.995314] invalid opcode: 0000 [#6] SMP
[ 474.027196] Modules linked in:
[ 474.058118] CPU: 0 PID: 8138 Comm: ls Tainted: G D I
3.13.0-grsec #1
[ 474.116637] Hardware name: Supermicro X8ST3/X8ST3, BIOS 2.0
07/29/10
[ 474.149768] task: ffff8805f50cd010 ti: ffff8805f50cd488 task.ti:
ffff8805f50cd488
[ 474.183707] RIP: 0010:[<ffffffff814681c7>] [<ffffffff814681c7>]
context_struct_compute_av+0xce/0x308
[ 474.219954] RSP: 0018:ffff8805c0ac3c38 EFLAGS: 00010246
[ 474.252253] RAX: 0000000000000000 RBX: ffff8805c0ac3d94 RCX:
0000000000000100
[ 474.287018] RDX: ffff8805e8aac000 RSI: 00000000ffffffff RDI:
ffff8805e8aaa000
[ 474.321199] RBP: ffff8805c0ac3cb8 R08: 0000000000000010 R09:
0000000000000006
[ 474.357446] R10: 0000000000000000 R11: ffff8805c567a000 R12:
0000000000000006
[ 474.419191] R13: ffff8805c2b74e88 R14: 00000000000001da R15:
0000000000000000
[ 474.453816] FS: 00007f2e75220800(0000) GS:ffff88061fc00000(0000)
knlGS:0000000000000000
[ 474.489254] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 474.522215] CR2: 00007f2e74716090 CR3: 00000005c085e000 CR4:
00000000000207f0
[ 474.556058] Stack:
[ 474.584325] ffff8805c0ac3c98 ffffffff811b549b ffff8805c0ac3c98
ffff8805f1190a40
[ 474.618913] ffff8805a6202f08 ffff8805c2b74e88 00068800d0464990
ffff8805e8aac860
[ 474.653955] ffff8805c0ac3cb8 000700068113833a ffff880606c75060
ffff8805c0ac3d94
[ 474.690461] Call Trace:
[ 474.723779] [<ffffffff811b549b>] ? lookup_fast+0x1cd/0x22a
[ 474.778049] [<ffffffff81468824>] security_compute_av+0xf4/0x20b
[ 474.811398] [<ffffffff8196f419>] avc_compute_av+0x2a/0x179
[ 474.843813] [<ffffffff8145727b>] avc_has_perm+0x45/0xf4
[ 474.875694] [<ffffffff81457d0e>] inode_has_perm+0x2a/0x31
[ 474.907370] [<ffffffff81457e76>] selinux_inode_getattr+0x3c/0x3e
[ 474.938726] [<ffffffff81455cf6>] security_inode_getattr+0x1b/0x22
[ 474.970036] [<ffffffff811b057d>] vfs_getattr+0x19/0x2d
[ 475.000618] [<ffffffff811b05e5>] vfs_fstatat+0x54/0x91
[ 475.030402] [<ffffffff811b063b>] vfs_lstat+0x19/0x1b
[ 475.061097] [<ffffffff811b077e>] SyS_newlstat+0x15/0x30
[ 475.094595] [<ffffffff8113c5c1>] ? __audit_syscall_entry+0xa1/0xc3
[ 475.148405] [<ffffffff8197791e>] system_call_fastpath+0x16/0x1b
[ 475.179201] Code: 00 48 85 c0 48 89 45 b8 75 02 0f 0b 48 8b 45 a0 48
8b 3d 45 d0 b6 00 8b 40 08 89 c6 ff ce e8 d1 b0 06 00 48 85 c0 49 89 c7
75 02 <0f> 0b 48 8b 45 b8 4c 8b 28 eb 1e 49 8d 7d 08 be 80 01 00 00 e8
[ 475.255884] RIP [<ffffffff814681c7>]
context_struct_compute_av+0xce/0x308
[ 475.296120] RSP <ffff8805c0ac3c38>
[ 475.328734] ---[ end trace f076482e9d754adc ]---
Reported-by: Matthew Thode <mthode@mthode.org>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Pull audit update from Eric Paris:
"Again we stayed pretty well contained inside the audit system.
Venturing out was fixing a couple of function prototypes which were
inconsistent (didn't hurt anything, but we used the same value as an
int, uint, u32, and I think even a long in a couple of places).
We also made a couple of minor changes to when a couple of LSMs called
the audit system. We hoped to add aarch64 audit support this go
round, but it wasn't ready.
I'm disappearing on vacation on Thursday. I should have internet
access, but it'll be spotty. If anything goes wrong please be sure to
cc rgb@redhat.com. He'll make fixing things his top priority"
* git://git.infradead.org/users/eparis/audit: (50 commits)
audit: whitespace fix in kernel-parameters.txt
audit: fix location of __net_initdata for audit_net_ops
audit: remove pr_info for every network namespace
audit: Modify a set of system calls in audit class definitions
audit: Convert int limit uses to u32
audit: Use more current logging style
audit: Use hex_byte_pack_upper
audit: correct a type mismatch in audit_syscall_exit()
audit: reorder AUDIT_TTY_SET arguments
audit: rework AUDIT_TTY_SET to only grab spin_lock once
audit: remove needless switch in AUDIT_SET
audit: use define's for audit version
audit: documentation of audit= kernel parameter
audit: wait_for_auditd rework for readability
audit: update MAINTAINERS
audit: log task info on feature change
audit: fix incorrect set of audit_sock
audit: print error message when fail to create audit socket
audit: fix dangling keywords in audit_log_set_loginuid() output
audit: log on errors from filter user rules
...
Two of the conditions in selinux_audit_rule_match() should never happen and
the third indicates a race that should be retried. Remove the calls to
audit_log() (which call audit_log_start()) and deal with the errors in the
caller, logging only once if the condition is met. Calling audit_log_start()
in this location makes buffer allocation and locking more complicated in the
calling tree (audit_filter_user()).
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Hello.
I got below leak with linux-3.10.0-54.0.1.el7.x86_64 .
[ 681.903890] kmemleak: 5538 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
Below is a patch, but I don't know whether we need special handing for undoing
ebitmap_set_bit() call.
----------
>>From fe97527a90fe95e2239dfbaa7558f0ed559c0992 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Mon, 6 Jan 2014 16:30:21 +0900
Subject: [PATCH] SELinux: Fix memory leak upon loading policy
Commit 2463c26d "SELinux: put name based create rules in a hashtable" did not
check return value from hashtab_insert() in filename_trans_read(). It leaks
memory if hashtab_insert() returns error.
unreferenced object 0xffff88005c9160d0 (size 8):
comm "systemd", pid 1, jiffies 4294688674 (age 235.265s)
hex dump (first 8 bytes):
57 0b 00 00 6b 6b 6b a5 W...kkk.
backtrace:
[<ffffffff816604ae>] kmemleak_alloc+0x4e/0xb0
[<ffffffff811cba5e>] kmem_cache_alloc_trace+0x12e/0x360
[<ffffffff812aec5d>] policydb_read+0xd1d/0xf70
[<ffffffff812b345c>] security_load_policy+0x6c/0x500
[<ffffffff812a623c>] sel_write_load+0xac/0x750
[<ffffffff811eb680>] vfs_write+0xc0/0x1f0
[<ffffffff811ec08c>] SyS_write+0x4c/0xa0
[<ffffffff81690419>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
However, we should not return EEXIST error to the caller, or the systemd will
show below message and the boot sequence freezes.
systemd[1]: Failed to load SELinux policy. Freezing.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Eric Paris <eparis@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Revert "selinux: consider filesystem subtype in policies"
This reverts commit 102aefdda4.
Explanation from Eric Paris:
SELinux policy can specify if it should use a filesystem's
xattrs or not. In current policy we have a specification that
fuse should not use xattrs but fuse.glusterfs should use
xattrs. This patch has a bug in which non-glusterfs
filesystems would match the rule saying fuse.glusterfs should
use xattrs. If both fuse and the particular filesystem in
question are not written to handle xattr calls during the mount
command, they will deadlock.
I have fixed the bug to do proper matching, however I believe a
revert is still the correct solution. The reason I believe
that is because the code still does not work. The s_subtype is
not set until after the SELinux hook which attempts to match on
the ".gluster" portion of the rule. So we cannot match on the
rule in question. The code is useless.
Signed-off-by: Paul Moore <pmoore@redhat.com>
Dynamically allocate a couple of the larger stack variables in order to
reduce the stack footprint below 1024. gcc-4.8
security/selinux/ss/services.c: In function 'security_load_policy':
security/selinux/ss/services.c:1964:1: warning: the frame size of 1104 bytes is larger than 1024 bytes [-Wframe-larger-than=]
}
Also silence a couple of checkpatch warnings at the same time.
WARNING: sizeof policydb should be sizeof(policydb)
+ memcpy(oldpolicydb, &policydb, sizeof policydb);
WARNING: sizeof policydb should be sizeof(policydb)
+ memcpy(&policydb, newpolicydb, sizeof policydb);
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Update the policy version (POLICYDB_VERSION_CONSTRAINT_NAMES) to allow
holding of policy source info for constraints.
Signed-off-by: Richard Haines <richard_c_haines@btinternet.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Conflicts:
security/selinux/hooks.c
Pull Eric's existing SELinux tree as there are a number of patches in
there that are not yet upstream. There was some minor fixup needed to
resolve a conflict in security/selinux/hooks.c:selinux_set_mnt_opts()
between the labeled NFS patches and Eric's security_fs_use()
simplification patch.
Not considering sub filesystem has the following limitation. Support
for SELinux in FUSE is dependent on the particular userspace
filesystem, which is identified by the subtype. For e.g, GlusterFS,
a FUSE based filesystem supports SELinux (by mounting and processing
FUSE requests in different threads, avoiding the mount time
deadlock), whereas other FUSE based filesystems (identified by a
different subtype) have the mount time deadlock.
By considering the subtype of the filesytem in the SELinux policies,
allows us to specify a filesystem subtype, in the following way:
fs_use_xattr fuse.glusterfs gen_context(system_u:object_r:fs_t,s0);
This way not all FUSE filesystems are put in the same bucket and
subjected to the limitations of the other subtypes.
Signed-off-by: Anand Avati <avati@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Currently the packet class in SELinux is not checked if there are no
SECMARK rules in the security or mangle netfilter tables. Some systems
prefer that packets are always checked, for example, to protect the system
should the netfilter rules fail to load or if the nefilter rules
were maliciously flushed.
Add the always_check_network policy capability which, when enabled, treats
SECMARK as enabled, even if there are no netfilter SECMARK rules and
treats peer labeling as enabled, even if there is no Netlabel or
labeled IPSEC configuration.
Includes definition of "redhat1" SELinux policy capability, which
exists in the SELinux userpace library, to keep ordering correct.
The SELinux userpace portion of this was merged last year, but this kernel
change fell on the floor.
Signed-off-by: Chris PeBenito <cpebenito@tresys.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Rather than passing pointers to memory locations, strings, and other
stuff just give up on the separation and give security_fs_use the
superblock. It just makes the code easier to read (even if not easier to
reuse on some other OS)
Signed-off-by: Eric Paris <eparis@redhat.com>
We only have 6 options, so char is good enough, but use a short as that
packs nicely. This shrinks the superblock_security_struct just a little
bit.
Signed-off-by: Eric Paris <eparis@redhat.com>
The /sys/fs/selinux/policy file is not valid on big endian systems like
ppc64 or s390. Let's see why:
static int hashtab_cnt(void *key, void *data, void *ptr)
{
int *cnt = ptr;
*cnt = *cnt + 1;
return 0;
}
static int range_write(struct policydb *p, void *fp)
{
size_t nel;
[...]
/* count the number of entries in the hashtab */
nel = 0;
rc = hashtab_map(p->range_tr, hashtab_cnt, &nel);
if (rc)
return rc;
buf[0] = cpu_to_le32(nel);
rc = put_entry(buf, sizeof(u32), 1, fp);
So size_t is 64 bits. But then we pass a pointer to it as we do to
hashtab_cnt. hashtab_cnt thinks it is a 32 bit int and only deals with
the first 4 bytes. On x86_64 which is little endian, those first 4
bytes and the least significant, so this works out fine. On ppc64/s390
those first 4 bytes of memory are the high order bits. So at the end of
the call to hashtab_map nel has a HUGE number. But the least
significant 32 bits are all 0's.
We then pass that 64 bit number to cpu_to_le32() which happily truncates
it to a 32 bit number and does endian swapping. But the low 32 bits are
all 0's. So no matter how many entries are in the hashtab, big endian
systems always say there are 0 entries because I screwed up the
counting.
The fix is easy. Use a 32 bit int, as the hashtab_cnt expects, for nel.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Currently, the ebitmap_node structure has a fixed size of 32 bytes. On
a 32-bit system, the overhead is 8 bytes, leaving 24 bytes for being
used as bitmaps. The overhead ratio is 1/4.
On a 64-bit system, the overhead is 16 bytes. Therefore, only 16 bytes
are left for bitmap purpose and the overhead ratio is 1/2. With a
3.8.2 kernel, a boot-up operation will cause the ebitmap_get_bit()
function to be called about 9 million times. The average number of
ebitmap_node traversal is about 3.7.
This patch increases the size of the ebitmap_node structure to 64
bytes for 64-bit system to keep the overhead ratio at 1/4. This may
also improve performance a little bit by making node to node traversal
less frequent (< 2) as more bits are available in each node.
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
While running the high_systime workload of the AIM7 benchmark on
a 2-socket 12-core Westmere x86-64 machine running 3.10-rc4 kernel
(with HT on), it was found that a pretty sizable amount of time was
spent in the SELinux code. Below was the perf trace of the "perf
record -a -s" of a test run at 1500 users:
5.04% ls [kernel.kallsyms] [k] ebitmap_get_bit
1.96% ls [kernel.kallsyms] [k] mls_level_isvalid
1.95% ls [kernel.kallsyms] [k] find_next_bit
The ebitmap_get_bit() was the hottest function in the perf-report
output. Both the ebitmap_get_bit() and find_next_bit() functions
were, in fact, called by mls_level_isvalid(). As a result, the
mls_level_isvalid() call consumed 8.95% of the total CPU time of
all the 24 virtual CPUs which is quite a lot. The majority of the
mls_level_isvalid() function invocations come from the socket creation
system call.
Looking at the mls_level_isvalid() function, it is checking to see
if all the bits set in one of the ebitmap structure are also set in
another one as well as the highest set bit is no bigger than the one
specified by the given policydb data structure. It is doing it in
a bit-by-bit manner. So if the ebitmap structure has many bits set,
the iteration loop will be done many times.
The current code can be rewritten to use a similar algorithm as the
ebitmap_contains() function with an additional check for the
highest set bit. The ebitmap_contains() function was extended to
cover an optional additional check for the highest set bit, and the
mls_level_isvalid() function was modified to call ebitmap_contains().
With that change, the perf trace showed that the used CPU time drop
down to just 0.08% (ebitmap_contains + mls_level_isvalid) of the
total which is about 100X less than before.
0.07% ls [kernel.kallsyms] [k] ebitmap_contains
0.05% ls [kernel.kallsyms] [k] ebitmap_get_bit
0.01% ls [kernel.kallsyms] [k] mls_level_isvalid
0.01% ls [kernel.kallsyms] [k] find_next_bit
The remaining ebitmap_get_bit() and find_next_bit() functions calls
are made by other kernel routines as the new mls_level_isvalid()
function will not call them anymore.
This patch also improves the high_systime AIM7 benchmark result,
though the improvement is not as impressive as is suggested by the
reduction in CPU time spent in the ebitmap functions. The table below
shows the performance change on the 2-socket x86-64 system (with HT
on) mentioned above.
+--------------+---------------+----------------+-----------------+
| Workload | mean % change | mean % change | mean % change |
| | 10-100 users | 200-1000 users | 1100-2000 users |
+--------------+---------------+----------------+-----------------+
| high_systime | +0.1% | +0.9% | +2.6% |
+--------------+---------------+----------------+-----------------+
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
There currently doesn't exist a labeling type that is adequate for use with
labeled NFS. Since NFS doesn't really support xattrs we can't use the use xattr
labeling behavior. For this we developed a new labeling type. The native
labeling type is used solely by NFS to ensure NFS inodes are labeled at runtime
by the NFS code instead of relying on the SELinux security server on the client
end.
Acked-by: Eric Paris <eparis@redhat.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Matthew N. Dodd <Matthew.Dodd@sparta.com>
Signed-off-by: Miguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg>
Signed-off-by: Phua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg>
Signed-off-by: Khin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
avc_add_callback now just used for registering reset functions
in initcalls, and the callback functions just did reset operations.
So, reducing the arguments to only one event is enough now.
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
It's possible that the caller passed a NULL for scontext. However if this
is a defered mapping we might still attempt to call *scontext=kstrdup().
This is bad. Instead just return the len.
Signed-off-by: Eric Paris <eparis@redhat.com>
Because Fedora shipped userspace based on my development tree we now
have policy version 27 in the wild defining only default user, role, and
range. Thus to add default_type we need a policy.28.
Signed-off-by: Eric Paris <eparis@redhat.com>
When new objects are created we have great and flexible rules to
determine the type of the new object. We aren't quite as flexible or
mature when it comes to determining the user, role, and range. This
patch adds a new ability to specify the place a new objects user, role,
and range should come from. For users and roles it can come from either
the source or the target of the operation. aka for files the user can
either come from the source (the running process and todays default) or
it can come from the target (aka the parent directory of the new file)
examples always are done with
directory context: system_u:object_r:mnt_t:s0-s0:c0.c512
process context: unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
[no rule]
unconfined_u:object_r:mnt_t:s0 test_none
[default user source]
unconfined_u:object_r:mnt_t:s0 test_user_source
[default user target]
system_u:object_r:mnt_t:s0 test_user_target
[default role source]
unconfined_u:unconfined_r:mnt_t:s0 test_role_source
[default role target]
unconfined_u:object_r:mnt_t:s0 test_role_target
[default range source low]
unconfined_u:object_r:mnt_t:s0 test_range_source_low
[default range source high]
unconfined_u:object_r:mnt_t:s0:c0.c1023 test_range_source_high
[default range source low-high]
unconfined_u:object_r:mnt_t:s0-s0:c0.c1023 test_range_source_low-high
[default range target low]
unconfined_u:object_r:mnt_t:s0 test_range_target_low
[default range target high]
unconfined_u:object_r:mnt_t:s0:c0.c512 test_range_target_high
[default range target low-high]
unconfined_u:object_r:mnt_t:s0-s0:c0.c512 test_range_target_low-high
Signed-off-by: Eric Paris <eparis@redhat.com>
The semantic patch that makes this change is available
in scripts/coccinelle/api/alloc/drop_kmalloc_cast.cocci.
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: James Morris <jmorris@namei.org>
My @hp.com will no longer be valid starting August 5, 2011 so an update is
necessary. My new email address is employer independent so we don't have
to worry about doing this again any time soon.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When policy version is less than POLICYDB_VERSION_FILENAME_TRANS,
skip file_name_trans_write().
Signed-off-by: Roy.Li <rongqing.li@windriver.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Right now security_get_user_sids() will pass in a NULL avd pointer to
avc_has_perm_noaudit(), which then forces that function to have a dummy
entry for that case and just generally test it.
Don't do it. The normal callers all pass a real avd pointer, and this
helper function is incredibly hot. So don't make avc_has_perm_noaudit()
do conditional stuff that isn't needed for the common case.
This also avoids some duplicated stack space.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The filename_trans rule processing has some printk(KERN_ERR ) messages
which were intended as debug aids in creating the code but weren't removed
before it was submitted. Remove them.
Reported-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Eric Paris <eparis@redhat.com>
Change flex_array_prealloc to take the number of elements for which space
should be allocated instead of the last (inclusive) element. Users
and documentation are updated accordingly. flex_arrays got introduced before
they had users. When folks started using it, they ended up needing a
different API than was coded up originally. This swaps over to the API that
folks apparently need.
Based-on-patch-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Tested-by: Chris Richards <gizmo@giz-works.com>
Acked-by: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: stable@kernel.org [2.6.38+]
Change flex_array_prealloc to take the number of elements for which space
should be allocated instead of the last (inclusive) element. Users
and documentation are updated accordingly. flex_arrays got introduced before
they had users. When folks started using it, they ended up needing a
different API than was coded up originally. This swaps over to the API that
folks apparently need.
Based-on-patch-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Tested-by: Chris Richards <gizmo@giz-works.com>
Acked-by: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: stable@kernel.org [2.6.38+]
To shorten the list we need to run if filename trans rules exist for the type
of the given parent directory I put them in a hashtable. Given the policy we
are expecting to use in Fedora this takes the worst case list run from about
5,000 entries to 17.
Signed-off-by: Eric Paris <eparis@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Instead of a hashtab entry counter function only useful for range
transition rules make a function generic for any hashtable to use.
Signed-off-by: Eric Paris <eparis@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
We have custom debug functions like rangetr_hash_eval and symtab_hash_eval
which do the same thing. Just create a generic function that takes the name
of the hash table as an argument instead of having custom functions.
Signed-off-by: Eric Paris <eparis@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Right now we walk to filename trans rule list for every inode that is
created. First passes at policy using this facility creates around 5000
filename trans rules. Running a list of 5000 entries every time is a bad
idea. This patch adds a new ebitmap to policy which has a bit set for each
ttype that has at least 1 filename trans rule. Thus when an inode is
created we can quickly determine if any rules exist for this parent
directory type and can skip the list if we know there is definitely no
relevant entry.
Signed-off-by: Eric Paris <eparis@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
filename_compute_type() takes as arguments the numeric value of the type of
the subject and target. It does not take a context. Thus the names are
misleading. Fix the argument names.
Signed-off-by: Eric Paris <eparis@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
filename_compute_type used to take a qstr, but it now takes just a name.
Fix the comments to indicate it is an objname, not a qstr.
Signed-off-by: Eric Paris <eparis@redhat.com>
The len should be an size_t but is a ssize_t. Easy enough fix to silence
build warnings. We have no need for signed-ness.
Signed-off-by: Eric Paris <eparis@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
The filename_trans rule processing has some printk(KERN_ERR ) messages
which were intended as debug aids in creating the code but weren't removed
before it was submitted. Remove them.
Signed-off-by: Eric Paris <eparis@redhat.com>
Initialize policydb.process_class once all symtabs read from policy image,
so that it could be used to setup the role_trans.tclass field when a lower
version policy.X is loaded.
Signed-off-by: Harry Ciao <qingtao.cao@windriver.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Commit 6f5317e730 introduced a bug in the
handling of userspace object classes that is causing breakage for Xorg
when XSELinux is enabled. Fix the bug by changing map_class() to return
SECCLASS_NULL when the class cannot be mapped to a kernel object class.
Reported-by: "Justin P. Mattock" <justinmattock@gmail.com>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
The attached patch allows /selinux/create takes optional 4th argument
to support TYPE_TRANSITION with name extension for userspace object
managers.
If 4th argument is not supplied, it shall perform as existing kernel.
In fact, the regression test of SE-PostgreSQL works well on the patched
kernel.
Thanks,
Signed-off-by: KaiGai Kohei <kohei.kaigai@eu.nec.com>
[manually verify fuzz was not an issue, and it wasn't: eparis]
Signed-off-by: Eric Paris <eparis@redhat.com>
Commit 6f5317e730 introduced a bug in the
handling of userspace object classes that is causing breakage for Xorg
when XSELinux is enabled. Fix the bug by changing map_class() to return
SECCLASS_NULL when the class cannot be mapped to a kernel object class.
Reported-by: "Justin P. Mattock" <justinmattock@gmail.com>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
If kernel policy version is >= 26, then write the class field of the
role_trans structure into the binary reprensentation.
Signed-off-by: Harry Ciao <qingtao.cao@windriver.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Eric Paris <eparis@redhat.com>
Apply role_transition rules for all kinds of classes.
Signed-off-by: Harry Ciao <qingtao.cao@windriver.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Eric Paris <eparis@redhat.com>
If kernel policy version is >= 26, then the binary representation of
the role_trans structure supports specifying the class for the current
subject or the newly created object.
If kernel policy version is < 26, then the class field would be default
to the process class.
Signed-off-by: Harry Ciao <qingtao.cao@windriver.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Eric Paris <eparis@redhat.com>
The socket SID would be computed on creation and no longer inherit
its creator's SID by default. Socket may have a different type but
needs to retain the creator's role and MLS attribute in order not
to break labeled networking and network access control.
The kernel value for a class would be used to determine if the class
if one of socket classes. If security_compute_sid is called from
userspace the policy value for a class would be mapped to the relevant
kernel value first.
Signed-off-by: Harry Ciao <qingtao.cao@windriver.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Currently SELinux has rules which label new objects according to 3 criteria.
The label of the process creating the object, the label of the parent
directory, and the type of object (reg, dir, char, block, etc.) This patch
adds a 4th criteria, the dentry name, thus we can distinguish between
creating a file in an etc_t directory called shadow and one called motd.
There is no file globbing, regex parsing, or anything mystical. Either the
policy exactly (strcmp) matches the dentry name of the object or it doesn't.
This patch has no changes from today if policy does not implement the new
rules.
Signed-off-by: Eric Paris <eparis@redhat.com>
sidtab_context_to_sid takes up a large share of time when creating large
numbers of new inodes (~30-40% in oprofile runs). This patch implements a
cache of 3 entries which is checked before we do a full context_to_sid lookup.
On one system this showed over a x3 improvement in the number of inodes that
could be created per second and around a 20% improvement on another system.
Any time we look up the same context string sucessivly (imagine ls -lZ) we
should hit this cache hot. A cache miss should have a relatively minor affect
on performance next to doing the full table search.
All operations on the cache are done COMPLETELY lockless. We know that all
struct sidtab_node objects created will never be deleted until a new policy is
loaded thus we never have to worry about a pointer being dereferenced. Since
we also know that pointer assignment is atomic we know that the cache will
always have valid pointers. Given this information we implement a FIFO cache
in an array of 3 pointers. Every result (whether a cache hit or table lookup)
will be places in the 0 spot of the cache and the rest of the entries moved
down one spot. The 3rd entry will be lost.
Races are possible and are even likely to happen. Lets assume that 4 tasks
are hitting sidtab_context_to_sid. The first task checks against the first
entry in the cache and it is a miss. Now lets assume a second task updates
the cache with a new entry. This will push the first entry back to the second
spot. Now the first task might check against the second entry (which it
already checked) and will miss again. Now say some third task updates the
cache and push the second entry to the third spot. The first task my check
the third entry (for the third time!) and again have a miss. At which point
it will just do a full table lookup. No big deal!
Signed-off-by: Eric Paris <eparis@redhat.com>
We duplicate functionality in policydb_index_classes() and
policydb_index_others(). This patch merges those functions just to make it
clear there is nothing special happening here.
Signed-off-by: Eric Paris <eparis@redhat.com>
The sym_val_to_name type array can be quite large as it grows linearly with
the number of types. With known policies having over 5k types these
allocations are growing large enough that they are likely to fail. Convert
those to flex_array so no allocation is larger than PAGE_SIZE
Signed-off-by: Eric Paris <eparis@redhat.com>
In rawhide type_val_to_struct will allocate 26848 bytes, an order 3
allocations. While this hasn't been seen to fail it isn't outside the
realm of possibiliy on systems with severe memory fragmentation. Convert
to flex_array so no allocation will ever be bigger than PAGE_SIZE.
Signed-off-by: Eric Paris <eparis@redhat.com>