This patch is the first step in enabling checkpoint/restore of processes
with seccomp enabled.
One of the things CRIU does while dumping tasks is inject code into them
via ptrace to collect information that is only available to the process
itself. However, if we are in a seccomp mode where these processes are
prohibited from making these syscalls, then what CRIU does kills the task.
This patch adds a new ptrace option, PTRACE_O_SUSPEND_SECCOMP, that enables
a task from the init user namespace which has CAP_SYS_ADMIN and no seccomp
filters to disable (and re-enable) seccomp filters for another task so that
they can be successfully dumped (and restored). We restrict the set of
processes that can disable seccomp through ptrace because although today
ptrace can be used to bypass seccomp, there is some discussion of closing
this loophole in the future and we would like this patch to not depend on
that behavior and be future proofed for when it is removed.
Note that seccomp can be suspended before any filters are actually
installed; this behavior is useful on criu restore, so that we can suspend
seccomp, restore the filters, unmap our restore code from the restored
process' address space, and then resume the task by detaching and have the
filters resumed as well.
v2 changes:
* require that the tracer have no seccomp filters installed
* drop TIF_NOTSC manipulation from the patch
* change from ptrace command to a ptrace option and use this ptrace option
as the flag to check. This means that as soon as the tracer
detaches/dies, seccomp is re-enabled and as a corrollary that one can not
disable seccomp across PTRACE_ATTACHs.
v3 changes:
* get rid of various #ifdefs everywhere
* report more sensible errors when PTRACE_O_SUSPEND_SECCOMP is incorrectly
used
v4 changes:
* get rid of may_suspend_seccomp() in favor of a capable() check in ptrace
directly
v5 changes:
* check that seccomp is not enabled (or suspended) on the tracer
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
CC: Will Drewry <wad@chromium.org>
CC: Roland McGrath <roland@hack.frob.com>
CC: Pavel Emelyanov <xemul@parallels.com>
CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
[kees: access seccomp.mode through seccomp_mode() instead]
Signed-off-by: Kees Cook <keescook@chromium.org>
Recently lockless_dereference() was added which can be used in place of
hard-coding smp_read_barrier_depends(). The following PATCH makes the change.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
le64_to_cpu() was applied twice to the physical addresses read from the
control area. This hasn't shown any visible regressions because CRB
driver has been tested only on the little endian platofrms so far.
Reported-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Cc: <stable@vger.kernel.org>
Fixes: 30fc8d138e ("tpm: TPM 2.0 CRB Interface")
Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
tpm_ibmvtpm_probe() calls ibmvtpm_reset_crq(ibmvtpm) without having yet
set the virtual device in the ibmvtpm structure. So in ibmvtpm_reset_crq,
the phype call contains empty unit addresses, ibmvtpm->vdev->unit_address.
Signed-off-by: Hon Ching(Vicky) Lo <honclo@linux.vnet.ibm.com>
Signed-off-by: Joy Latten <jmlatten@linux.vnet.ibm.com>
Reviewed-by: Ashley Lai <ashley@ahsleylai.com>
Cc: <stable@vger.kernel.org>
Fixes: 132f762947 ("drivers/char/tpm: Add new device driver to support IBM vTPM")
Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
This patch defines a builtin measurement policy "tcb", similar to the
existing "ima_tcb", but with additional rules to also measure files
based on the effective uid and to measure files opened with the "read"
mode bit set (eg. read, read-write).
Changing the builtin "ima_tcb" policy could potentially break existing
users. Instead of defining a new separate boot command line option each
time the builtin measurement policy is modified, this patch defines a
single generic boot command line option "ima_policy=" to specify the
builtin policy and deprecates the use of the builtin ima_tcb policy.
[The "ima_policy=" boot command line option is based on Roberto Sassu's
"ima: added new policy type exec" patch.]
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Dr. Greg Wettstein <gw@idfusion.org>
Cc: stable@vger.kernel.org
The current "mask" policy option matches files opened as MAY_READ,
MAY_WRITE, MAY_APPEND or MAY_EXEC. This patch extends the "mask"
option to match files opened containing one of these modes. For
example, "mask=^MAY_READ" would match files opened read-write.
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Dr. Greg Wettstein <gw@idfusion.org>
Cc: stable@vger.kernel.org
The new "euid" policy condition measures files with the specified
effective uid (euid). In addition, for CAP_SETUID files it measures
files with the specified uid or suid.
Changelog:
- fixed checkpatch.pl warnings
- fixed avc denied {setuid} messages - based on Roberto's feedback
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Dr. Greg Wettstein <gw@idfusion.org>
Cc: stable@vger.kernel.org
This patch fixes a bug introduced in "4d7aeee ima: define new template
ima-ng and template fields d-ng and n-ng".
Changelog:
- change int to uint32 (Roberto Sassu's suggestion)
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Roberto Sassu <rsassu@suse.de>
Cc: stable@vger.kernel.org # 3.13
This code used to rely on the fact that kfree(NULL) was a no-op, but
then we changed smk_parse_smack() to return error pointers on failure
instead of NULL. Calling kfree() on an error pointer will oops.
I have re-arranged things a bit so that we only free things if they
have been allocated.
Fixes: e774ad683f ('smack: pass error code through pointers')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Before calling into the filesystem, vfs_setxattr calls
security_inode_setxattr, which ends up calling selinux_inode_setxattr in
our case. That returns -EOPNOTSUPP whenever SBLABEL_MNT is not set.
SBLABEL_MNT was supposed to be set by sb_finish_set_opts, which sets it
only if selinux_is_sblabel_mnt returns true.
The selinux_is_sblabel_mnt logic was broken by eadcabc697 "SELinux: do
all flags twiddling in one place", which didn't take into the account
the SECURITY_FS_USE_NATIVE behavior that had been introduced for nfs
with eb9ae68650 "SELinux: Add new labeling type native labels".
This caused setxattr's of security labels over NFSv4.2 to fail.
Cc: stable@kernel.org # 3.13
Cc: Eric Paris <eparis@redhat.com>
Cc: David Quigley <dpquigl@davequigley.com>
Reported-by: Richard Chan <rc556677@outlook.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: added the stable dependency]
Signed-off-by: Paul Moore <pmoore@redhat.com>
Remove unused permission definitions from SELinux.
Many of these were only ever used in pre-mainline
versions of SELinux, prior to Linux 2.6.0. Some of them
were used in the legacy network or compat_net=1 checks
that were disabled by default in Linux 2.6.18 and
fully removed in Linux 2.6.30.
Permissions never used in mainline Linux:
file swapon
filesystem transition
tcp_socket { connectto newconn acceptfrom }
node enforce_dest
unix_stream_socket { newconn acceptfrom }
Legacy network checks, removed in 2.6.30:
socket { recv_msg send_msg }
node { tcp_recv tcp_send udp_recv udp_send rawip_recv rawip_send dccp_recv dccp_send }
netif { tcp_recv tcp_send udp_recv udp_send rawip_recv rawip_send dccp_recv dccp_send }
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Support per-file labeling of sysfs and pstore files based on
genfscon policy entries. This is safe because the sysfs
and pstore directory tree cannot be manipulated by userspace,
except to unlink pstore entries.
This provides an alternative method of assigning per-file labeling
to sysfs or pstore files without needing to set the labels from
userspace on each boot. The advantages of this approach are that
the labels are assigned as soon as the dentry is first instantiated
and userspace does not need to walk the sysfs or pstore tree and
set the labels on each boot. The limitations of this approach are
that the labels can only be assigned based on pathname prefix matching.
You can initially assign labels using this mechanism and then change
them at runtime via setxattr if allowed to do so by policy.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Suggested-by: Dominick Grift <dac.override@gmail.com>
Acked-by: Jeff Vander Stoep <jeffv@google.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Add support for per-file labeling of debugfs files so that
we can distinguish them in policy. This is particularly
important in Android where certain debugfs files have to be writable
by apps and therefore the debugfs directory tree can be read and
searched by all.
Since debugfs is entirely kernel-generated, the directory tree is
immutable by userspace, and the inodes are pinned in memory, we can
simply use the same approach as with proc and label the inodes from
policy based on pathname from the root of the debugfs filesystem.
Generalize the existing labeling support used for proc and reuse it
for debugfs too.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Update the set of SELinux netlink socket class definitions to match
the set of netlink protocols implemented by the kernel. The
ip_queue implementation for the NETLINK_FIREWALL and NETLINK_IP6_FW protocols
was removed in d16cf20e2f, so we can remove
the corresponding class definitions as this is dead code. Add new
classes for NETLINK_ISCSI, NETLINK_FIB_LOOKUP, NETLINK_CONNECTOR,
NETLINK_NETFILTER, NETLINK_GENERIC, NETLINK_SCSITRANSPORT, NETLINK_RDMA,
and NETLINK_CRYPTO so that we can distinguish among sockets created
for each of these protocols. This change does not define the finer-grained
nlsmsg_read/write permissions or map specific nlmsg_type values to those
permissions in the SELinux nlmsgtab; if finer-grained control of these
sockets is desired/required, that can be added as a follow-on change.
We do not define a SELinux class for NETLINK_ECRYPTFS as the implementation
was removed in 624ae52845.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
selinux_bprm_committed_creds()->__flush_signals() is not right, we
shouldn't clear TIF_SIGPENDING unconditionally. There can be other
reasons for signal_pending(): freezing(), JOBCTL_PENDING_MASK, and
potentially more.
Also change this code to check fatal_signal_pending() rather than
SIGNAL_GROUP_EXIT, it looks a bit better.
Now we can kill __flush_signals() before it finds another buggy user.
Note: this code looks racy, we can flush a signal which was sent after
the task SID has been updated.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
This prints the 'sclass' field as string instead of index in unrecognized netlink message.
The textual representation makes it easier to distinguish the right class.
Signed-off-by: Marek Milkovic <mmilkovi@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: 80-char width fixes]
Signed-off-by: Paul Moore <pmoore@redhat.com>
Smack onlycap allows limiting of CAP_MAC_ADMIN and CAP_MAC_OVERRIDE to
processes running with the configured label. But having single privileged
label is not enough in some real use cases. On a complex system like Tizen,
there maybe few programs that need to configure Smack policy in run-time
and running them all with a single label is not always practical.
This patch extends onlycap feature for multiple labels. They are configured
in the same smackfs "onlycap" interface, separated by spaces.
Signed-off-by: Rafal Krypa <r.krypa@samsung.com>
Use proper RCU functions and read locking in smackfs seq_operations.
Smack gets away with not using proper RCU functions in smackfs, because
it never removes entries from these lists. But now one list will be
needed (with interface in smackfs) that will have both elements added and
removed to it.
This change will also help any future changes implementing removal of
unneeded entries from other Smack lists.
The patch also fixes handling of pos argument in smk_seq_start and
smk_seq_next. This fixes a bug in case when smackfs is read with a small
buffer:
Kernel panic - not syncing: Kernel mode fault at addr 0xfa0000011b
CPU: 0 PID: 1292 Comm: dd Not tainted 4.1.0-rc1-00012-g98179b8 #13
Stack:
00000003 0000000d 7ff39e48 7f69fd00
7ff39ce0 601ae4b0 7ff39d50 600e587b
00000010 6039f690 7f69fd40 00612003
Call Trace:
[<601ae4b0>] load2_seq_show+0x19/0x1d
[<600e587b>] seq_read+0x168/0x331
[<600c5943>] __vfs_read+0x21/0x101
[<601a595e>] ? security_file_permission+0xf8/0x105
[<600c5ec6>] ? rw_verify_area+0x86/0xe2
[<600c5fc3>] vfs_read+0xa1/0x14c
[<600c68e2>] SyS_read+0x57/0xa0
[<6001da60>] handle_syscall+0x60/0x80
[<6003087d>] userspace+0x442/0x548
[<6001aa77>] ? interrupt_end+0x0/0x80
[<6001daae>] ? copy_chunk_to_user+0x0/0x2b
[<6002cb6b>] ? save_registers+0x1f/0x39
[<60032ef7>] ? arch_prctl+0xf5/0x170
[<6001a92d>] fork_handler+0x85/0x87
Signed-off-by: Rafal Krypa <r.krypa@samsung.com>
This patch adds the iint associated to the current inode as a new
parameter of ima_add_violation(). The passed iint is always not NULL
if a violation is detected. This modification will be used to determine
the inode for which there is a violation.
Since the 'd' and 'd-ng' template field init() functions were detecting
a violation from the value of the iint pointer, they now check the new
field 'violation', added to the 'ima_event_data' structure.
Changelog:
- v1:
- modified an old comment (Roberto Sassu)
Signed-off-by: Roberto Sassu <rsassu@suse.de>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
All event related data has been wrapped into the new 'ima_event_data'
structure. The main benefit of this patch is that a new information
can be made available to template fields initialization functions
by simply adding a new field to the new structure instead of modifying
the definition of those functions.
Changelog:
- v2:
- f_dentry replaced with f_path.dentry (Roberto Sassu)
- removed declaration of temporary variables in template field functions
when possible (suggested by Dmitry Kasatkin)
Signed-off-by: Roberto Sassu <rsassu@suse.de>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
This patch adds validity checks for 'path' parameter and
makes it const.
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
The call to asymmetric_key_hex_to_key_id() from ca_keys_setup()
silently fails with -ENOMEM. Instead of dynamically allocating
memory from a __setup function, this patch defines a variable
and calls __asymmetric_key_hex_to_key_id(), a new helper function,
directly.
This bug was introduced by 'commit 46963b774d ("KEYS: Overhaul
key identification when searching for asymmetric keys")'.
Changelog:
- for clarification, rename hexlen to asciihexlen in
asymmetric_key_hex_to_key_id()
- add size argument to __asymmetric_key_hex_to_key_id() - David Howells
- inline __asymmetric_key_hex_to_key_id() - David Howells
- remove duplicate strlen() calls
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # 3.18
EVM needs to be atomically updated when removing xattrs.
Otherwise concurrent EVM verification may fail in between.
This patch fixes by moving i_mutex unlocking after calling
EVM hook. fsnotify_xattr() is also now called while locked
the same way as it is done in __vfs_setxattr_noperm.
Changelog:
- remove unused 'inode' variable.
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
To prevent offline stripping of existing file xattrs and relabeling of
them at runtime, EVM allows only newly created files to be labeled. As
pseudo filesystems are not persistent, stripping of xattrs is not a
concern.
Some LSMs defer file labeling on pseudo filesystems. This patch
permits the labeling of existing files on pseudo files systems.
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
File hashes are automatically set and updated and should not be
manually set. This patch limits file hash setting to fix and log
modes.
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Include don't appraise or measure rules for the NSFS filesystem
in the builtin ima_tcb and ima_appraise_tcb policies.
Changelog:
- Update documentation
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # 3.19
This patch adds a rule in the default measurement policy to skip inodes
in the cgroupfs filesystem. Measurements for this filesystem can be
avoided, as all the digests collected have the same value of the digest of
an empty file.
Furthermore, this patch updates the documentation of IMA policies in
Documentation/ABI/testing/ima_policy to make it consistent with
the policies set in security/integrity/ima/ima_policy.c.
Signed-off-by: Roberto Sassu <rsassu@suse.de>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
This patch makes the following functions to use ERR_PTR() and related
macros to pass the appropriate error code through returned pointers:
smk_parse_smack()
smk_import_entry()
smk_fetch()
It also makes all the other functions that use them to handle the
error cases properly. This ways correct error codes from places
where they happened can be propagated to the user space if necessary.
Doing this it fixes a bug in onlycap and unconfined files
handling. Previously their content was cleared on any error from
smk_import_entry/smk_parse_smack, be it EINVAL (as originally intended)
or ENOMEM. Right now it only reacts on EINVAL passing other codes
properly to userspace.
Comments have been updated accordingly.
Signed-off-by: Lukasz Pawelczyk <l.pawelczyk@samsung.com>
The dmabuf fd can be shared between processes via unix domain
socket. The file of dmabuf fd is came from anon_inode. The inode
has no set and get xattr operations, so it can not be shared
between processes with smack. This patch fixes just to ignore
private inode including anon_inode for smack_file_receive.
Signed-off-by: Seung-Woo Kim <sw0312.kim@samsung.com>
This patch adds the template 'ima-sig' among choices for the kernel
parameter 'ima_template'.
Signed-off-by: Roberto Sassu <rsassu@suse.de>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
It's a bit easier to read this if we split it up into two for loops.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
The stub functions in capability.c are no longer required
with the list based stacking mechanism. Remove the file.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Paul Moore <paul@paul-moore.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Instead of using a vector of security operations
with explicit, special case stacking of the capability
and yama hooks use lists of hooks with capability and
yama hooks included as appropriate.
The security_operations structure is no longer required.
Instead, there is a union of the function pointers that
allows all the hooks lists to use a common mechanism for
list management while retaining typing. Each module
supplies an array describing the hooks it provides instead
of a sparsely populated security_operations structure.
The description includes the element that gets put on
the hook list, avoiding the issues surrounding individual
element allocation.
The method for registering security modules is changed to
reflect the information available. The method for removing
a module, currently only used by SELinux, has also changed.
It should be generic now, however if there are potential
race conditions based on ordering of hook removal that needs
to be addressed by the calling module.
The security hooks are called from the lists and the first
failure is returned.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Paul Moore <paul@paul-moore.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Add a list header for each security hook. They aren't used until
later in the patch series. They are grouped together in a structure
so that there doesn't need to be an external address for each.
Macro-ize the initialization of the security_operations
for each security module in anticipation of changing out
the security_operations structure.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Paul Moore <paul@paul-moore.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Introduce two macros around calling the functions in the
security operations vector. The marco versions here do not
change any behavior.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Paul Moore <paul@paul-moore.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Remove the large comment describing the content of the
security_operations structure from security.h. This
wasn't done in the previous (2/7) patch because it
would have exceeded the mail list size limits.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Paul Moore <paul@paul-moore.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Add the large comment describing the content of the
security_operations structure to lsm_hooks.h. This
wasn't done in the previous (1/7) patch because it
would have exceeded the mail list size limits.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Paul Moore <paul@paul-moore.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: James Morris <james.l.morris@oracle.com>
The security.h header file serves two purposes,
interfaces for users of the security modules and
interfaces for security modules. Users of the
security modules don't need to know about what's
in the security_operations structure, so pull it
out into it's own header, lsm_hooks.h
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Paul Moore <paul@paul-moore.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: James Morris <james.l.morris@oracle.com>
AMD CPUs don't reinitialize the SS descriptor on SYSRET, so SYSRET with
SS == 0 results in an invalid usermode state in which SS is apparently
equal to __USER_DS but causes #SS if used.
Work around the issue by setting SS to __KERNEL_DS __switch_to, thus
ensuring that SYSRET never happens with SS set to NULL.
This was exposed by a recent vDSO cleanup.
Fixes: e7d6eefaaa x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull intel drm fixes from Dave Airlie.
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/i915: vlv: fix save/restore of GFX_MAX_REQ_COUNT reg
drm/i915: Workaround to avoid lite restore with HEAD==TAIL
drm/i915: cope with large i2c transfers
Pull intel iommu updates from David Woodhouse:
"This lays a little of the groundwork for upcoming Shared Virtual
Memory support — fixing some bogus #defines for capability bits and
adding the new ones, and starting to use the new wider page tables
where we can, in anticipation of actually filling in the new fields
therein.
It also allows graphics devices to be assigned to VM guests again.
This got broken in 3.17 by disallowing assignment of RMRR-afflicted
devices. Like USB, we do understand why there's an RMRR for graphics
devices — and unlike USB, it's actually sane. So we can make an
exception for graphics devices, just as we do USB controllers.
Finally, tone down the warning about the X2APIC_OPT_OUT bit, due to
persistent requests. X2APIC_OPT_OUT was added to the spec as a nasty
hack to allow broken BIOSes to forbid us from using X2APIC when they
do stupid and invasive things and would break if we did.
Someone noticed that since Windows doesn't have full IOMMU support for
DMA protection, setting the X2APIC_OPT_OUT bit made Windows avoid
initialising the IOMMU on the graphics unit altogether.
This means that it would be available for use in "driver mode", where
the IOMMU registers are made available through a BAR of the graphics
device and the graphics driver can do SVM all for itself.
So they started setting the X2APIC_OPT_OUT bit on *all* platforms with
SVM capabilities. And even the platforms which *might*, if the
planets had been aligned correctly, possibly have had SVM capability
but which in practice actually don't"
* git://git.infradead.org/intel-iommu:
iommu/vt-d: support extended root and context entries
iommu/vt-d: Add new extended capabilities from v2.3 VT-d specification
iommu/vt-d: Allow RMRR on graphics devices too
iommu/vt-d: Print x2apic opt out info instead of printing a warning
iommu/vt-d: kill bogus ecap_niotlb_iunits()
Pull i2c fixes from Wolfram Sang:
"This has a mixture of merge window cleanups and bugfixes"
* 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: st: add include for pinctrl
i2c: mux: use proper dev when removing "channel-X" symlinks
i2c: digicolor: remove duplicate include
i2c: Mark adapter devices with pm_runtime_no_callbacks
i2c: pca-platform: fix broken email address
i2c: mxs: fix broken email address
i2c: rk3x: report number of messages transmitted
Pull btrfs fixes from Chris Mason:
"Filipe hit two problems in my block group cache patches. We finalized
the fixes last week and ran through more tests"
* 'for-linus-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: prevent list corruption during free space cache processing
Btrfs: fix inode cache writeout
three fixes for i915.
* tag 'drm-intel-next-fixes-2015-04-25' of git://anongit.freedesktop.org/drm-intel:
drm/i915: vlv: fix save/restore of GFX_MAX_REQ_COUNT reg
drm/i915: Workaround to avoid lite restore with HEAD==TAIL
drm/i915: cope with large i2c transfers