License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 22:07:57 +08:00
|
|
|
// SPDX-License-Identifier: GPL-2.0
|
2005-04-17 06:20:36 +08:00
|
|
|
/*
|
|
|
|
* Copyright (C) 1991, 1992 Linus Torvalds
|
2008-11-25 10:24:11 +08:00
|
|
|
* Copyright (C) 2000, 2001, 2002 Andi Kleen SuSE Labs
|
2005-04-17 06:20:36 +08:00
|
|
|
*
|
|
|
|
* 1997-11-28 Modified for POSIX.1b signals by Richard Henderson
|
|
|
|
* 2000-06-20 Pentium III FXSR, SSE support by Gareth Hughes
|
2008-11-25 10:24:11 +08:00
|
|
|
* 2000-2002 x86-64 support by Andi Kleen
|
2005-04-17 06:20:36 +08:00
|
|
|
*/
|
2012-05-22 10:50:07 +08:00
|
|
|
|
|
|
|
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
|
|
|
|
|
2008-11-22 09:36:41 +08:00
|
|
|
#include <linux/sched.h>
|
2017-02-09 01:51:37 +08:00
|
|
|
#include <linux/sched/task_stack.h>
|
2008-11-22 09:36:41 +08:00
|
|
|
#include <linux/mm.h>
|
|
|
|
#include <linux/smp.h>
|
2005-04-17 06:20:36 +08:00
|
|
|
#include <linux/kernel.h>
|
|
|
|
#include <linux/errno.h>
|
|
|
|
#include <linux/wait.h>
|
2008-03-15 08:46:38 +08:00
|
|
|
#include <linux/tracehook.h>
|
2008-11-22 09:36:41 +08:00
|
|
|
#include <linux/unistd.h>
|
|
|
|
#include <linux/stddef.h>
|
|
|
|
#include <linux/personality.h>
|
|
|
|
#include <linux/uaccess.h>
|
2009-09-19 14:40:22 +08:00
|
|
|
#include <linux/user-return-notifier.h>
|
uprobes/core: Handle breakpoint and singlestep exceptions
Uprobes uses exception notifiers to get to know if a thread hit
a breakpoint or a singlestep exception.
When a thread hits a uprobe or is singlestepping post a uprobe
hit, the uprobe exception notifier sets its TIF_UPROBE bit,
which will then be checked on its return to userspace path
(do_notify_resume() ->uprobe_notify_resume()), where the
consumers handlers are run (in task context) based on the
defined filters.
Uprobe hits are thread specific and hence we need to maintain
information about if a task hit a uprobe, what uprobe was hit,
the slot where the original instruction was copied for xol so
that it can be singlestepped with appropriate fixups.
In some cases, special care is needed for instructions that are
executed out of line (xol). These are architecture specific
artefacts, such as handling RIP relative instructions on x86_64.
Since the instruction at which the uprobe was inserted is
executed out of line, architecture specific fixups are added so
that the thread continues normal execution in the presence of a
uprobe.
Postpone the signals until we execute the probed insn.
post_xol() path does a recalc_sigpending() before return to
user-mode, this ensures the signal can't be lost.
Uprobes relies on DIE_DEBUG notification to notify if a
singlestep is complete.
Adds x86 specific uprobe exception notifiers and appropriate
hooks needed to determine a uprobe hit and subsequent post
processing.
Add requisite x86 fixups for xol for uprobes. Specific cases
needing fixups include relative jumps (x86_64), calls, etc.
Where possible, we check and skip singlestepping the
breakpointed instructions. For now we skip single byte as well
as few multibyte nop instructions. However this can be extended
to other instructions too.
Credits to Oleg Nesterov for suggestions/patches related to
signal, breakpoint, singlestep handling code.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com>
Cc: Linux-mm <linux-mm@kvack.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120313180011.29771.89027.sendpatchset@srdronam.in.ibm.com
[ Performed various cleanliness edits ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-14 02:00:11 +08:00
|
|
|
#include <linux/uprobes.h>
|
2012-11-28 02:33:25 +08:00
|
|
|
#include <linux/context_tracking.h>
|
2018-03-14 17:41:42 +08:00
|
|
|
#include <linux/syscalls.h>
|
2008-03-06 17:33:08 +08:00
|
|
|
|
2005-04-17 06:20:36 +08:00
|
|
|
#include <asm/processor.h>
|
|
|
|
#include <asm/ucontext.h>
|
2015-04-24 08:54:44 +08:00
|
|
|
#include <asm/fpu/internal.h>
|
2015-04-30 14:45:02 +08:00
|
|
|
#include <asm/fpu/signal.h>
|
2008-01-30 20:30:42 +08:00
|
|
|
#include <asm/vdso.h>
|
x86, mce: use 64bit machine check code on 32bit
The 64bit machine check code is in many ways much better than
the 32bit machine check code: it is more specification compliant,
is cleaner, only has a single code base versus one per CPU,
has better infrastructure for recovery, has a cleaner way to communicate
with user space etc. etc.
Use the 64bit code for 32bit too.
This is the second attempt to do this. There was one a couple of years
ago to unify this code for 32bit and 64bit. Back then this ran into some
trouble with K7s and was reverted.
I believe this time the K7 problems (and some others) are addressed.
I went over the old handlers and was very careful to retain
all quirks.
But of course this needs a lot of testing on old systems. On newer
64bit capable systems I don't expect much problems because they have been
already tested with the 64bit kernel.
I made this a CONFIG for now that still allows to select the old
machine check code. This is mostly to make testing easier,
if someone runs into a problem we can ask them to try
with the CONFIG switched.
The new code is default y for more coverage.
Once there is confidence the 64bit code works well on older hardware
too the CONFIG_X86_OLD_MCE and the associated code can be easily
removed.
This causes a behaviour change for 32bit installations. They now
have to install the mcelog package to be able to log
corrected machine checks.
The 64bit machine check code only handles CPUs which support the
standard Intel machine check architecture described in the IA32 SDM.
The 32bit code has special support for some older CPUs which
have non standard machine check architectures, in particular
WinChip C3 and Intel P5. I made those a separate CONFIG option
and kept them for now. The WinChip variant could be probably
removed without too much pain, it doesn't really do anything
interesting. P5 is also disabled by default (like it
was before) because many motherboards have it miswired, but
according to Alan Cox a few embedded setups use that one.
Forward ported/heavily changed version of old patch, original patch
included review/fixes from Thomas Gleixner, Bert Wesarg.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-04-29 01:07:31 +08:00
|
|
|
#include <asm/mce.h>
|
2012-02-19 23:38:43 +08:00
|
|
|
#include <asm/sighandling.h>
|
2015-07-29 13:41:21 +08:00
|
|
|
#include <asm/vm86.h>
|
2008-11-22 09:36:41 +08:00
|
|
|
|
|
|
|
#ifdef CONFIG_X86_64
|
|
|
|
#include <asm/proto.h>
|
|
|
|
#include <asm/ia32_unistd.h>
|
|
|
|
#endif /* CONFIG_X86_64 */
|
|
|
|
|
2008-09-06 07:26:55 +08:00
|
|
|
#include <asm/syscall.h>
|
2008-07-22 00:04:13 +08:00
|
|
|
#include <asm/syscalls.h>
|
2008-03-06 17:33:08 +08:00
|
|
|
|
2008-12-18 10:50:32 +08:00
|
|
|
#include <asm/sigframe.h>
|
2016-09-05 21:33:08 +08:00
|
|
|
#include <asm/signal.h>
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2009-02-09 21:17:40 +08:00
|
|
|
#define COPY(x) do { \
|
|
|
|
get_user_ex(regs->x, &sc->x); \
|
|
|
|
} while (0)
|
2008-11-25 10:21:37 +08:00
|
|
|
|
2009-02-09 21:17:40 +08:00
|
|
|
#define GET_SEG(seg) ({ \
|
|
|
|
unsigned short tmp; \
|
|
|
|
get_user_ex(tmp, &sc->seg); \
|
|
|
|
tmp; \
|
|
|
|
})
|
2008-11-25 10:21:37 +08:00
|
|
|
|
2009-02-09 21:17:40 +08:00
|
|
|
#define COPY_SEG(seg) do { \
|
|
|
|
regs->seg = GET_SEG(seg); \
|
|
|
|
} while (0)
|
2008-11-25 10:21:37 +08:00
|
|
|
|
2009-02-09 21:17:40 +08:00
|
|
|
#define COPY_SEG_CPL3(seg) do { \
|
|
|
|
regs->seg = GET_SEG(seg) | 3; \
|
|
|
|
} while (0)
|
2008-11-25 10:21:37 +08:00
|
|
|
|
2016-02-17 07:09:02 +08:00
|
|
|
#ifdef CONFIG_X86_64
|
|
|
|
/*
|
|
|
|
* If regs->ss will cause an IRET fault, change it. Otherwise leave it
|
|
|
|
* alone. Using this generally makes no sense unless
|
|
|
|
* user_64bit_mode(regs) would return true.
|
|
|
|
*/
|
|
|
|
static void force_valid_ss(struct pt_regs *regs)
|
|
|
|
{
|
|
|
|
u32 ar;
|
|
|
|
asm volatile ("lar %[old_ss], %[ar]\n\t"
|
|
|
|
"jz 1f\n\t" /* If invalid: */
|
|
|
|
"xorl %[ar], %[ar]\n\t" /* set ar = 0 */
|
|
|
|
"1:"
|
|
|
|
: [ar] "=r" (ar)
|
|
|
|
: [old_ss] "rm" ((u16)regs->ss));
|
|
|
|
|
|
|
|
/*
|
|
|
|
* For a valid 64-bit user context, we need DPL 3, type
|
|
|
|
* read-write data or read-write exp-down data, and S and P
|
|
|
|
* set. We can't use VERW because VERW doesn't check the
|
|
|
|
* P bit.
|
|
|
|
*/
|
|
|
|
ar &= AR_DPL_MASK | AR_S | AR_P | AR_TYPE_MASK;
|
|
|
|
if (ar != (AR_DPL3 | AR_S | AR_P | AR_TYPE_RWDATA) &&
|
|
|
|
ar != (AR_DPL3 | AR_S | AR_P | AR_TYPE_RWDATA_EXPDOWN))
|
|
|
|
regs->ss = __USER_DS;
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
x86/signal/64: Re-add support for SS in the 64-bit signal context
This is a second attempt to make the improvements from c6f2062935c8
("x86/signal/64: Fix SS handling for signals delivered to 64-bit
programs"), which was reverted by 51adbfbba5c6 ("x86/signal/64: Add
support for SS in the 64-bit signal context").
This adds two new uc_flags flags. UC_SIGCONTEXT_SS will be set for
all 64-bit signals (including x32). It indicates that the saved SS
field is valid and that the kernel supports the new behavior.
The goal is to fix a problems with signal handling in 64-bit tasks:
SS wasn't saved in the 64-bit signal context, making it awkward to
determine what SS was at the time of signal delivery and making it
impossible to return to a non-flat SS (as calling sigreturn clobbers
SS).
This also made it extremely difficult for 64-bit tasks to return to
fully-defined 16-bit contexts, because only the kernel can easily do
espfix64, but sigreturn was unable to set a non-flag SS:ESP.
(DOSEMU has a monstrous hack to partially work around this
limitation.)
If we could go back in time, the correct fix would be to make 64-bit
signals work just like 32-bit signals with respect to SS: save it
in signal context, reset it when delivering a signal, and restore
it in sigreturn.
Unfortunately, doing that (as I tried originally) breaks DOSEMU:
DOSEMU wouldn't reset the signal context's SS when clearing the LDT
and changing the saved CS to 64-bit mode, since it predates the SS
context field existing in the first place.
This patch is a bit more complicated, and it tries to balance a
bunch of goals. It makes most cases of changing ucontext->ss during
signal handling work as expected.
I do this by special-casing the interesting case. On sigreturn,
ucontext->ss will be honored by default, unless the ucontext was
created from scratch by an old program and had a 64-bit CS
(unfortunately, CRIU can do this) or was the result of changing a
32-bit signal context to 64-bit without resetting SS (as DOSEMU
does).
For the benefit of new 64-bit software that uses segmentation (new
versions of DOSEMU might), the new behavior can be detected with a
new ucontext flag UC_SIGCONTEXT_SS.
To avoid compilation issues, __pad0 is left as an alias for ss in
ucontext.
The nitty-gritty details are documented in the header file.
This patch also re-enables the sigreturn_64 and ldt_gdt_64 selftests,
as the kernel change allows both of them to pass.
Tested-by: Stas Sergeev <stsp@list.ru>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/749149cbfc3e75cd7fcdad69a854b399d792cc6f.1455664054.git.luto@kernel.org
[ Small readability edit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17 07:09:03 +08:00
|
|
|
static int restore_sigcontext(struct pt_regs *regs,
|
|
|
|
struct sigcontext __user *sc,
|
|
|
|
unsigned long uc_flags)
|
2008-11-25 10:21:37 +08:00
|
|
|
{
|
2015-09-05 15:32:39 +08:00
|
|
|
unsigned long buf_val;
|
2008-11-25 10:21:37 +08:00
|
|
|
void __user *buf;
|
|
|
|
unsigned int tmpflags;
|
|
|
|
unsigned int err = 0;
|
|
|
|
|
|
|
|
/* Always make any pending restarted system calls return -EINTR */
|
2015-02-13 07:01:14 +08:00
|
|
|
current->restart_block.fn = do_no_restart_syscall;
|
2008-11-25 10:21:37 +08:00
|
|
|
|
2009-01-24 07:50:10 +08:00
|
|
|
get_user_try {
|
|
|
|
|
2008-11-25 10:21:37 +08:00
|
|
|
#ifdef CONFIG_X86_32
|
2009-02-09 21:17:40 +08:00
|
|
|
set_user_gs(regs, GET_SEG(gs));
|
2009-01-24 07:50:10 +08:00
|
|
|
COPY_SEG(fs);
|
|
|
|
COPY_SEG(es);
|
|
|
|
COPY_SEG(ds);
|
2008-11-25 10:21:37 +08:00
|
|
|
#endif /* CONFIG_X86_32 */
|
|
|
|
|
2009-01-24 07:50:10 +08:00
|
|
|
COPY(di); COPY(si); COPY(bp); COPY(sp); COPY(bx);
|
2015-04-04 20:58:23 +08:00
|
|
|
COPY(dx); COPY(cx); COPY(ip); COPY(ax);
|
2008-11-25 10:21:37 +08:00
|
|
|
|
|
|
|
#ifdef CONFIG_X86_64
|
2009-01-24 07:50:10 +08:00
|
|
|
COPY(r8);
|
|
|
|
COPY(r9);
|
|
|
|
COPY(r10);
|
|
|
|
COPY(r11);
|
|
|
|
COPY(r12);
|
|
|
|
COPY(r13);
|
|
|
|
COPY(r14);
|
|
|
|
COPY(r15);
|
2008-11-25 10:21:37 +08:00
|
|
|
#endif /* CONFIG_X86_64 */
|
|
|
|
|
2009-01-24 07:50:10 +08:00
|
|
|
COPY_SEG_CPL3(cs);
|
|
|
|
COPY_SEG_CPL3(ss);
|
x86/signal/64: Re-add support for SS in the 64-bit signal context
This is a second attempt to make the improvements from c6f2062935c8
("x86/signal/64: Fix SS handling for signals delivered to 64-bit
programs"), which was reverted by 51adbfbba5c6 ("x86/signal/64: Add
support for SS in the 64-bit signal context").
This adds two new uc_flags flags. UC_SIGCONTEXT_SS will be set for
all 64-bit signals (including x32). It indicates that the saved SS
field is valid and that the kernel supports the new behavior.
The goal is to fix a problems with signal handling in 64-bit tasks:
SS wasn't saved in the 64-bit signal context, making it awkward to
determine what SS was at the time of signal delivery and making it
impossible to return to a non-flat SS (as calling sigreturn clobbers
SS).
This also made it extremely difficult for 64-bit tasks to return to
fully-defined 16-bit contexts, because only the kernel can easily do
espfix64, but sigreturn was unable to set a non-flag SS:ESP.
(DOSEMU has a monstrous hack to partially work around this
limitation.)
If we could go back in time, the correct fix would be to make 64-bit
signals work just like 32-bit signals with respect to SS: save it
in signal context, reset it when delivering a signal, and restore
it in sigreturn.
Unfortunately, doing that (as I tried originally) breaks DOSEMU:
DOSEMU wouldn't reset the signal context's SS when clearing the LDT
and changing the saved CS to 64-bit mode, since it predates the SS
context field existing in the first place.
This patch is a bit more complicated, and it tries to balance a
bunch of goals. It makes most cases of changing ucontext->ss during
signal handling work as expected.
I do this by special-casing the interesting case. On sigreturn,
ucontext->ss will be honored by default, unless the ucontext was
created from scratch by an old program and had a 64-bit CS
(unfortunately, CRIU can do this) or was the result of changing a
32-bit signal context to 64-bit without resetting SS (as DOSEMU
does).
For the benefit of new 64-bit software that uses segmentation (new
versions of DOSEMU might), the new behavior can be detected with a
new ucontext flag UC_SIGCONTEXT_SS.
To avoid compilation issues, __pad0 is left as an alias for ss in
ucontext.
The nitty-gritty details are documented in the header file.
This patch also re-enables the sigreturn_64 and ldt_gdt_64 selftests,
as the kernel change allows both of them to pass.
Tested-by: Stas Sergeev <stsp@list.ru>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/749149cbfc3e75cd7fcdad69a854b399d792cc6f.1455664054.git.luto@kernel.org
[ Small readability edit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17 07:09:03 +08:00
|
|
|
|
2009-01-24 07:50:10 +08:00
|
|
|
get_user_ex(tmpflags, &sc->flags);
|
|
|
|
regs->flags = (regs->flags & ~FIX_EFLAGS) | (tmpflags & FIX_EFLAGS);
|
|
|
|
regs->orig_ax = -1; /* disable syscall checks */
|
|
|
|
|
2015-09-05 15:32:39 +08:00
|
|
|
get_user_ex(buf_val, &sc->fpstate);
|
|
|
|
buf = (void __user *)buf_val;
|
2009-01-24 07:50:10 +08:00
|
|
|
} get_user_catch(err);
|
2008-11-25 10:21:37 +08:00
|
|
|
|
2019-04-03 15:39:48 +08:00
|
|
|
#ifdef CONFIG_X86_64
|
|
|
|
/*
|
|
|
|
* Fix up SS if needed for the benefit of old DOSEMU and
|
|
|
|
* CRIU.
|
|
|
|
*/
|
|
|
|
if (unlikely(!(uc_flags & UC_STRICT_RESTORE_SS) && user_64bit_mode(regs)))
|
|
|
|
force_valid_ss(regs);
|
|
|
|
#endif
|
|
|
|
|
2016-08-04 04:45:50 +08:00
|
|
|
err |= fpu__restore_sig(buf, IS_ENABLED(CONFIG_X86_32));
|
2012-09-22 03:43:15 +08:00
|
|
|
|
2008-11-25 10:21:37 +08:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2012-02-19 23:43:09 +08:00
|
|
|
int setup_sigcontext(struct sigcontext __user *sc, void __user *fpstate,
|
|
|
|
struct pt_regs *regs, unsigned long mask)
|
2008-11-25 10:21:37 +08:00
|
|
|
{
|
|
|
|
int err = 0;
|
|
|
|
|
2009-01-24 07:50:10 +08:00
|
|
|
put_user_try {
|
2008-11-25 10:21:37 +08:00
|
|
|
|
2009-01-24 07:50:10 +08:00
|
|
|
#ifdef CONFIG_X86_32
|
2009-02-09 21:17:40 +08:00
|
|
|
put_user_ex(get_user_gs(regs), (unsigned int __user *)&sc->gs);
|
2009-01-24 07:50:10 +08:00
|
|
|
put_user_ex(regs->fs, (unsigned int __user *)&sc->fs);
|
|
|
|
put_user_ex(regs->es, (unsigned int __user *)&sc->es);
|
|
|
|
put_user_ex(regs->ds, (unsigned int __user *)&sc->ds);
|
2008-11-25 10:21:37 +08:00
|
|
|
#endif /* CONFIG_X86_32 */
|
|
|
|
|
2009-01-24 07:50:10 +08:00
|
|
|
put_user_ex(regs->di, &sc->di);
|
|
|
|
put_user_ex(regs->si, &sc->si);
|
|
|
|
put_user_ex(regs->bp, &sc->bp);
|
|
|
|
put_user_ex(regs->sp, &sc->sp);
|
|
|
|
put_user_ex(regs->bx, &sc->bx);
|
|
|
|
put_user_ex(regs->dx, &sc->dx);
|
|
|
|
put_user_ex(regs->cx, &sc->cx);
|
|
|
|
put_user_ex(regs->ax, &sc->ax);
|
2008-11-25 10:21:37 +08:00
|
|
|
#ifdef CONFIG_X86_64
|
2009-01-24 07:50:10 +08:00
|
|
|
put_user_ex(regs->r8, &sc->r8);
|
|
|
|
put_user_ex(regs->r9, &sc->r9);
|
|
|
|
put_user_ex(regs->r10, &sc->r10);
|
|
|
|
put_user_ex(regs->r11, &sc->r11);
|
|
|
|
put_user_ex(regs->r12, &sc->r12);
|
|
|
|
put_user_ex(regs->r13, &sc->r13);
|
|
|
|
put_user_ex(regs->r14, &sc->r14);
|
|
|
|
put_user_ex(regs->r15, &sc->r15);
|
2008-11-25 10:21:37 +08:00
|
|
|
#endif /* CONFIG_X86_64 */
|
|
|
|
|
2012-03-12 17:25:55 +08:00
|
|
|
put_user_ex(current->thread.trap_nr, &sc->trapno);
|
2009-01-24 07:50:10 +08:00
|
|
|
put_user_ex(current->thread.error_code, &sc->err);
|
|
|
|
put_user_ex(regs->ip, &sc->ip);
|
2008-11-25 10:21:37 +08:00
|
|
|
#ifdef CONFIG_X86_32
|
2009-01-24 07:50:10 +08:00
|
|
|
put_user_ex(regs->cs, (unsigned int __user *)&sc->cs);
|
|
|
|
put_user_ex(regs->flags, &sc->flags);
|
|
|
|
put_user_ex(regs->sp, &sc->sp_at_signal);
|
|
|
|
put_user_ex(regs->ss, (unsigned int __user *)&sc->ss);
|
2008-11-25 10:21:37 +08:00
|
|
|
#else /* !CONFIG_X86_32 */
|
2009-01-24 07:50:10 +08:00
|
|
|
put_user_ex(regs->flags, &sc->flags);
|
|
|
|
put_user_ex(regs->cs, &sc->cs);
|
2015-08-13 23:25:20 +08:00
|
|
|
put_user_ex(0, &sc->gs);
|
|
|
|
put_user_ex(0, &sc->fs);
|
x86/signal/64: Re-add support for SS in the 64-bit signal context
This is a second attempt to make the improvements from c6f2062935c8
("x86/signal/64: Fix SS handling for signals delivered to 64-bit
programs"), which was reverted by 51adbfbba5c6 ("x86/signal/64: Add
support for SS in the 64-bit signal context").
This adds two new uc_flags flags. UC_SIGCONTEXT_SS will be set for
all 64-bit signals (including x32). It indicates that the saved SS
field is valid and that the kernel supports the new behavior.
The goal is to fix a problems with signal handling in 64-bit tasks:
SS wasn't saved in the 64-bit signal context, making it awkward to
determine what SS was at the time of signal delivery and making it
impossible to return to a non-flat SS (as calling sigreturn clobbers
SS).
This also made it extremely difficult for 64-bit tasks to return to
fully-defined 16-bit contexts, because only the kernel can easily do
espfix64, but sigreturn was unable to set a non-flag SS:ESP.
(DOSEMU has a monstrous hack to partially work around this
limitation.)
If we could go back in time, the correct fix would be to make 64-bit
signals work just like 32-bit signals with respect to SS: save it
in signal context, reset it when delivering a signal, and restore
it in sigreturn.
Unfortunately, doing that (as I tried originally) breaks DOSEMU:
DOSEMU wouldn't reset the signal context's SS when clearing the LDT
and changing the saved CS to 64-bit mode, since it predates the SS
context field existing in the first place.
This patch is a bit more complicated, and it tries to balance a
bunch of goals. It makes most cases of changing ucontext->ss during
signal handling work as expected.
I do this by special-casing the interesting case. On sigreturn,
ucontext->ss will be honored by default, unless the ucontext was
created from scratch by an old program and had a 64-bit CS
(unfortunately, CRIU can do this) or was the result of changing a
32-bit signal context to 64-bit without resetting SS (as DOSEMU
does).
For the benefit of new 64-bit software that uses segmentation (new
versions of DOSEMU might), the new behavior can be detected with a
new ucontext flag UC_SIGCONTEXT_SS.
To avoid compilation issues, __pad0 is left as an alias for ss in
ucontext.
The nitty-gritty details are documented in the header file.
This patch also re-enables the sigreturn_64 and ldt_gdt_64 selftests,
as the kernel change allows both of them to pass.
Tested-by: Stas Sergeev <stsp@list.ru>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/749149cbfc3e75cd7fcdad69a854b399d792cc6f.1455664054.git.luto@kernel.org
[ Small readability edit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17 07:09:03 +08:00
|
|
|
put_user_ex(regs->ss, &sc->ss);
|
2008-11-25 10:21:37 +08:00
|
|
|
#endif /* CONFIG_X86_32 */
|
|
|
|
|
2019-03-30 05:46:51 +08:00
|
|
|
put_user_ex(fpstate, (unsigned long __user *)&sc->fpstate);
|
2008-11-25 10:21:37 +08:00
|
|
|
|
2009-01-24 07:50:10 +08:00
|
|
|
/* non-iBCS2 extensions.. */
|
|
|
|
put_user_ex(mask, &sc->oldmask);
|
|
|
|
put_user_ex(current->thread.cr2, &sc->cr2);
|
|
|
|
} put_user_catch(err);
|
2008-11-25 10:21:37 +08:00
|
|
|
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2005-04-17 06:20:36 +08:00
|
|
|
/*
|
2008-11-25 10:23:12 +08:00
|
|
|
* Set up a signal frame.
|
2005-04-17 06:20:36 +08:00
|
|
|
*/
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Determine which stack to use..
|
|
|
|
*/
|
2009-02-28 02:30:32 +08:00
|
|
|
static unsigned long align_sigframe(unsigned long sp)
|
|
|
|
{
|
|
|
|
#ifdef CONFIG_X86_32
|
|
|
|
/*
|
|
|
|
* Align the stack pointer according to the i386 ABI,
|
|
|
|
* i.e. so that on function entry ((sp + 4) & 15) == 0.
|
|
|
|
*/
|
|
|
|
sp = ((sp + 4) & -16ul) - 4;
|
|
|
|
#else /* !CONFIG_X86_32 */
|
|
|
|
sp = round_down(sp, 16) - 8;
|
|
|
|
#endif
|
|
|
|
return sp;
|
|
|
|
}
|
|
|
|
|
2015-09-28 20:23:57 +08:00
|
|
|
static void __user *
|
2008-07-30 01:29:21 +08:00
|
|
|
get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size,
|
2009-02-28 02:27:04 +08:00
|
|
|
void __user **fpstate)
|
2005-04-17 06:20:36 +08:00
|
|
|
{
|
|
|
|
/* Default to using normal stack */
|
x86, fpu: Unify signal handling code paths for x86 and x86_64 kernels
Currently for x86 and x86_32 binaries, fpstate in the user sigframe is copied
to/from the fpstate in the task struct.
And in the case of signal delivery for x86_64 binaries, if the fpstate is live
in the CPU registers, then the live state is copied directly to the user
sigframe. Otherwise fpstate in the task struct is copied to the user sigframe.
During restore, fpstate in the user sigframe is restored directly to the live
CPU registers.
Historically, different code paths led to different bugs. For example,
x86_64 code path was not preemption safe till recently. Also there is lot
of code duplication for support of new features like xsave etc.
Unify signal handling code paths for x86 and x86_64 kernels.
New strategy is as follows:
Signal delivery: Both for 32/64-bit frames, align the core math frame area to
64bytes as needed by xsave (this where the main fpu/extended state gets copied
to and excludes the legacy compatibility fsave header for the 32-bit [f]xsave
frames). If the state is live, copy the register state directly to the user
frame. If not live, copy the state in the thread struct to the user frame. And
for 32-bit [f]xsave frames, construct the fsave header separately before
the actual [f]xsave area.
Signal return: As the 32-bit frames with [f]xstate has an additional
'fsave' header, copy everything back from the user sigframe to the
fpstate in the task structure and reconstruct the fxstate from the 'fsave'
header (Also user passed pointers may not be correctly aligned for
any attempt to directly restore any partial state). At the next fpstate usage,
everything will be restored to the live CPU registers.
For all the 64-bit frames and the 32-bit fsave frame, restore the state from
the user sigframe directly to the live CPU registers. 64-bit signals always
restored the math frame directly, so we can expect the math frame pointer
to be correctly aligned. For 32-bit fsave frames, there are no alignment
requirements, so we can restore the state directly.
"lat_sig catch" microbenchmark numbers (for x86, x86_64, x86_32 binaries) are
with in the noise range with this change.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1343171129-2747-4-git-send-email-suresh.b.siddha@intel.com
[ Merged in compilation fix ]
Link: http://lkml.kernel.org/r/1344544736.8326.17.camel@sbsiddha-desk.sc.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-07-25 07:05:29 +08:00
|
|
|
unsigned long math_size = 0;
|
2009-02-28 02:29:57 +08:00
|
|
|
unsigned long sp = regs->sp;
|
x86, fpu: Unify signal handling code paths for x86 and x86_64 kernels
Currently for x86 and x86_32 binaries, fpstate in the user sigframe is copied
to/from the fpstate in the task struct.
And in the case of signal delivery for x86_64 binaries, if the fpstate is live
in the CPU registers, then the live state is copied directly to the user
sigframe. Otherwise fpstate in the task struct is copied to the user sigframe.
During restore, fpstate in the user sigframe is restored directly to the live
CPU registers.
Historically, different code paths led to different bugs. For example,
x86_64 code path was not preemption safe till recently. Also there is lot
of code duplication for support of new features like xsave etc.
Unify signal handling code paths for x86 and x86_64 kernels.
New strategy is as follows:
Signal delivery: Both for 32/64-bit frames, align the core math frame area to
64bytes as needed by xsave (this where the main fpu/extended state gets copied
to and excludes the legacy compatibility fsave header for the 32-bit [f]xsave
frames). If the state is live, copy the register state directly to the user
frame. If not live, copy the state in the thread struct to the user frame. And
for 32-bit [f]xsave frames, construct the fsave header separately before
the actual [f]xsave area.
Signal return: As the 32-bit frames with [f]xstate has an additional
'fsave' header, copy everything back from the user sigframe to the
fpstate in the task structure and reconstruct the fxstate from the 'fsave'
header (Also user passed pointers may not be correctly aligned for
any attempt to directly restore any partial state). At the next fpstate usage,
everything will be restored to the live CPU registers.
For all the 64-bit frames and the 32-bit fsave frame, restore the state from
the user sigframe directly to the live CPU registers. 64-bit signals always
restored the math frame directly, so we can expect the math frame pointer
to be correctly aligned. For 32-bit fsave frames, there are no alignment
requirements, so we can restore the state directly.
"lat_sig catch" microbenchmark numbers (for x86, x86_64, x86_32 binaries) are
with in the noise range with this change.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1343171129-2747-4-git-send-email-suresh.b.siddha@intel.com
[ Merged in compilation fix ]
Link: http://lkml.kernel.org/r/1344544736.8326.17.camel@sbsiddha-desk.sc.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-07-25 07:05:29 +08:00
|
|
|
unsigned long buf_fx = 0;
|
2009-03-20 01:56:29 +08:00
|
|
|
int onsigstack = on_sig_stack(sp);
|
2019-04-04 00:41:36 +08:00
|
|
|
int ret;
|
2009-02-28 02:29:57 +08:00
|
|
|
|
|
|
|
/* redzone */
|
2016-08-04 04:45:50 +08:00
|
|
|
if (IS_ENABLED(CONFIG_X86_64))
|
2012-07-25 07:05:27 +08:00
|
|
|
sp -= 128;
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2016-04-15 04:20:02 +08:00
|
|
|
/* This is the X/Open sanctioned signal stack switching. */
|
|
|
|
if (ka->sa.sa_flags & SA_ONSTACK) {
|
|
|
|
if (sas_ss_flags(sp) == 0)
|
|
|
|
sp = current->sas_ss_sp + current->sas_ss_size;
|
2016-08-04 04:45:50 +08:00
|
|
|
} else if (IS_ENABLED(CONFIG_X86_32) &&
|
2016-04-15 04:20:02 +08:00
|
|
|
!onsigstack &&
|
2017-07-28 21:00:32 +08:00
|
|
|
regs->ss != __USER_DS &&
|
2016-04-15 04:20:02 +08:00
|
|
|
!(ka->sa.sa_flags & SA_RESTORER) &&
|
|
|
|
ka->sa.sa_restorer) {
|
|
|
|
/* This is the legacy signal stack switching. */
|
|
|
|
sp = (unsigned long) ka->sa.sa_restorer;
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
|
2019-04-04 00:41:36 +08:00
|
|
|
sp = fpu__alloc_mathframe(sp, IS_ENABLED(CONFIG_X86_32),
|
|
|
|
&buf_fx, &math_size);
|
|
|
|
*fpstate = (void __user *)sp;
|
2008-07-30 01:29:21 +08:00
|
|
|
|
2009-03-20 01:56:29 +08:00
|
|
|
sp = align_sigframe(sp - frame_size);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If we are on the alternate signal stack and would overflow it, don't.
|
|
|
|
* Return an always-bogus address instead so we will die with SIGSEGV.
|
|
|
|
*/
|
|
|
|
if (onsigstack && !likely(on_sig_stack(sp)))
|
|
|
|
return (void __user *)-1L;
|
|
|
|
|
x86, fpu: Unify signal handling code paths for x86 and x86_64 kernels
Currently for x86 and x86_32 binaries, fpstate in the user sigframe is copied
to/from the fpstate in the task struct.
And in the case of signal delivery for x86_64 binaries, if the fpstate is live
in the CPU registers, then the live state is copied directly to the user
sigframe. Otherwise fpstate in the task struct is copied to the user sigframe.
During restore, fpstate in the user sigframe is restored directly to the live
CPU registers.
Historically, different code paths led to different bugs. For example,
x86_64 code path was not preemption safe till recently. Also there is lot
of code duplication for support of new features like xsave etc.
Unify signal handling code paths for x86 and x86_64 kernels.
New strategy is as follows:
Signal delivery: Both for 32/64-bit frames, align the core math frame area to
64bytes as needed by xsave (this where the main fpu/extended state gets copied
to and excludes the legacy compatibility fsave header for the 32-bit [f]xsave
frames). If the state is live, copy the register state directly to the user
frame. If not live, copy the state in the thread struct to the user frame. And
for 32-bit [f]xsave frames, construct the fsave header separately before
the actual [f]xsave area.
Signal return: As the 32-bit frames with [f]xstate has an additional
'fsave' header, copy everything back from the user sigframe to the
fpstate in the task structure and reconstruct the fxstate from the 'fsave'
header (Also user passed pointers may not be correctly aligned for
any attempt to directly restore any partial state). At the next fpstate usage,
everything will be restored to the live CPU registers.
For all the 64-bit frames and the 32-bit fsave frame, restore the state from
the user sigframe directly to the live CPU registers. 64-bit signals always
restored the math frame directly, so we can expect the math frame pointer
to be correctly aligned. For 32-bit fsave frames, there are no alignment
requirements, so we can restore the state directly.
"lat_sig catch" microbenchmark numbers (for x86, x86_64, x86_32 binaries) are
with in the noise range with this change.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1343171129-2747-4-git-send-email-suresh.b.siddha@intel.com
[ Merged in compilation fix ]
Link: http://lkml.kernel.org/r/1344544736.8326.17.camel@sbsiddha-desk.sc.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-07-25 07:05:29 +08:00
|
|
|
/* save i387 and extended state */
|
2019-04-04 00:41:36 +08:00
|
|
|
ret = copy_fpstate_to_sigframe(*fpstate, (void __user *)buf_fx, math_size);
|
|
|
|
if (ret < 0)
|
2009-03-20 01:56:29 +08:00
|
|
|
return (void __user *)-1L;
|
|
|
|
|
|
|
|
return (void __user *)sp;
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
|
2009-02-28 02:29:57 +08:00
|
|
|
#ifdef CONFIG_X86_32
|
|
|
|
static const struct {
|
|
|
|
u16 poplmovl;
|
|
|
|
u32 val;
|
|
|
|
u16 int80;
|
|
|
|
} __attribute__((packed)) retcode = {
|
|
|
|
0xb858, /* popl %eax; movl $..., %eax */
|
|
|
|
__NR_sigreturn,
|
|
|
|
0x80cd, /* int $0x80 */
|
|
|
|
};
|
|
|
|
|
|
|
|
static const struct {
|
|
|
|
u8 movl;
|
|
|
|
u32 val;
|
|
|
|
u16 int80;
|
|
|
|
u8 pad;
|
|
|
|
} __attribute__((packed)) rt_retcode = {
|
|
|
|
0xb8, /* movl $..., %eax */
|
|
|
|
__NR_rt_sigreturn,
|
|
|
|
0x80cd, /* int $0x80 */
|
|
|
|
0
|
|
|
|
};
|
|
|
|
|
2008-03-06 17:33:08 +08:00
|
|
|
static int
|
2012-11-10 12:51:47 +08:00
|
|
|
__setup_frame(int sig, struct ksignal *ksig, sigset_t *set,
|
2008-09-06 07:28:06 +08:00
|
|
|
struct pt_regs *regs)
|
2005-04-17 06:20:36 +08:00
|
|
|
{
|
|
|
|
struct sigframe __user *frame;
|
2008-03-06 17:33:08 +08:00
|
|
|
void __user *restorer;
|
2005-04-17 06:20:36 +08:00
|
|
|
int err = 0;
|
2008-07-30 01:29:22 +08:00
|
|
|
void __user *fpstate = NULL;
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2012-11-10 12:51:47 +08:00
|
|
|
frame = get_sigframe(&ksig->ka, regs, sizeof(*frame), &fpstate);
|
2005-04-17 06:20:36 +08:00
|
|
|
|
Remove 'type' argument from access_ok() function
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
of the user address range verification function since we got rid of the
old racy i386-only code to walk page tables by hand.
It existed because the original 80386 would not honor the write protect
bit when in kernel mode, so you had to do COW by hand before doing any
user access. But we haven't supported that in a long time, and these
days the 'type' argument is a purely historical artifact.
A discussion about extending 'user_access_begin()' to do the range
checking resulted this patch, because there is no way we're going to
move the old VERIFY_xyz interface to that model. And it's best done at
the end of the merge window when I've done most of my merges, so let's
just get this done once and for all.
This patch was mostly done with a sed-script, with manual fix-ups for
the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
There were a couple of notable cases:
- csky still had the old "verify_area()" name as an alias.
- the iter_iov code had magical hardcoded knowledge of the actual
values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
really used it)
- microblaze used the type argument for a debug printout
but other than those oddities this should be a total no-op patch.
I tried to fix up all architectures, did fairly extensive grepping for
access_ok() uses, and the changes are trivial, but I may have missed
something. Any missed conversion should be trivially fixable, though.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 10:57:57 +08:00
|
|
|
if (!access_ok(frame, sizeof(*frame)))
|
2008-09-13 08:01:09 +08:00
|
|
|
return -EFAULT;
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2008-09-13 08:02:53 +08:00
|
|
|
if (__put_user(sig, &frame->sig))
|
2008-09-13 08:01:09 +08:00
|
|
|
return -EFAULT;
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2008-09-13 08:02:53 +08:00
|
|
|
if (setup_sigcontext(&frame->sc, fpstate, regs, set->sig[0]))
|
2008-09-13 08:01:09 +08:00
|
|
|
return -EFAULT;
|
2005-04-17 06:20:36 +08:00
|
|
|
|
|
|
|
if (_NSIG_WORDS > 1) {
|
2008-09-13 08:02:53 +08:00
|
|
|
if (__copy_to_user(&frame->extramask, &set->sig[1],
|
|
|
|
sizeof(frame->extramask)))
|
2008-09-13 08:01:09 +08:00
|
|
|
return -EFAULT;
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
|
2008-04-09 16:29:27 +08:00
|
|
|
if (current->mm->context.vdso)
|
x86, vdso: Reimplement vdso.so preparation in build-time C
Currently, vdso.so files are prepared and analyzed by a combination
of objcopy, nm, some linker script tricks, and some simple ELF
parsers in the kernel. Replace all of that with plain C code that
runs at build time.
All five vdso images now generate .c files that are compiled and
linked in to the kernel image.
This should cause only one userspace-visible change: the loaded vDSO
images are stripped more heavily than they used to be. Everything
outside the loadable segment is dropped. In particular, this causes
the section table and section name strings to be missing. This
should be fine: real dynamic loaders don't load or inspect these
tables anyway. The result is roughly equivalent to eu-strip's
--strip-sections option.
The purpose of this change is to enable the vvar and hpet mappings
to be moved to the page following the vDSO load segment. Currently,
it is possible for the section table to extend into the page after
the load segment, so, if we map it, it risks overlapping the vvar or
hpet page. This happens whenever the load segment is just under a
multiple of PAGE_SIZE.
The only real subtlety here is that the old code had a C file with
inline assembler that did 'call VDSO32_vsyscall' and a linker script
that defined 'VDSO32_vsyscall = __kernel_vsyscall'. This most
likely worked by accident: the linker script entry defines a symbol
associated with an address as opposed to an alias for the real
dynamic symbol __kernel_vsyscall. That caused ld to relocate the
reference at link time instead of leaving an interposable dynamic
relocation. Since the VDSO32_vsyscall hack is no longer needed, I
now use 'call __kernel_vsyscall', and I added -Bsymbolic to make it
work. vdso2c will generate an error and abort the build if the
resulting image contains any dynamic relocations, so we won't
silently generate bad vdso images.
(Dynamic relocations are a problem because nothing will even attempt
to relocate the vdso.)
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/2c4fcf45524162a34d87fdda1eb046b2a5cecee7.1399317206.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-06 03:19:34 +08:00
|
|
|
restorer = current->mm->context.vdso +
|
2015-10-06 08:47:56 +08:00
|
|
|
vdso_image_32.sym___kernel_sigreturn;
|
2007-02-13 20:26:26 +08:00
|
|
|
else
|
2008-01-30 20:33:23 +08:00
|
|
|
restorer = &frame->retcode;
|
2012-11-10 12:51:47 +08:00
|
|
|
if (ksig->ka.sa.sa_flags & SA_RESTORER)
|
|
|
|
restorer = ksig->ka.sa.sa_restorer;
|
2005-04-17 06:20:36 +08:00
|
|
|
|
|
|
|
/* Set up to return from userspace. */
|
|
|
|
err |= __put_user(restorer, &frame->pretcode);
|
2008-03-06 17:33:08 +08:00
|
|
|
|
2005-04-17 06:20:36 +08:00
|
|
|
/*
|
2008-03-06 17:33:08 +08:00
|
|
|
* This is popl %eax ; movl $__NR_sigreturn, %eax ; int $0x80
|
2005-04-17 06:20:36 +08:00
|
|
|
*
|
|
|
|
* WE DO NOT USE IT ANY MORE! It's only left here for historical
|
|
|
|
* reasons and because gdb uses it as a signature to notice
|
|
|
|
* signal handler stack frames.
|
|
|
|
*/
|
2008-11-12 11:09:29 +08:00
|
|
|
err |= __put_user(*((u64 *)&retcode), (u64 *)frame->retcode);
|
2005-04-17 06:20:36 +08:00
|
|
|
|
|
|
|
if (err)
|
2008-09-13 08:01:09 +08:00
|
|
|
return -EFAULT;
|
2005-04-17 06:20:36 +08:00
|
|
|
|
|
|
|
/* Set up registers for signal handler */
|
2008-03-06 17:33:08 +08:00
|
|
|
regs->sp = (unsigned long)frame;
|
2012-11-10 12:51:47 +08:00
|
|
|
regs->ip = (unsigned long)ksig->ka.sa.sa_handler;
|
2008-03-06 17:33:08 +08:00
|
|
|
regs->ax = (unsigned long)sig;
|
2008-02-09 04:09:56 +08:00
|
|
|
regs->dx = 0;
|
|
|
|
regs->cx = 0;
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2008-01-30 20:30:56 +08:00
|
|
|
regs->ds = __USER_DS;
|
|
|
|
regs->es = __USER_DS;
|
|
|
|
regs->ss = __USER_DS;
|
|
|
|
regs->cs = __USER_CS;
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2006-01-19 09:44:00 +08:00
|
|
|
return 0;
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
|
2012-11-10 12:51:47 +08:00
|
|
|
static int __setup_rt_frame(int sig, struct ksignal *ksig,
|
2008-09-06 07:28:06 +08:00
|
|
|
sigset_t *set, struct pt_regs *regs)
|
2005-04-17 06:20:36 +08:00
|
|
|
{
|
|
|
|
struct rt_sigframe __user *frame;
|
2008-03-06 17:33:08 +08:00
|
|
|
void __user *restorer;
|
2005-04-17 06:20:36 +08:00
|
|
|
int err = 0;
|
2008-07-30 01:29:22 +08:00
|
|
|
void __user *fpstate = NULL;
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2012-11-10 12:51:47 +08:00
|
|
|
frame = get_sigframe(&ksig->ka, regs, sizeof(*frame), &fpstate);
|
2005-04-17 06:20:36 +08:00
|
|
|
|
Remove 'type' argument from access_ok() function
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
of the user address range verification function since we got rid of the
old racy i386-only code to walk page tables by hand.
It existed because the original 80386 would not honor the write protect
bit when in kernel mode, so you had to do COW by hand before doing any
user access. But we haven't supported that in a long time, and these
days the 'type' argument is a purely historical artifact.
A discussion about extending 'user_access_begin()' to do the range
checking resulted this patch, because there is no way we're going to
move the old VERIFY_xyz interface to that model. And it's best done at
the end of the merge window when I've done most of my merges, so let's
just get this done once and for all.
This patch was mostly done with a sed-script, with manual fix-ups for
the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
There were a couple of notable cases:
- csky still had the old "verify_area()" name as an alias.
- the iter_iov code had magical hardcoded knowledge of the actual
values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
really used it)
- microblaze used the type argument for a debug printout
but other than those oddities this should be a total no-op patch.
I tried to fix up all architectures, did fairly extensive grepping for
access_ok() uses, and the changes are trivial, but I may have missed
something. Any missed conversion should be trivially fixable, though.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 10:57:57 +08:00
|
|
|
if (!access_ok(frame, sizeof(*frame)))
|
2008-09-13 08:01:09 +08:00
|
|
|
return -EFAULT;
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2009-01-24 07:50:10 +08:00
|
|
|
put_user_try {
|
|
|
|
put_user_ex(sig, &frame->sig);
|
|
|
|
put_user_ex(&frame->info, &frame->pinfo);
|
|
|
|
put_user_ex(&frame->uc, &frame->puc);
|
|
|
|
|
|
|
|
/* Create the ucontext. */
|
2019-07-12 11:53:56 +08:00
|
|
|
if (static_cpu_has(X86_FEATURE_XSAVE))
|
2009-01-24 07:50:10 +08:00
|
|
|
put_user_ex(UC_FP_XSTATE, &frame->uc.uc_flags);
|
|
|
|
else
|
|
|
|
put_user_ex(0, &frame->uc.uc_flags);
|
|
|
|
put_user_ex(0, &frame->uc.uc_link);
|
2013-09-02 03:35:01 +08:00
|
|
|
save_altstack_ex(&frame->uc.uc_stack, regs->sp);
|
2009-01-24 07:50:10 +08:00
|
|
|
|
|
|
|
/* Set up to return from userspace. */
|
x86, vdso: Reimplement vdso.so preparation in build-time C
Currently, vdso.so files are prepared and analyzed by a combination
of objcopy, nm, some linker script tricks, and some simple ELF
parsers in the kernel. Replace all of that with plain C code that
runs at build time.
All five vdso images now generate .c files that are compiled and
linked in to the kernel image.
This should cause only one userspace-visible change: the loaded vDSO
images are stripped more heavily than they used to be. Everything
outside the loadable segment is dropped. In particular, this causes
the section table and section name strings to be missing. This
should be fine: real dynamic loaders don't load or inspect these
tables anyway. The result is roughly equivalent to eu-strip's
--strip-sections option.
The purpose of this change is to enable the vvar and hpet mappings
to be moved to the page following the vDSO load segment. Currently,
it is possible for the section table to extend into the page after
the load segment, so, if we map it, it risks overlapping the vvar or
hpet page. This happens whenever the load segment is just under a
multiple of PAGE_SIZE.
The only real subtlety here is that the old code had a C file with
inline assembler that did 'call VDSO32_vsyscall' and a linker script
that defined 'VDSO32_vsyscall = __kernel_vsyscall'. This most
likely worked by accident: the linker script entry defines a symbol
associated with an address as opposed to an alias for the real
dynamic symbol __kernel_vsyscall. That caused ld to relocate the
reference at link time instead of leaving an interposable dynamic
relocation. Since the VDSO32_vsyscall hack is no longer needed, I
now use 'call __kernel_vsyscall', and I added -Bsymbolic to make it
work. vdso2c will generate an error and abort the build if the
resulting image contains any dynamic relocations, so we won't
silently generate bad vdso images.
(Dynamic relocations are a problem because nothing will even attempt
to relocate the vdso.)
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/2c4fcf45524162a34d87fdda1eb046b2a5cecee7.1399317206.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-06 03:19:34 +08:00
|
|
|
restorer = current->mm->context.vdso +
|
2015-10-06 08:47:56 +08:00
|
|
|
vdso_image_32.sym___kernel_rt_sigreturn;
|
2012-11-10 12:51:47 +08:00
|
|
|
if (ksig->ka.sa.sa_flags & SA_RESTORER)
|
|
|
|
restorer = ksig->ka.sa.sa_restorer;
|
2009-01-24 07:50:10 +08:00
|
|
|
put_user_ex(restorer, &frame->pretcode);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* This is movl $__NR_rt_sigreturn, %ax ; int $0x80
|
|
|
|
*
|
|
|
|
* WE DO NOT USE IT ANY MORE! It's only left here for historical
|
|
|
|
* reasons and because gdb uses it as a signature to notice
|
|
|
|
* signal handler stack frames.
|
|
|
|
*/
|
|
|
|
put_user_ex(*((u64 *)&rt_retcode), (u64 *)frame->retcode);
|
|
|
|
} put_user_catch(err);
|
2012-09-22 08:18:44 +08:00
|
|
|
|
2012-11-10 12:51:47 +08:00
|
|
|
err |= copy_siginfo_to_user(&frame->info, &ksig->info);
|
2012-09-22 03:43:15 +08:00
|
|
|
err |= setup_sigcontext(&frame->uc.uc_mcontext, fpstate,
|
|
|
|
regs, set->sig[0]);
|
|
|
|
err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set));
|
2005-04-17 06:20:36 +08:00
|
|
|
|
|
|
|
if (err)
|
2008-09-13 08:01:09 +08:00
|
|
|
return -EFAULT;
|
2005-04-17 06:20:36 +08:00
|
|
|
|
|
|
|
/* Set up registers for signal handler */
|
2008-03-06 17:33:08 +08:00
|
|
|
regs->sp = (unsigned long)frame;
|
2012-11-10 12:51:47 +08:00
|
|
|
regs->ip = (unsigned long)ksig->ka.sa.sa_handler;
|
2008-09-06 07:28:38 +08:00
|
|
|
regs->ax = (unsigned long)sig;
|
2008-03-06 17:33:08 +08:00
|
|
|
regs->dx = (unsigned long)&frame->info;
|
|
|
|
regs->cx = (unsigned long)&frame->uc;
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2008-01-30 20:30:56 +08:00
|
|
|
regs->ds = __USER_DS;
|
|
|
|
regs->es = __USER_DS;
|
|
|
|
regs->ss = __USER_DS;
|
|
|
|
regs->cs = __USER_CS;
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2006-01-19 09:44:00 +08:00
|
|
|
return 0;
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
2008-11-25 10:23:12 +08:00
|
|
|
#else /* !CONFIG_X86_32 */
|
x86/signal/64: Re-add support for SS in the 64-bit signal context
This is a second attempt to make the improvements from c6f2062935c8
("x86/signal/64: Fix SS handling for signals delivered to 64-bit
programs"), which was reverted by 51adbfbba5c6 ("x86/signal/64: Add
support for SS in the 64-bit signal context").
This adds two new uc_flags flags. UC_SIGCONTEXT_SS will be set for
all 64-bit signals (including x32). It indicates that the saved SS
field is valid and that the kernel supports the new behavior.
The goal is to fix a problems with signal handling in 64-bit tasks:
SS wasn't saved in the 64-bit signal context, making it awkward to
determine what SS was at the time of signal delivery and making it
impossible to return to a non-flat SS (as calling sigreturn clobbers
SS).
This also made it extremely difficult for 64-bit tasks to return to
fully-defined 16-bit contexts, because only the kernel can easily do
espfix64, but sigreturn was unable to set a non-flag SS:ESP.
(DOSEMU has a monstrous hack to partially work around this
limitation.)
If we could go back in time, the correct fix would be to make 64-bit
signals work just like 32-bit signals with respect to SS: save it
in signal context, reset it when delivering a signal, and restore
it in sigreturn.
Unfortunately, doing that (as I tried originally) breaks DOSEMU:
DOSEMU wouldn't reset the signal context's SS when clearing the LDT
and changing the saved CS to 64-bit mode, since it predates the SS
context field existing in the first place.
This patch is a bit more complicated, and it tries to balance a
bunch of goals. It makes most cases of changing ucontext->ss during
signal handling work as expected.
I do this by special-casing the interesting case. On sigreturn,
ucontext->ss will be honored by default, unless the ucontext was
created from scratch by an old program and had a 64-bit CS
(unfortunately, CRIU can do this) or was the result of changing a
32-bit signal context to 64-bit without resetting SS (as DOSEMU
does).
For the benefit of new 64-bit software that uses segmentation (new
versions of DOSEMU might), the new behavior can be detected with a
new ucontext flag UC_SIGCONTEXT_SS.
To avoid compilation issues, __pad0 is left as an alias for ss in
ucontext.
The nitty-gritty details are documented in the header file.
This patch also re-enables the sigreturn_64 and ldt_gdt_64 selftests,
as the kernel change allows both of them to pass.
Tested-by: Stas Sergeev <stsp@list.ru>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/749149cbfc3e75cd7fcdad69a854b399d792cc6f.1455664054.git.luto@kernel.org
[ Small readability edit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17 07:09:03 +08:00
|
|
|
static unsigned long frame_uc_flags(struct pt_regs *regs)
|
|
|
|
{
|
|
|
|
unsigned long flags;
|
|
|
|
|
2016-04-05 04:25:02 +08:00
|
|
|
if (boot_cpu_has(X86_FEATURE_XSAVE))
|
x86/signal/64: Re-add support for SS in the 64-bit signal context
This is a second attempt to make the improvements from c6f2062935c8
("x86/signal/64: Fix SS handling for signals delivered to 64-bit
programs"), which was reverted by 51adbfbba5c6 ("x86/signal/64: Add
support for SS in the 64-bit signal context").
This adds two new uc_flags flags. UC_SIGCONTEXT_SS will be set for
all 64-bit signals (including x32). It indicates that the saved SS
field is valid and that the kernel supports the new behavior.
The goal is to fix a problems with signal handling in 64-bit tasks:
SS wasn't saved in the 64-bit signal context, making it awkward to
determine what SS was at the time of signal delivery and making it
impossible to return to a non-flat SS (as calling sigreturn clobbers
SS).
This also made it extremely difficult for 64-bit tasks to return to
fully-defined 16-bit contexts, because only the kernel can easily do
espfix64, but sigreturn was unable to set a non-flag SS:ESP.
(DOSEMU has a monstrous hack to partially work around this
limitation.)
If we could go back in time, the correct fix would be to make 64-bit
signals work just like 32-bit signals with respect to SS: save it
in signal context, reset it when delivering a signal, and restore
it in sigreturn.
Unfortunately, doing that (as I tried originally) breaks DOSEMU:
DOSEMU wouldn't reset the signal context's SS when clearing the LDT
and changing the saved CS to 64-bit mode, since it predates the SS
context field existing in the first place.
This patch is a bit more complicated, and it tries to balance a
bunch of goals. It makes most cases of changing ucontext->ss during
signal handling work as expected.
I do this by special-casing the interesting case. On sigreturn,
ucontext->ss will be honored by default, unless the ucontext was
created from scratch by an old program and had a 64-bit CS
(unfortunately, CRIU can do this) or was the result of changing a
32-bit signal context to 64-bit without resetting SS (as DOSEMU
does).
For the benefit of new 64-bit software that uses segmentation (new
versions of DOSEMU might), the new behavior can be detected with a
new ucontext flag UC_SIGCONTEXT_SS.
To avoid compilation issues, __pad0 is left as an alias for ss in
ucontext.
The nitty-gritty details are documented in the header file.
This patch also re-enables the sigreturn_64 and ldt_gdt_64 selftests,
as the kernel change allows both of them to pass.
Tested-by: Stas Sergeev <stsp@list.ru>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/749149cbfc3e75cd7fcdad69a854b399d792cc6f.1455664054.git.luto@kernel.org
[ Small readability edit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17 07:09:03 +08:00
|
|
|
flags = UC_FP_XSTATE | UC_SIGCONTEXT_SS;
|
|
|
|
else
|
|
|
|
flags = UC_SIGCONTEXT_SS;
|
|
|
|
|
|
|
|
if (likely(user_64bit_mode(regs)))
|
|
|
|
flags |= UC_STRICT_RESTORE_SS;
|
|
|
|
|
|
|
|
return flags;
|
|
|
|
}
|
|
|
|
|
2012-11-10 12:51:47 +08:00
|
|
|
static int __setup_rt_frame(int sig, struct ksignal *ksig,
|
2008-11-25 10:23:12 +08:00
|
|
|
sigset_t *set, struct pt_regs *regs)
|
|
|
|
{
|
|
|
|
struct rt_sigframe __user *frame;
|
|
|
|
void __user *fp = NULL;
|
2019-04-03 15:39:48 +08:00
|
|
|
unsigned long uc_flags;
|
2008-11-25 10:23:12 +08:00
|
|
|
int err = 0;
|
|
|
|
|
2012-11-10 12:51:47 +08:00
|
|
|
frame = get_sigframe(&ksig->ka, regs, sizeof(struct rt_sigframe), &fp);
|
2008-11-25 10:23:12 +08:00
|
|
|
|
Remove 'type' argument from access_ok() function
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
of the user address range verification function since we got rid of the
old racy i386-only code to walk page tables by hand.
It existed because the original 80386 would not honor the write protect
bit when in kernel mode, so you had to do COW by hand before doing any
user access. But we haven't supported that in a long time, and these
days the 'type' argument is a purely historical artifact.
A discussion about extending 'user_access_begin()' to do the range
checking resulted this patch, because there is no way we're going to
move the old VERIFY_xyz interface to that model. And it's best done at
the end of the merge window when I've done most of my merges, so let's
just get this done once and for all.
This patch was mostly done with a sed-script, with manual fix-ups for
the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
There were a couple of notable cases:
- csky still had the old "verify_area()" name as an alias.
- the iter_iov code had magical hardcoded knowledge of the actual
values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
really used it)
- microblaze used the type argument for a debug printout
but other than those oddities this should be a total no-op patch.
I tried to fix up all architectures, did fairly extensive grepping for
access_ok() uses, and the changes are trivial, but I may have missed
something. Any missed conversion should be trivially fixable, though.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 10:57:57 +08:00
|
|
|
if (!access_ok(frame, sizeof(*frame)))
|
2008-11-25 10:23:12 +08:00
|
|
|
return -EFAULT;
|
|
|
|
|
2012-11-10 12:51:47 +08:00
|
|
|
if (ksig->ka.sa.sa_flags & SA_SIGINFO) {
|
|
|
|
if (copy_siginfo_to_user(&frame->info, &ksig->info))
|
2008-11-25 10:23:12 +08:00
|
|
|
return -EFAULT;
|
|
|
|
}
|
|
|
|
|
2019-04-03 15:39:48 +08:00
|
|
|
uc_flags = frame_uc_flags(regs);
|
|
|
|
|
2009-01-24 07:50:10 +08:00
|
|
|
put_user_try {
|
|
|
|
/* Create the ucontext. */
|
2019-04-03 15:39:48 +08:00
|
|
|
put_user_ex(uc_flags, &frame->uc.uc_flags);
|
2009-01-24 07:50:10 +08:00
|
|
|
put_user_ex(0, &frame->uc.uc_link);
|
2013-09-02 03:35:01 +08:00
|
|
|
save_altstack_ex(&frame->uc.uc_stack, regs->sp);
|
2009-01-24 07:50:10 +08:00
|
|
|
|
|
|
|
/* Set up to return from userspace. If provided, use a stub
|
|
|
|
already in userspace. */
|
|
|
|
/* x86-64 should always use SA_RESTORER. */
|
2012-11-10 12:51:47 +08:00
|
|
|
if (ksig->ka.sa.sa_flags & SA_RESTORER) {
|
|
|
|
put_user_ex(ksig->ka.sa.sa_restorer, &frame->pretcode);
|
2009-01-24 07:50:10 +08:00
|
|
|
} else {
|
|
|
|
/* could use a vstub here */
|
|
|
|
err |= -EFAULT;
|
|
|
|
}
|
|
|
|
} put_user_catch(err);
|
2008-11-25 10:23:12 +08:00
|
|
|
|
2012-09-22 03:43:15 +08:00
|
|
|
err |= setup_sigcontext(&frame->uc.uc_mcontext, fp, regs, set->sig[0]);
|
|
|
|
err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set));
|
|
|
|
|
2008-11-25 10:23:12 +08:00
|
|
|
if (err)
|
|
|
|
return -EFAULT;
|
|
|
|
|
|
|
|
/* Set up registers for signal handler */
|
|
|
|
regs->di = sig;
|
|
|
|
/* In case the signal handler was declared without prototypes */
|
|
|
|
regs->ax = 0;
|
|
|
|
|
|
|
|
/* This also works for non SA_SIGINFO handlers because they expect the
|
|
|
|
next argument after the signal number on the stack. */
|
|
|
|
regs->si = (unsigned long)&frame->info;
|
|
|
|
regs->dx = (unsigned long)&frame->uc;
|
2012-11-10 12:51:47 +08:00
|
|
|
regs->ip = (unsigned long) ksig->ka.sa.sa_handler;
|
2008-11-25 10:23:12 +08:00
|
|
|
|
|
|
|
regs->sp = (unsigned long)frame;
|
|
|
|
|
2016-02-17 07:09:02 +08:00
|
|
|
/*
|
|
|
|
* Set up the CS and SS registers to run signal handlers in
|
|
|
|
* 64-bit mode, even if the handler happens to be interrupting
|
|
|
|
* 32-bit or 16-bit code.
|
|
|
|
*
|
|
|
|
* SS is subtle. In 64-bit mode, we don't need any particular
|
|
|
|
* SS descriptor, but we do need SS to be valid. It's possible
|
|
|
|
* that the old SS is entirely bogus -- this can happen if the
|
|
|
|
* signal we're trying to deliver is #GP or #SS caused by a bad
|
|
|
|
* SS value. We also have a compatbility issue here: DOSEMU
|
|
|
|
* relies on the contents of the SS register indicating the
|
|
|
|
* SS value at the time of the signal, even though that code in
|
|
|
|
* DOSEMU predates sigreturn's ability to restore SS. (DOSEMU
|
|
|
|
* avoids relying on sigreturn to restore SS; instead it uses
|
|
|
|
* a trampoline.) So we do our best: if the old SS was valid,
|
|
|
|
* we keep it. Otherwise we replace it.
|
|
|
|
*/
|
2008-11-25 10:23:12 +08:00
|
|
|
regs->cs = __USER_CS;
|
|
|
|
|
2016-02-17 07:09:02 +08:00
|
|
|
if (unlikely(regs->ss != __USER_DS))
|
|
|
|
force_valid_ss(regs);
|
|
|
|
|
2008-11-25 10:23:12 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
#endif /* CONFIG_X86_32 */
|
|
|
|
|
2012-11-10 12:51:47 +08:00
|
|
|
static int x32_setup_rt_frame(struct ksignal *ksig,
|
|
|
|
compat_sigset_t *set,
|
2012-07-25 07:05:27 +08:00
|
|
|
struct pt_regs *regs)
|
|
|
|
{
|
|
|
|
#ifdef CONFIG_X86_X32_ABI
|
|
|
|
struct rt_sigframe_x32 __user *frame;
|
2019-04-03 15:39:48 +08:00
|
|
|
unsigned long uc_flags;
|
2012-07-25 07:05:27 +08:00
|
|
|
void __user *restorer;
|
|
|
|
int err = 0;
|
|
|
|
void __user *fpstate = NULL;
|
|
|
|
|
2012-11-10 12:51:47 +08:00
|
|
|
frame = get_sigframe(&ksig->ka, regs, sizeof(*frame), &fpstate);
|
2012-07-25 07:05:27 +08:00
|
|
|
|
Remove 'type' argument from access_ok() function
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
of the user address range verification function since we got rid of the
old racy i386-only code to walk page tables by hand.
It existed because the original 80386 would not honor the write protect
bit when in kernel mode, so you had to do COW by hand before doing any
user access. But we haven't supported that in a long time, and these
days the 'type' argument is a purely historical artifact.
A discussion about extending 'user_access_begin()' to do the range
checking resulted this patch, because there is no way we're going to
move the old VERIFY_xyz interface to that model. And it's best done at
the end of the merge window when I've done most of my merges, so let's
just get this done once and for all.
This patch was mostly done with a sed-script, with manual fix-ups for
the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
There were a couple of notable cases:
- csky still had the old "verify_area()" name as an alias.
- the iter_iov code had magical hardcoded knowledge of the actual
values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
really used it)
- microblaze used the type argument for a debug printout
but other than those oddities this should be a total no-op patch.
I tried to fix up all architectures, did fairly extensive grepping for
access_ok() uses, and the changes are trivial, but I may have missed
something. Any missed conversion should be trivially fixable, though.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 10:57:57 +08:00
|
|
|
if (!access_ok(frame, sizeof(*frame)))
|
2012-07-25 07:05:27 +08:00
|
|
|
return -EFAULT;
|
|
|
|
|
2012-11-10 12:51:47 +08:00
|
|
|
if (ksig->ka.sa.sa_flags & SA_SIGINFO) {
|
2016-09-05 21:33:08 +08:00
|
|
|
if (__copy_siginfo_to_user32(&frame->info, &ksig->info, true))
|
2012-07-25 07:05:27 +08:00
|
|
|
return -EFAULT;
|
|
|
|
}
|
|
|
|
|
2019-04-03 15:39:48 +08:00
|
|
|
uc_flags = frame_uc_flags(regs);
|
|
|
|
|
2012-07-25 07:05:27 +08:00
|
|
|
put_user_try {
|
|
|
|
/* Create the ucontext. */
|
2019-04-03 15:39:48 +08:00
|
|
|
put_user_ex(uc_flags, &frame->uc.uc_flags);
|
2012-07-25 07:05:27 +08:00
|
|
|
put_user_ex(0, &frame->uc.uc_link);
|
2013-09-02 03:35:01 +08:00
|
|
|
compat_save_altstack_ex(&frame->uc.uc_stack, regs->sp);
|
2012-07-25 07:05:27 +08:00
|
|
|
put_user_ex(0, &frame->uc.uc__pad0);
|
|
|
|
|
2012-11-10 12:51:47 +08:00
|
|
|
if (ksig->ka.sa.sa_flags & SA_RESTORER) {
|
|
|
|
restorer = ksig->ka.sa.sa_restorer;
|
2012-07-25 07:05:27 +08:00
|
|
|
} else {
|
|
|
|
/* could use a vstub here */
|
|
|
|
restorer = NULL;
|
|
|
|
err |= -EFAULT;
|
|
|
|
}
|
2019-03-30 05:46:51 +08:00
|
|
|
put_user_ex(restorer, (unsigned long __user *)&frame->pretcode);
|
2012-07-25 07:05:27 +08:00
|
|
|
} put_user_catch(err);
|
|
|
|
|
2012-09-22 08:18:44 +08:00
|
|
|
err |= setup_sigcontext(&frame->uc.uc_mcontext, fpstate,
|
|
|
|
regs, set->sig[0]);
|
|
|
|
err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set));
|
|
|
|
|
2012-07-25 07:05:27 +08:00
|
|
|
if (err)
|
|
|
|
return -EFAULT;
|
|
|
|
|
|
|
|
/* Set up registers for signal handler */
|
|
|
|
regs->sp = (unsigned long) frame;
|
2012-11-10 12:51:47 +08:00
|
|
|
regs->ip = (unsigned long) ksig->ka.sa.sa_handler;
|
2012-07-25 07:05:27 +08:00
|
|
|
|
|
|
|
/* We use the x32 calling convention here... */
|
2012-11-10 12:51:47 +08:00
|
|
|
regs->di = ksig->sig;
|
2012-07-25 07:05:27 +08:00
|
|
|
regs->si = (unsigned long) &frame->info;
|
|
|
|
regs->dx = (unsigned long) &frame->uc;
|
|
|
|
|
|
|
|
loadsegment(ds, __USER_DS);
|
|
|
|
loadsegment(es, __USER_DS);
|
|
|
|
|
|
|
|
regs->cs = __USER_CS;
|
|
|
|
regs->ss = __USER_DS;
|
|
|
|
#endif /* CONFIG_X86_X32_ABI */
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2008-11-25 10:23:12 +08:00
|
|
|
/*
|
|
|
|
* Do a signal return; undo the signal stack.
|
|
|
|
*/
|
2008-11-25 10:24:11 +08:00
|
|
|
#ifdef CONFIG_X86_32
|
2018-03-14 17:41:42 +08:00
|
|
|
SYSCALL_DEFINE0(sigreturn)
|
2008-11-25 10:23:12 +08:00
|
|
|
{
|
2012-11-13 03:32:42 +08:00
|
|
|
struct pt_regs *regs = current_pt_regs();
|
2008-11-25 10:23:12 +08:00
|
|
|
struct sigframe __user *frame;
|
|
|
|
sigset_t set;
|
|
|
|
|
|
|
|
frame = (struct sigframe __user *)(regs->sp - 8);
|
|
|
|
|
Remove 'type' argument from access_ok() function
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
of the user address range verification function since we got rid of the
old racy i386-only code to walk page tables by hand.
It existed because the original 80386 would not honor the write protect
bit when in kernel mode, so you had to do COW by hand before doing any
user access. But we haven't supported that in a long time, and these
days the 'type' argument is a purely historical artifact.
A discussion about extending 'user_access_begin()' to do the range
checking resulted this patch, because there is no way we're going to
move the old VERIFY_xyz interface to that model. And it's best done at
the end of the merge window when I've done most of my merges, so let's
just get this done once and for all.
This patch was mostly done with a sed-script, with manual fix-ups for
the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
There were a couple of notable cases:
- csky still had the old "verify_area()" name as an alias.
- the iter_iov code had magical hardcoded knowledge of the actual
values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
really used it)
- microblaze used the type argument for a debug printout
but other than those oddities this should be a total no-op patch.
I tried to fix up all architectures, did fairly extensive grepping for
access_ok() uses, and the changes are trivial, but I may have missed
something. Any missed conversion should be trivially fixable, though.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 10:57:57 +08:00
|
|
|
if (!access_ok(frame, sizeof(*frame)))
|
2008-11-25 10:23:12 +08:00
|
|
|
goto badframe;
|
|
|
|
if (__get_user(set.sig[0], &frame->sc.oldmask) || (_NSIG_WORDS > 1
|
|
|
|
&& __copy_from_user(&set.sig[1], &frame->extramask,
|
|
|
|
sizeof(frame->extramask))))
|
|
|
|
goto badframe;
|
|
|
|
|
2011-07-11 03:27:27 +08:00
|
|
|
set_current_blocked(&set);
|
2008-11-25 10:23:12 +08:00
|
|
|
|
x86/signal/64: Re-add support for SS in the 64-bit signal context
This is a second attempt to make the improvements from c6f2062935c8
("x86/signal/64: Fix SS handling for signals delivered to 64-bit
programs"), which was reverted by 51adbfbba5c6 ("x86/signal/64: Add
support for SS in the 64-bit signal context").
This adds two new uc_flags flags. UC_SIGCONTEXT_SS will be set for
all 64-bit signals (including x32). It indicates that the saved SS
field is valid and that the kernel supports the new behavior.
The goal is to fix a problems with signal handling in 64-bit tasks:
SS wasn't saved in the 64-bit signal context, making it awkward to
determine what SS was at the time of signal delivery and making it
impossible to return to a non-flat SS (as calling sigreturn clobbers
SS).
This also made it extremely difficult for 64-bit tasks to return to
fully-defined 16-bit contexts, because only the kernel can easily do
espfix64, but sigreturn was unable to set a non-flag SS:ESP.
(DOSEMU has a monstrous hack to partially work around this
limitation.)
If we could go back in time, the correct fix would be to make 64-bit
signals work just like 32-bit signals with respect to SS: save it
in signal context, reset it when delivering a signal, and restore
it in sigreturn.
Unfortunately, doing that (as I tried originally) breaks DOSEMU:
DOSEMU wouldn't reset the signal context's SS when clearing the LDT
and changing the saved CS to 64-bit mode, since it predates the SS
context field existing in the first place.
This patch is a bit more complicated, and it tries to balance a
bunch of goals. It makes most cases of changing ucontext->ss during
signal handling work as expected.
I do this by special-casing the interesting case. On sigreturn,
ucontext->ss will be honored by default, unless the ucontext was
created from scratch by an old program and had a 64-bit CS
(unfortunately, CRIU can do this) or was the result of changing a
32-bit signal context to 64-bit without resetting SS (as DOSEMU
does).
For the benefit of new 64-bit software that uses segmentation (new
versions of DOSEMU might), the new behavior can be detected with a
new ucontext flag UC_SIGCONTEXT_SS.
To avoid compilation issues, __pad0 is left as an alias for ss in
ucontext.
The nitty-gritty details are documented in the header file.
This patch also re-enables the sigreturn_64 and ldt_gdt_64 selftests,
as the kernel change allows both of them to pass.
Tested-by: Stas Sergeev <stsp@list.ru>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/749149cbfc3e75cd7fcdad69a854b399d792cc6f.1455664054.git.luto@kernel.org
[ Small readability edit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17 07:09:03 +08:00
|
|
|
/*
|
|
|
|
* x86_32 has no uc_flags bits relevant to restore_sigcontext.
|
|
|
|
* Save a few cycles by skipping the __get_user.
|
|
|
|
*/
|
|
|
|
if (restore_sigcontext(regs, &frame->sc, 0))
|
2008-11-25 10:23:12 +08:00
|
|
|
goto badframe;
|
2015-04-04 20:58:23 +08:00
|
|
|
return regs->ax;
|
2008-11-25 10:23:12 +08:00
|
|
|
|
|
|
|
badframe:
|
2008-12-17 06:02:16 +08:00
|
|
|
signal_fault(regs, frame, "sigreturn");
|
2008-11-25 10:23:12 +08:00
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
2008-11-25 10:24:11 +08:00
|
|
|
#endif /* CONFIG_X86_32 */
|
2008-11-25 10:23:12 +08:00
|
|
|
|
2018-03-14 17:41:42 +08:00
|
|
|
SYSCALL_DEFINE0(rt_sigreturn)
|
2008-11-25 10:23:12 +08:00
|
|
|
{
|
2012-11-13 03:32:42 +08:00
|
|
|
struct pt_regs *regs = current_pt_regs();
|
2008-11-25 10:23:12 +08:00
|
|
|
struct rt_sigframe __user *frame;
|
|
|
|
sigset_t set;
|
x86/signal/64: Re-add support for SS in the 64-bit signal context
This is a second attempt to make the improvements from c6f2062935c8
("x86/signal/64: Fix SS handling for signals delivered to 64-bit
programs"), which was reverted by 51adbfbba5c6 ("x86/signal/64: Add
support for SS in the 64-bit signal context").
This adds two new uc_flags flags. UC_SIGCONTEXT_SS will be set for
all 64-bit signals (including x32). It indicates that the saved SS
field is valid and that the kernel supports the new behavior.
The goal is to fix a problems with signal handling in 64-bit tasks:
SS wasn't saved in the 64-bit signal context, making it awkward to
determine what SS was at the time of signal delivery and making it
impossible to return to a non-flat SS (as calling sigreturn clobbers
SS).
This also made it extremely difficult for 64-bit tasks to return to
fully-defined 16-bit contexts, because only the kernel can easily do
espfix64, but sigreturn was unable to set a non-flag SS:ESP.
(DOSEMU has a monstrous hack to partially work around this
limitation.)
If we could go back in time, the correct fix would be to make 64-bit
signals work just like 32-bit signals with respect to SS: save it
in signal context, reset it when delivering a signal, and restore
it in sigreturn.
Unfortunately, doing that (as I tried originally) breaks DOSEMU:
DOSEMU wouldn't reset the signal context's SS when clearing the LDT
and changing the saved CS to 64-bit mode, since it predates the SS
context field existing in the first place.
This patch is a bit more complicated, and it tries to balance a
bunch of goals. It makes most cases of changing ucontext->ss during
signal handling work as expected.
I do this by special-casing the interesting case. On sigreturn,
ucontext->ss will be honored by default, unless the ucontext was
created from scratch by an old program and had a 64-bit CS
(unfortunately, CRIU can do this) or was the result of changing a
32-bit signal context to 64-bit without resetting SS (as DOSEMU
does).
For the benefit of new 64-bit software that uses segmentation (new
versions of DOSEMU might), the new behavior can be detected with a
new ucontext flag UC_SIGCONTEXT_SS.
To avoid compilation issues, __pad0 is left as an alias for ss in
ucontext.
The nitty-gritty details are documented in the header file.
This patch also re-enables the sigreturn_64 and ldt_gdt_64 selftests,
as the kernel change allows both of them to pass.
Tested-by: Stas Sergeev <stsp@list.ru>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/749149cbfc3e75cd7fcdad69a854b399d792cc6f.1455664054.git.luto@kernel.org
[ Small readability edit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17 07:09:03 +08:00
|
|
|
unsigned long uc_flags;
|
2008-11-25 10:23:12 +08:00
|
|
|
|
|
|
|
frame = (struct rt_sigframe __user *)(regs->sp - sizeof(long));
|
Remove 'type' argument from access_ok() function
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
of the user address range verification function since we got rid of the
old racy i386-only code to walk page tables by hand.
It existed because the original 80386 would not honor the write protect
bit when in kernel mode, so you had to do COW by hand before doing any
user access. But we haven't supported that in a long time, and these
days the 'type' argument is a purely historical artifact.
A discussion about extending 'user_access_begin()' to do the range
checking resulted this patch, because there is no way we're going to
move the old VERIFY_xyz interface to that model. And it's best done at
the end of the merge window when I've done most of my merges, so let's
just get this done once and for all.
This patch was mostly done with a sed-script, with manual fix-ups for
the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
There were a couple of notable cases:
- csky still had the old "verify_area()" name as an alias.
- the iter_iov code had magical hardcoded knowledge of the actual
values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
really used it)
- microblaze used the type argument for a debug printout
but other than those oddities this should be a total no-op patch.
I tried to fix up all architectures, did fairly extensive grepping for
access_ok() uses, and the changes are trivial, but I may have missed
something. Any missed conversion should be trivially fixable, though.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 10:57:57 +08:00
|
|
|
if (!access_ok(frame, sizeof(*frame)))
|
2008-11-25 10:23:12 +08:00
|
|
|
goto badframe;
|
|
|
|
if (__copy_from_user(&set, &frame->uc.uc_sigmask, sizeof(set)))
|
|
|
|
goto badframe;
|
x86/signal/64: Re-add support for SS in the 64-bit signal context
This is a second attempt to make the improvements from c6f2062935c8
("x86/signal/64: Fix SS handling for signals delivered to 64-bit
programs"), which was reverted by 51adbfbba5c6 ("x86/signal/64: Add
support for SS in the 64-bit signal context").
This adds two new uc_flags flags. UC_SIGCONTEXT_SS will be set for
all 64-bit signals (including x32). It indicates that the saved SS
field is valid and that the kernel supports the new behavior.
The goal is to fix a problems with signal handling in 64-bit tasks:
SS wasn't saved in the 64-bit signal context, making it awkward to
determine what SS was at the time of signal delivery and making it
impossible to return to a non-flat SS (as calling sigreturn clobbers
SS).
This also made it extremely difficult for 64-bit tasks to return to
fully-defined 16-bit contexts, because only the kernel can easily do
espfix64, but sigreturn was unable to set a non-flag SS:ESP.
(DOSEMU has a monstrous hack to partially work around this
limitation.)
If we could go back in time, the correct fix would be to make 64-bit
signals work just like 32-bit signals with respect to SS: save it
in signal context, reset it when delivering a signal, and restore
it in sigreturn.
Unfortunately, doing that (as I tried originally) breaks DOSEMU:
DOSEMU wouldn't reset the signal context's SS when clearing the LDT
and changing the saved CS to 64-bit mode, since it predates the SS
context field existing in the first place.
This patch is a bit more complicated, and it tries to balance a
bunch of goals. It makes most cases of changing ucontext->ss during
signal handling work as expected.
I do this by special-casing the interesting case. On sigreturn,
ucontext->ss will be honored by default, unless the ucontext was
created from scratch by an old program and had a 64-bit CS
(unfortunately, CRIU can do this) or was the result of changing a
32-bit signal context to 64-bit without resetting SS (as DOSEMU
does).
For the benefit of new 64-bit software that uses segmentation (new
versions of DOSEMU might), the new behavior can be detected with a
new ucontext flag UC_SIGCONTEXT_SS.
To avoid compilation issues, __pad0 is left as an alias for ss in
ucontext.
The nitty-gritty details are documented in the header file.
This patch also re-enables the sigreturn_64 and ldt_gdt_64 selftests,
as the kernel change allows both of them to pass.
Tested-by: Stas Sergeev <stsp@list.ru>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/749149cbfc3e75cd7fcdad69a854b399d792cc6f.1455664054.git.luto@kernel.org
[ Small readability edit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17 07:09:03 +08:00
|
|
|
if (__get_user(uc_flags, &frame->uc.uc_flags))
|
|
|
|
goto badframe;
|
2008-11-25 10:23:12 +08:00
|
|
|
|
2011-04-28 03:09:39 +08:00
|
|
|
set_current_blocked(&set);
|
2008-11-25 10:23:12 +08:00
|
|
|
|
x86/signal/64: Re-add support for SS in the 64-bit signal context
This is a second attempt to make the improvements from c6f2062935c8
("x86/signal/64: Fix SS handling for signals delivered to 64-bit
programs"), which was reverted by 51adbfbba5c6 ("x86/signal/64: Add
support for SS in the 64-bit signal context").
This adds two new uc_flags flags. UC_SIGCONTEXT_SS will be set for
all 64-bit signals (including x32). It indicates that the saved SS
field is valid and that the kernel supports the new behavior.
The goal is to fix a problems with signal handling in 64-bit tasks:
SS wasn't saved in the 64-bit signal context, making it awkward to
determine what SS was at the time of signal delivery and making it
impossible to return to a non-flat SS (as calling sigreturn clobbers
SS).
This also made it extremely difficult for 64-bit tasks to return to
fully-defined 16-bit contexts, because only the kernel can easily do
espfix64, but sigreturn was unable to set a non-flag SS:ESP.
(DOSEMU has a monstrous hack to partially work around this
limitation.)
If we could go back in time, the correct fix would be to make 64-bit
signals work just like 32-bit signals with respect to SS: save it
in signal context, reset it when delivering a signal, and restore
it in sigreturn.
Unfortunately, doing that (as I tried originally) breaks DOSEMU:
DOSEMU wouldn't reset the signal context's SS when clearing the LDT
and changing the saved CS to 64-bit mode, since it predates the SS
context field existing in the first place.
This patch is a bit more complicated, and it tries to balance a
bunch of goals. It makes most cases of changing ucontext->ss during
signal handling work as expected.
I do this by special-casing the interesting case. On sigreturn,
ucontext->ss will be honored by default, unless the ucontext was
created from scratch by an old program and had a 64-bit CS
(unfortunately, CRIU can do this) or was the result of changing a
32-bit signal context to 64-bit without resetting SS (as DOSEMU
does).
For the benefit of new 64-bit software that uses segmentation (new
versions of DOSEMU might), the new behavior can be detected with a
new ucontext flag UC_SIGCONTEXT_SS.
To avoid compilation issues, __pad0 is left as an alias for ss in
ucontext.
The nitty-gritty details are documented in the header file.
This patch also re-enables the sigreturn_64 and ldt_gdt_64 selftests,
as the kernel change allows both of them to pass.
Tested-by: Stas Sergeev <stsp@list.ru>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/749149cbfc3e75cd7fcdad69a854b399d792cc6f.1455664054.git.luto@kernel.org
[ Small readability edit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17 07:09:03 +08:00
|
|
|
if (restore_sigcontext(regs, &frame->uc.uc_mcontext, uc_flags))
|
2008-11-25 10:23:12 +08:00
|
|
|
goto badframe;
|
|
|
|
|
2012-11-21 03:24:26 +08:00
|
|
|
if (restore_altstack(&frame->uc.uc_stack))
|
2008-11-25 10:23:12 +08:00
|
|
|
goto badframe;
|
|
|
|
|
2015-04-04 20:58:23 +08:00
|
|
|
return regs->ax;
|
2008-11-25 10:23:12 +08:00
|
|
|
|
|
|
|
badframe:
|
|
|
|
signal_fault(regs, frame, "rt_sigreturn");
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2016-09-05 21:33:08 +08:00
|
|
|
static inline int is_ia32_compat_frame(struct ksignal *ksig)
|
2015-04-30 13:26:04 +08:00
|
|
|
{
|
2016-08-04 04:45:50 +08:00
|
|
|
return IS_ENABLED(CONFIG_IA32_EMULATION) &&
|
2016-09-05 21:33:08 +08:00
|
|
|
ksig->ka.sa.sa_flags & SA_IA32_ABI;
|
2015-04-30 13:26:04 +08:00
|
|
|
}
|
|
|
|
|
2016-09-05 21:33:08 +08:00
|
|
|
static inline int is_ia32_frame(struct ksignal *ksig)
|
2015-04-30 13:26:04 +08:00
|
|
|
{
|
2016-09-05 21:33:08 +08:00
|
|
|
return IS_ENABLED(CONFIG_X86_32) || is_ia32_compat_frame(ksig);
|
2015-04-30 13:26:04 +08:00
|
|
|
}
|
|
|
|
|
2016-09-05 21:33:08 +08:00
|
|
|
static inline int is_x32_frame(struct ksignal *ksig)
|
2015-04-30 13:26:04 +08:00
|
|
|
{
|
2016-09-05 21:33:08 +08:00
|
|
|
return IS_ENABLED(CONFIG_X86_X32_ABI) &&
|
|
|
|
ksig->ka.sa.sa_flags & SA_X32_ABI;
|
2015-04-30 13:26:04 +08:00
|
|
|
}
|
|
|
|
|
2008-09-06 07:28:06 +08:00
|
|
|
static int
|
2012-11-10 12:51:47 +08:00
|
|
|
setup_rt_frame(struct ksignal *ksig, struct pt_regs *regs)
|
2008-09-06 07:28:06 +08:00
|
|
|
{
|
2014-07-13 23:43:51 +08:00
|
|
|
int usig = ksig->sig;
|
2012-05-02 21:59:21 +08:00
|
|
|
sigset_t *set = sigmask_to_save();
|
2012-07-25 07:05:27 +08:00
|
|
|
compat_sigset_t *cset = (compat_sigset_t *) set;
|
2008-09-06 07:28:06 +08:00
|
|
|
|
2019-03-06 03:47:53 +08:00
|
|
|
/* Perform fixup for the pre-signal frame. */
|
2018-06-22 18:45:07 +08:00
|
|
|
rseq_signal_deliver(ksig, regs);
|
2018-06-02 20:43:58 +08:00
|
|
|
|
2008-09-06 07:28:06 +08:00
|
|
|
/* Set up the stack frame */
|
2016-09-05 21:33:08 +08:00
|
|
|
if (is_ia32_frame(ksig)) {
|
2012-11-10 12:51:47 +08:00
|
|
|
if (ksig->ka.sa.sa_flags & SA_SIGINFO)
|
|
|
|
return ia32_setup_rt_frame(usig, ksig, cset, regs);
|
2008-09-25 10:13:11 +08:00
|
|
|
else
|
2012-11-10 12:51:47 +08:00
|
|
|
return ia32_setup_frame(usig, ksig, cset, regs);
|
2016-09-05 21:33:08 +08:00
|
|
|
} else if (is_x32_frame(ksig)) {
|
2012-11-10 12:51:47 +08:00
|
|
|
return x32_setup_rt_frame(ksig, cset, regs);
|
2012-02-20 01:41:09 +08:00
|
|
|
} else {
|
2012-11-10 12:51:47 +08:00
|
|
|
return __setup_rt_frame(ksig->sig, ksig, set, regs);
|
2012-02-20 01:41:09 +08:00
|
|
|
}
|
2008-09-06 07:28:06 +08:00
|
|
|
}
|
|
|
|
|
2012-05-22 11:42:15 +08:00
|
|
|
static void
|
2012-11-10 12:51:47 +08:00
|
|
|
handle_signal(struct ksignal *ksig, struct pt_regs *regs)
|
2005-04-17 06:20:36 +08:00
|
|
|
{
|
x86/ptrace: Fix the TIF_FORCED_TF logic in handle_signal()
When the TIF_SINGLESTEP tracee dequeues a signal,
handle_signal() clears TIF_FORCED_TF and X86_EFLAGS_TF but
leaves TIF_SINGLESTEP set.
If the tracer does PTRACE_SINGLESTEP again, enable_single_step()
sets X86_EFLAGS_TF but not TIF_FORCED_TF. This means that the
subsequent PTRACE_CONT doesn't not clear X86_EFLAGS_TF, and the
tracee gets the wrong SIGTRAP.
Test-case (needs -O2 to avoid prologue insns in signal handler):
#include <unistd.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <sys/user.h>
#include <assert.h>
#include <stddef.h>
void handler(int n)
{
asm("nop");
}
int child(void)
{
assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
signal(SIGALRM, handler);
kill(getpid(), SIGALRM);
return 0x23;
}
void *getip(int pid)
{
return (void*)ptrace(PTRACE_PEEKUSER, pid,
offsetof(struct user, regs.rip), 0);
}
int main(void)
{
int pid, status;
pid = fork();
if (!pid)
return child();
assert(wait(&status) == pid);
assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGALRM);
assert(ptrace(PTRACE_SINGLESTEP, pid, 0, SIGALRM) == 0);
assert(wait(&status) == pid);
assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP);
assert((getip(pid) - (void*)handler) == 0);
assert(ptrace(PTRACE_SINGLESTEP, pid, 0, SIGALRM) == 0);
assert(wait(&status) == pid);
assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP);
assert((getip(pid) - (void*)handler) == 1);
assert(ptrace(PTRACE_CONT, pid, 0,0) == 0);
assert(wait(&status) == pid);
assert(WIFEXITED(status) && WEXITSTATUS(status) == 0x23);
return 0;
}
The last assert() fails because PTRACE_CONT wrongly triggers
another single-step and X86_EFLAGS_TF can't be cleared by
debugger until the tracee does sys_rt_sigreturn().
Change handle_signal() to do user_disable_single_step() if
stepping, we do not need to preserve TIF_SINGLESTEP because we
are going to do ptrace_notify(), and it is simply wrong to leak
this bit.
While at it, change the comment to explain why we also need to
clear TF unconditionally after setup_rt_frame().
Note: in the longer term we should probably change
setup_sigcontext() to use get_flags() and then just remove this
user_disable_single_step(). And, the state of TIF_FORCED_TF can
be wrong after restore_sigcontext() which can set/clear TF, this
needs another fix.
This fix fixes the 'single_step_syscall_32' testcase in
the x86 testsuite:
Before:
~/linux/tools/testing/selftests/x86> ./single_step_syscall_32
[RUN] Set TF and check nop
[OK] Survived with TF set and 9 traps
[RUN] Set TF and check int80
[OK] Survived with TF set and 9 traps
[RUN] Set TF and check a fast syscall
[WARN] Hit 10000 SIGTRAPs with si_addr 0xf7789cc0, ip 0xf7789cc0
Trace/breakpoint trap (core dumped)
After:
~/linux/linux/tools/testing/selftests/x86> ./single_step_syscall_32
[RUN] Set TF and check nop
[OK] Survived with TF set and 9 traps
[RUN] Set TF and check int80
[OK] Survived with TF set and 9 traps
[RUN] Set TF and check a fast syscall
[OK] Survived with TF set and 39 traps
[RUN] Fast syscall with TF cleared
[OK] Nothing unexpected happened
Reported-by: Evan Teran <eteran@alum.rit.edu>
Reported-by: Pedro Alves <palves@redhat.com>
Tested-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
[ Added x86 self-test info. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-16 15:40:25 +08:00
|
|
|
bool stepping, failed;
|
2015-04-23 18:49:20 +08:00
|
|
|
struct fpu *fpu = ¤t->thread.fpu;
|
x86/ptrace: Fix the TIF_FORCED_TF logic in handle_signal()
When the TIF_SINGLESTEP tracee dequeues a signal,
handle_signal() clears TIF_FORCED_TF and X86_EFLAGS_TF but
leaves TIF_SINGLESTEP set.
If the tracer does PTRACE_SINGLESTEP again, enable_single_step()
sets X86_EFLAGS_TF but not TIF_FORCED_TF. This means that the
subsequent PTRACE_CONT doesn't not clear X86_EFLAGS_TF, and the
tracee gets the wrong SIGTRAP.
Test-case (needs -O2 to avoid prologue insns in signal handler):
#include <unistd.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <sys/user.h>
#include <assert.h>
#include <stddef.h>
void handler(int n)
{
asm("nop");
}
int child(void)
{
assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
signal(SIGALRM, handler);
kill(getpid(), SIGALRM);
return 0x23;
}
void *getip(int pid)
{
return (void*)ptrace(PTRACE_PEEKUSER, pid,
offsetof(struct user, regs.rip), 0);
}
int main(void)
{
int pid, status;
pid = fork();
if (!pid)
return child();
assert(wait(&status) == pid);
assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGALRM);
assert(ptrace(PTRACE_SINGLESTEP, pid, 0, SIGALRM) == 0);
assert(wait(&status) == pid);
assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP);
assert((getip(pid) - (void*)handler) == 0);
assert(ptrace(PTRACE_SINGLESTEP, pid, 0, SIGALRM) == 0);
assert(wait(&status) == pid);
assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP);
assert((getip(pid) - (void*)handler) == 1);
assert(ptrace(PTRACE_CONT, pid, 0,0) == 0);
assert(wait(&status) == pid);
assert(WIFEXITED(status) && WEXITSTATUS(status) == 0x23);
return 0;
}
The last assert() fails because PTRACE_CONT wrongly triggers
another single-step and X86_EFLAGS_TF can't be cleared by
debugger until the tracee does sys_rt_sigreturn().
Change handle_signal() to do user_disable_single_step() if
stepping, we do not need to preserve TIF_SINGLESTEP because we
are going to do ptrace_notify(), and it is simply wrong to leak
this bit.
While at it, change the comment to explain why we also need to
clear TF unconditionally after setup_rt_frame().
Note: in the longer term we should probably change
setup_sigcontext() to use get_flags() and then just remove this
user_disable_single_step(). And, the state of TIF_FORCED_TF can
be wrong after restore_sigcontext() which can set/clear TF, this
needs another fix.
This fix fixes the 'single_step_syscall_32' testcase in
the x86 testsuite:
Before:
~/linux/tools/testing/selftests/x86> ./single_step_syscall_32
[RUN] Set TF and check nop
[OK] Survived with TF set and 9 traps
[RUN] Set TF and check int80
[OK] Survived with TF set and 9 traps
[RUN] Set TF and check a fast syscall
[WARN] Hit 10000 SIGTRAPs with si_addr 0xf7789cc0, ip 0xf7789cc0
Trace/breakpoint trap (core dumped)
After:
~/linux/linux/tools/testing/selftests/x86> ./single_step_syscall_32
[RUN] Set TF and check nop
[OK] Survived with TF set and 9 traps
[RUN] Set TF and check int80
[OK] Survived with TF set and 9 traps
[RUN] Set TF and check a fast syscall
[OK] Survived with TF set and 39 traps
[RUN] Fast syscall with TF cleared
[OK] Nothing unexpected happened
Reported-by: Evan Teran <eteran@alum.rit.edu>
Reported-by: Pedro Alves <palves@redhat.com>
Tested-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
[ Added x86 self-test info. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-16 15:40:25 +08:00
|
|
|
|
2015-07-29 13:41:19 +08:00
|
|
|
if (v8086_mode(regs))
|
|
|
|
save_v86_state((struct kernel_vm86_regs *) regs, VM86_SIGNAL);
|
|
|
|
|
2005-04-17 06:20:36 +08:00
|
|
|
/* Are we from a system call? */
|
2008-09-06 07:26:55 +08:00
|
|
|
if (syscall_get_nr(current, regs) >= 0) {
|
2005-04-17 06:20:36 +08:00
|
|
|
/* If so, check system call restarting.. */
|
2008-09-06 07:26:55 +08:00
|
|
|
switch (syscall_get_error(current, regs)) {
|
2008-02-09 04:09:58 +08:00
|
|
|
case -ERESTART_RESTARTBLOCK:
|
|
|
|
case -ERESTARTNOHAND:
|
|
|
|
regs->ax = -EINTR;
|
|
|
|
break;
|
|
|
|
|
|
|
|
case -ERESTARTSYS:
|
2012-11-10 12:51:47 +08:00
|
|
|
if (!(ksig->ka.sa.sa_flags & SA_RESTART)) {
|
2008-01-30 20:30:56 +08:00
|
|
|
regs->ax = -EINTR;
|
2005-04-17 06:20:36 +08:00
|
|
|
break;
|
2008-02-09 04:09:58 +08:00
|
|
|
}
|
|
|
|
/* fallthrough */
|
|
|
|
case -ERESTARTNOINTR:
|
|
|
|
regs->ax = regs->orig_ax;
|
|
|
|
regs->ip -= 2;
|
|
|
|
break;
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
x86/ptrace: Fix the TIF_FORCED_TF logic in handle_signal()
When the TIF_SINGLESTEP tracee dequeues a signal,
handle_signal() clears TIF_FORCED_TF and X86_EFLAGS_TF but
leaves TIF_SINGLESTEP set.
If the tracer does PTRACE_SINGLESTEP again, enable_single_step()
sets X86_EFLAGS_TF but not TIF_FORCED_TF. This means that the
subsequent PTRACE_CONT doesn't not clear X86_EFLAGS_TF, and the
tracee gets the wrong SIGTRAP.
Test-case (needs -O2 to avoid prologue insns in signal handler):
#include <unistd.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <sys/user.h>
#include <assert.h>
#include <stddef.h>
void handler(int n)
{
asm("nop");
}
int child(void)
{
assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
signal(SIGALRM, handler);
kill(getpid(), SIGALRM);
return 0x23;
}
void *getip(int pid)
{
return (void*)ptrace(PTRACE_PEEKUSER, pid,
offsetof(struct user, regs.rip), 0);
}
int main(void)
{
int pid, status;
pid = fork();
if (!pid)
return child();
assert(wait(&status) == pid);
assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGALRM);
assert(ptrace(PTRACE_SINGLESTEP, pid, 0, SIGALRM) == 0);
assert(wait(&status) == pid);
assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP);
assert((getip(pid) - (void*)handler) == 0);
assert(ptrace(PTRACE_SINGLESTEP, pid, 0, SIGALRM) == 0);
assert(wait(&status) == pid);
assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP);
assert((getip(pid) - (void*)handler) == 1);
assert(ptrace(PTRACE_CONT, pid, 0,0) == 0);
assert(wait(&status) == pid);
assert(WIFEXITED(status) && WEXITSTATUS(status) == 0x23);
return 0;
}
The last assert() fails because PTRACE_CONT wrongly triggers
another single-step and X86_EFLAGS_TF can't be cleared by
debugger until the tracee does sys_rt_sigreturn().
Change handle_signal() to do user_disable_single_step() if
stepping, we do not need to preserve TIF_SINGLESTEP because we
are going to do ptrace_notify(), and it is simply wrong to leak
this bit.
While at it, change the comment to explain why we also need to
clear TF unconditionally after setup_rt_frame().
Note: in the longer term we should probably change
setup_sigcontext() to use get_flags() and then just remove this
user_disable_single_step(). And, the state of TIF_FORCED_TF can
be wrong after restore_sigcontext() which can set/clear TF, this
needs another fix.
This fix fixes the 'single_step_syscall_32' testcase in
the x86 testsuite:
Before:
~/linux/tools/testing/selftests/x86> ./single_step_syscall_32
[RUN] Set TF and check nop
[OK] Survived with TF set and 9 traps
[RUN] Set TF and check int80
[OK] Survived with TF set and 9 traps
[RUN] Set TF and check a fast syscall
[WARN] Hit 10000 SIGTRAPs with si_addr 0xf7789cc0, ip 0xf7789cc0
Trace/breakpoint trap (core dumped)
After:
~/linux/linux/tools/testing/selftests/x86> ./single_step_syscall_32
[RUN] Set TF and check nop
[OK] Survived with TF set and 9 traps
[RUN] Set TF and check int80
[OK] Survived with TF set and 9 traps
[RUN] Set TF and check a fast syscall
[OK] Survived with TF set and 39 traps
[RUN] Fast syscall with TF cleared
[OK] Nothing unexpected happened
Reported-by: Evan Teran <eteran@alum.rit.edu>
Reported-by: Pedro Alves <palves@redhat.com>
Tested-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
[ Added x86 self-test info. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-16 15:40:25 +08:00
|
|
|
* If TF is set due to a debugger (TIF_FORCED_TF), clear TF now
|
|
|
|
* so that register information in the sigcontext is correct and
|
|
|
|
* then notify the tracer before entering the signal handler.
|
2005-04-17 06:20:36 +08:00
|
|
|
*/
|
x86/ptrace: Fix the TIF_FORCED_TF logic in handle_signal()
When the TIF_SINGLESTEP tracee dequeues a signal,
handle_signal() clears TIF_FORCED_TF and X86_EFLAGS_TF but
leaves TIF_SINGLESTEP set.
If the tracer does PTRACE_SINGLESTEP again, enable_single_step()
sets X86_EFLAGS_TF but not TIF_FORCED_TF. This means that the
subsequent PTRACE_CONT doesn't not clear X86_EFLAGS_TF, and the
tracee gets the wrong SIGTRAP.
Test-case (needs -O2 to avoid prologue insns in signal handler):
#include <unistd.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <sys/user.h>
#include <assert.h>
#include <stddef.h>
void handler(int n)
{
asm("nop");
}
int child(void)
{
assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
signal(SIGALRM, handler);
kill(getpid(), SIGALRM);
return 0x23;
}
void *getip(int pid)
{
return (void*)ptrace(PTRACE_PEEKUSER, pid,
offsetof(struct user, regs.rip), 0);
}
int main(void)
{
int pid, status;
pid = fork();
if (!pid)
return child();
assert(wait(&status) == pid);
assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGALRM);
assert(ptrace(PTRACE_SINGLESTEP, pid, 0, SIGALRM) == 0);
assert(wait(&status) == pid);
assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP);
assert((getip(pid) - (void*)handler) == 0);
assert(ptrace(PTRACE_SINGLESTEP, pid, 0, SIGALRM) == 0);
assert(wait(&status) == pid);
assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP);
assert((getip(pid) - (void*)handler) == 1);
assert(ptrace(PTRACE_CONT, pid, 0,0) == 0);
assert(wait(&status) == pid);
assert(WIFEXITED(status) && WEXITSTATUS(status) == 0x23);
return 0;
}
The last assert() fails because PTRACE_CONT wrongly triggers
another single-step and X86_EFLAGS_TF can't be cleared by
debugger until the tracee does sys_rt_sigreturn().
Change handle_signal() to do user_disable_single_step() if
stepping, we do not need to preserve TIF_SINGLESTEP because we
are going to do ptrace_notify(), and it is simply wrong to leak
this bit.
While at it, change the comment to explain why we also need to
clear TF unconditionally after setup_rt_frame().
Note: in the longer term we should probably change
setup_sigcontext() to use get_flags() and then just remove this
user_disable_single_step(). And, the state of TIF_FORCED_TF can
be wrong after restore_sigcontext() which can set/clear TF, this
needs another fix.
This fix fixes the 'single_step_syscall_32' testcase in
the x86 testsuite:
Before:
~/linux/tools/testing/selftests/x86> ./single_step_syscall_32
[RUN] Set TF and check nop
[OK] Survived with TF set and 9 traps
[RUN] Set TF and check int80
[OK] Survived with TF set and 9 traps
[RUN] Set TF and check a fast syscall
[WARN] Hit 10000 SIGTRAPs with si_addr 0xf7789cc0, ip 0xf7789cc0
Trace/breakpoint trap (core dumped)
After:
~/linux/linux/tools/testing/selftests/x86> ./single_step_syscall_32
[RUN] Set TF and check nop
[OK] Survived with TF set and 9 traps
[RUN] Set TF and check int80
[OK] Survived with TF set and 9 traps
[RUN] Set TF and check a fast syscall
[OK] Survived with TF set and 39 traps
[RUN] Fast syscall with TF cleared
[OK] Nothing unexpected happened
Reported-by: Evan Teran <eteran@alum.rit.edu>
Reported-by: Pedro Alves <palves@redhat.com>
Tested-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
[ Added x86 self-test info. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-16 15:40:25 +08:00
|
|
|
stepping = test_thread_flag(TIF_SINGLESTEP);
|
|
|
|
if (stepping)
|
|
|
|
user_disable_single_step(current);
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2012-11-10 12:51:47 +08:00
|
|
|
failed = (setup_rt_frame(ksig, regs) < 0);
|
|
|
|
if (!failed) {
|
|
|
|
/*
|
|
|
|
* Clear the direction flag as per the ABI for function entry.
|
2013-05-01 23:25:43 +08:00
|
|
|
*
|
2013-05-01 23:25:42 +08:00
|
|
|
* Clear RF when entering the signal handler, because
|
|
|
|
* it might disable possible debug exception from the
|
|
|
|
* signal handler.
|
2013-05-01 23:25:43 +08:00
|
|
|
*
|
x86/ptrace: Fix the TIF_FORCED_TF logic in handle_signal()
When the TIF_SINGLESTEP tracee dequeues a signal,
handle_signal() clears TIF_FORCED_TF and X86_EFLAGS_TF but
leaves TIF_SINGLESTEP set.
If the tracer does PTRACE_SINGLESTEP again, enable_single_step()
sets X86_EFLAGS_TF but not TIF_FORCED_TF. This means that the
subsequent PTRACE_CONT doesn't not clear X86_EFLAGS_TF, and the
tracee gets the wrong SIGTRAP.
Test-case (needs -O2 to avoid prologue insns in signal handler):
#include <unistd.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <sys/user.h>
#include <assert.h>
#include <stddef.h>
void handler(int n)
{
asm("nop");
}
int child(void)
{
assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
signal(SIGALRM, handler);
kill(getpid(), SIGALRM);
return 0x23;
}
void *getip(int pid)
{
return (void*)ptrace(PTRACE_PEEKUSER, pid,
offsetof(struct user, regs.rip), 0);
}
int main(void)
{
int pid, status;
pid = fork();
if (!pid)
return child();
assert(wait(&status) == pid);
assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGALRM);
assert(ptrace(PTRACE_SINGLESTEP, pid, 0, SIGALRM) == 0);
assert(wait(&status) == pid);
assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP);
assert((getip(pid) - (void*)handler) == 0);
assert(ptrace(PTRACE_SINGLESTEP, pid, 0, SIGALRM) == 0);
assert(wait(&status) == pid);
assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP);
assert((getip(pid) - (void*)handler) == 1);
assert(ptrace(PTRACE_CONT, pid, 0,0) == 0);
assert(wait(&status) == pid);
assert(WIFEXITED(status) && WEXITSTATUS(status) == 0x23);
return 0;
}
The last assert() fails because PTRACE_CONT wrongly triggers
another single-step and X86_EFLAGS_TF can't be cleared by
debugger until the tracee does sys_rt_sigreturn().
Change handle_signal() to do user_disable_single_step() if
stepping, we do not need to preserve TIF_SINGLESTEP because we
are going to do ptrace_notify(), and it is simply wrong to leak
this bit.
While at it, change the comment to explain why we also need to
clear TF unconditionally after setup_rt_frame().
Note: in the longer term we should probably change
setup_sigcontext() to use get_flags() and then just remove this
user_disable_single_step(). And, the state of TIF_FORCED_TF can
be wrong after restore_sigcontext() which can set/clear TF, this
needs another fix.
This fix fixes the 'single_step_syscall_32' testcase in
the x86 testsuite:
Before:
~/linux/tools/testing/selftests/x86> ./single_step_syscall_32
[RUN] Set TF and check nop
[OK] Survived with TF set and 9 traps
[RUN] Set TF and check int80
[OK] Survived with TF set and 9 traps
[RUN] Set TF and check a fast syscall
[WARN] Hit 10000 SIGTRAPs with si_addr 0xf7789cc0, ip 0xf7789cc0
Trace/breakpoint trap (core dumped)
After:
~/linux/linux/tools/testing/selftests/x86> ./single_step_syscall_32
[RUN] Set TF and check nop
[OK] Survived with TF set and 9 traps
[RUN] Set TF and check int80
[OK] Survived with TF set and 9 traps
[RUN] Set TF and check a fast syscall
[OK] Survived with TF set and 39 traps
[RUN] Fast syscall with TF cleared
[OK] Nothing unexpected happened
Reported-by: Evan Teran <eteran@alum.rit.edu>
Reported-by: Pedro Alves <palves@redhat.com>
Tested-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
[ Added x86 self-test info. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-16 15:40:25 +08:00
|
|
|
* Clear TF for the case when it wasn't set by debugger to
|
|
|
|
* avoid the recursive send_sigtrap() in SIGTRAP handler.
|
2012-11-10 12:51:47 +08:00
|
|
|
*/
|
2013-05-01 23:25:43 +08:00
|
|
|
regs->flags &= ~(X86_EFLAGS_DF|X86_EFLAGS_RF|X86_EFLAGS_TF);
|
x86, fpu: shift drop_init_fpu() from save_xstate_sig() to handle_signal()
save_xstate_sig()->drop_init_fpu() doesn't look right. setup_rt_frame()
can fail after that, in this case the next setup_rt_frame() triggered
by SIGSEGV won't save fpu simply because the old state was lost. This
obviously mean that fpu won't be restored after sys_rt_sigreturn() from
SIGSEGV handler.
Shift drop_init_fpu() into !failed branch in handle_signal().
Test-case (needs -O2):
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/mman.h>
#include <pthread.h>
#include <assert.h>
volatile double D;
void test(double d)
{
int pid = getpid();
for (D = d; D == d; ) {
/* sys_tkill(pid, SIGHUP); asm to avoid save/reload
* fp regs around "C" call */
asm ("" : : "a"(200), "D"(pid), "S"(1));
asm ("syscall" : : : "ax");
}
printf("ERR!!\n");
}
void sigh(int sig)
{
}
char altstack[4096 * 10] __attribute__((aligned(4096)));
void *tfunc(void *arg)
{
for (;;) {
mprotect(altstack, sizeof(altstack), PROT_READ);
mprotect(altstack, sizeof(altstack), PROT_READ|PROT_WRITE);
}
}
int main(void)
{
stack_t st = {
.ss_sp = altstack,
.ss_size = sizeof(altstack),
.ss_flags = SS_ONSTACK,
};
struct sigaction sa = {
.sa_handler = sigh,
};
pthread_t pt;
sigaction(SIGSEGV, &sa, NULL);
sigaltstack(&st, NULL);
sa.sa_flags = SA_ONSTACK;
sigaction(SIGHUP, &sa, NULL);
pthread_create(&pt, NULL, tfunc, NULL);
test(123.456);
return 0;
}
Reported-by: Bean Anderson <bean@azulsystems.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20140902175713.GA21646@redhat.com
Cc: <stable@kernel.org> # v3.7+
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-09-03 01:57:13 +08:00
|
|
|
/*
|
|
|
|
* Ensure the signal handler starts with the new fpu state.
|
|
|
|
*/
|
2019-04-04 00:41:36 +08:00
|
|
|
fpu__clear(fpu);
|
2012-05-22 11:42:15 +08:00
|
|
|
}
|
x86/ptrace: Fix the TIF_FORCED_TF logic in handle_signal()
When the TIF_SINGLESTEP tracee dequeues a signal,
handle_signal() clears TIF_FORCED_TF and X86_EFLAGS_TF but
leaves TIF_SINGLESTEP set.
If the tracer does PTRACE_SINGLESTEP again, enable_single_step()
sets X86_EFLAGS_TF but not TIF_FORCED_TF. This means that the
subsequent PTRACE_CONT doesn't not clear X86_EFLAGS_TF, and the
tracee gets the wrong SIGTRAP.
Test-case (needs -O2 to avoid prologue insns in signal handler):
#include <unistd.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <sys/user.h>
#include <assert.h>
#include <stddef.h>
void handler(int n)
{
asm("nop");
}
int child(void)
{
assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
signal(SIGALRM, handler);
kill(getpid(), SIGALRM);
return 0x23;
}
void *getip(int pid)
{
return (void*)ptrace(PTRACE_PEEKUSER, pid,
offsetof(struct user, regs.rip), 0);
}
int main(void)
{
int pid, status;
pid = fork();
if (!pid)
return child();
assert(wait(&status) == pid);
assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGALRM);
assert(ptrace(PTRACE_SINGLESTEP, pid, 0, SIGALRM) == 0);
assert(wait(&status) == pid);
assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP);
assert((getip(pid) - (void*)handler) == 0);
assert(ptrace(PTRACE_SINGLESTEP, pid, 0, SIGALRM) == 0);
assert(wait(&status) == pid);
assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP);
assert((getip(pid) - (void*)handler) == 1);
assert(ptrace(PTRACE_CONT, pid, 0,0) == 0);
assert(wait(&status) == pid);
assert(WIFEXITED(status) && WEXITSTATUS(status) == 0x23);
return 0;
}
The last assert() fails because PTRACE_CONT wrongly triggers
another single-step and X86_EFLAGS_TF can't be cleared by
debugger until the tracee does sys_rt_sigreturn().
Change handle_signal() to do user_disable_single_step() if
stepping, we do not need to preserve TIF_SINGLESTEP because we
are going to do ptrace_notify(), and it is simply wrong to leak
this bit.
While at it, change the comment to explain why we also need to
clear TF unconditionally after setup_rt_frame().
Note: in the longer term we should probably change
setup_sigcontext() to use get_flags() and then just remove this
user_disable_single_step(). And, the state of TIF_FORCED_TF can
be wrong after restore_sigcontext() which can set/clear TF, this
needs another fix.
This fix fixes the 'single_step_syscall_32' testcase in
the x86 testsuite:
Before:
~/linux/tools/testing/selftests/x86> ./single_step_syscall_32
[RUN] Set TF and check nop
[OK] Survived with TF set and 9 traps
[RUN] Set TF and check int80
[OK] Survived with TF set and 9 traps
[RUN] Set TF and check a fast syscall
[WARN] Hit 10000 SIGTRAPs with si_addr 0xf7789cc0, ip 0xf7789cc0
Trace/breakpoint trap (core dumped)
After:
~/linux/linux/tools/testing/selftests/x86> ./single_step_syscall_32
[RUN] Set TF and check nop
[OK] Survived with TF set and 9 traps
[RUN] Set TF and check int80
[OK] Survived with TF set and 9 traps
[RUN] Set TF and check a fast syscall
[OK] Survived with TF set and 39 traps
[RUN] Fast syscall with TF cleared
[OK] Nothing unexpected happened
Reported-by: Evan Teran <eteran@alum.rit.edu>
Reported-by: Pedro Alves <palves@redhat.com>
Tested-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
[ Added x86 self-test info. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-16 15:40:25 +08:00
|
|
|
signal_setup_done(failed, ksig, stepping);
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
|
2015-12-01 05:54:36 +08:00
|
|
|
static inline unsigned long get_nr_restart_syscall(const struct pt_regs *regs)
|
|
|
|
{
|
2016-07-27 14:12:22 +08:00
|
|
|
/*
|
|
|
|
* This function is fundamentally broken as currently
|
|
|
|
* implemented.
|
|
|
|
*
|
|
|
|
* The idea is that we want to trigger a call to the
|
|
|
|
* restart_block() syscall and that we want in_ia32_syscall(),
|
|
|
|
* in_x32_syscall(), etc. to match whatever they were in the
|
|
|
|
* syscall being restarted. We assume that the syscall
|
|
|
|
* instruction at (regs->ip - 2) matches whatever syscall
|
|
|
|
* instruction we used to enter in the first place.
|
|
|
|
*
|
|
|
|
* The problem is that we can get here when ptrace pokes
|
|
|
|
* syscall-like values into regs even if we're not in a syscall
|
|
|
|
* at all.
|
|
|
|
*
|
|
|
|
* For now, we maintain historical behavior and guess based on
|
|
|
|
* stored state. We could do better by saving the actual
|
|
|
|
* syscall arch in restart_block or (with caveats on x32) by
|
|
|
|
* checking if regs->ip points to 'int $0x80'. The current
|
|
|
|
* behavior is incorrect if a tracer has a different bitness
|
|
|
|
* than the tracee.
|
|
|
|
*/
|
|
|
|
#ifdef CONFIG_IA32_EMULATION
|
2018-01-29 02:38:50 +08:00
|
|
|
if (current_thread_info()->status & (TS_COMPAT|TS_I386_REGS_POKED))
|
2015-12-18 07:56:52 +08:00
|
|
|
return __NR_ia32_restart_syscall;
|
|
|
|
#endif
|
|
|
|
#ifdef CONFIG_X86_X32_ABI
|
|
|
|
return __NR_restart_syscall | (regs->orig_ax & __X32_SYSCALL_BIT);
|
|
|
|
#else
|
2015-12-01 05:54:36 +08:00
|
|
|
return __NR_restart_syscall;
|
2015-12-18 07:56:52 +08:00
|
|
|
#endif
|
2015-12-01 05:54:36 +08:00
|
|
|
}
|
2008-10-30 09:46:40 +08:00
|
|
|
|
2005-04-17 06:20:36 +08:00
|
|
|
/*
|
|
|
|
* Note that 'init' is a special process: it doesn't get signals it doesn't
|
|
|
|
* want to handle. Thus you cannot kill init even with a SIGKILL even by
|
|
|
|
* mistake.
|
|
|
|
*/
|
2015-07-04 03:44:23 +08:00
|
|
|
void do_signal(struct pt_regs *regs)
|
2005-04-17 06:20:36 +08:00
|
|
|
{
|
2012-11-10 12:51:47 +08:00
|
|
|
struct ksignal ksig;
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2012-11-10 12:51:47 +08:00
|
|
|
if (get_signal(&ksig)) {
|
2008-03-06 17:33:08 +08:00
|
|
|
/* Whee! Actually deliver the signal. */
|
2012-11-10 12:51:47 +08:00
|
|
|
handle_signal(&ksig, regs);
|
2006-01-19 09:44:00 +08:00
|
|
|
return;
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/* Did we come from a system call? */
|
2008-09-06 07:26:55 +08:00
|
|
|
if (syscall_get_nr(current, regs) >= 0) {
|
2005-04-17 06:20:36 +08:00
|
|
|
/* Restart the system call - no handlers present */
|
2008-09-06 07:26:55 +08:00
|
|
|
switch (syscall_get_error(current, regs)) {
|
2006-01-19 09:44:00 +08:00
|
|
|
case -ERESTARTNOHAND:
|
|
|
|
case -ERESTARTSYS:
|
|
|
|
case -ERESTARTNOINTR:
|
2008-01-30 20:30:56 +08:00
|
|
|
regs->ax = regs->orig_ax;
|
|
|
|
regs->ip -= 2;
|
2006-01-19 09:44:00 +08:00
|
|
|
break;
|
|
|
|
|
|
|
|
case -ERESTART_RESTARTBLOCK:
|
2015-12-01 05:54:36 +08:00
|
|
|
regs->ax = get_nr_restart_syscall(regs);
|
2008-01-30 20:30:56 +08:00
|
|
|
regs->ip -= 2;
|
2006-01-19 09:44:00 +08:00
|
|
|
break;
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
}
|
2006-01-19 09:44:00 +08:00
|
|
|
|
2008-02-09 04:09:58 +08:00
|
|
|
/*
|
|
|
|
* If there's no signal to deliver, we just put the saved sigmask
|
|
|
|
* back.
|
|
|
|
*/
|
2012-05-22 11:33:55 +08:00
|
|
|
restore_saved_sigmask();
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
|
2008-09-06 07:27:11 +08:00
|
|
|
void signal_fault(struct pt_regs *regs, void __user *frame, char *where)
|
|
|
|
{
|
|
|
|
struct task_struct *me = current;
|
|
|
|
|
|
|
|
if (show_unhandled_signals && printk_ratelimit()) {
|
2008-12-17 06:02:16 +08:00
|
|
|
printk("%s"
|
2008-09-06 07:27:11 +08:00
|
|
|
"%s[%d] bad frame in %s frame:%p ip:%lx sp:%lx orax:%lx",
|
2008-12-17 06:02:16 +08:00
|
|
|
task_pid_nr(current) > 1 ? KERN_INFO : KERN_EMERG,
|
2008-09-06 07:27:11 +08:00
|
|
|
me->comm, me->pid, where, frame,
|
|
|
|
regs->ip, regs->sp, regs->orig_ax);
|
2017-04-07 20:09:04 +08:00
|
|
|
print_vma_addr(KERN_CONT " in ", regs->ip);
|
2012-05-22 10:50:07 +08:00
|
|
|
pr_cont("\n");
|
2008-09-06 07:27:11 +08:00
|
|
|
}
|
|
|
|
|
2019-05-23 23:17:27 +08:00
|
|
|
force_sig(SIGSEGV);
|
2008-09-06 07:27:11 +08:00
|
|
|
}
|
2012-02-20 01:41:09 +08:00
|
|
|
|
|
|
|
#ifdef CONFIG_X86_X32_ABI
|
2012-11-13 03:32:42 +08:00
|
|
|
asmlinkage long sys32_x32_rt_sigreturn(void)
|
2012-02-20 01:41:09 +08:00
|
|
|
{
|
2012-11-13 03:32:42 +08:00
|
|
|
struct pt_regs *regs = current_pt_regs();
|
2012-02-20 01:41:09 +08:00
|
|
|
struct rt_sigframe_x32 __user *frame;
|
|
|
|
sigset_t set;
|
x86/signal/64: Re-add support for SS in the 64-bit signal context
This is a second attempt to make the improvements from c6f2062935c8
("x86/signal/64: Fix SS handling for signals delivered to 64-bit
programs"), which was reverted by 51adbfbba5c6 ("x86/signal/64: Add
support for SS in the 64-bit signal context").
This adds two new uc_flags flags. UC_SIGCONTEXT_SS will be set for
all 64-bit signals (including x32). It indicates that the saved SS
field is valid and that the kernel supports the new behavior.
The goal is to fix a problems with signal handling in 64-bit tasks:
SS wasn't saved in the 64-bit signal context, making it awkward to
determine what SS was at the time of signal delivery and making it
impossible to return to a non-flat SS (as calling sigreturn clobbers
SS).
This also made it extremely difficult for 64-bit tasks to return to
fully-defined 16-bit contexts, because only the kernel can easily do
espfix64, but sigreturn was unable to set a non-flag SS:ESP.
(DOSEMU has a monstrous hack to partially work around this
limitation.)
If we could go back in time, the correct fix would be to make 64-bit
signals work just like 32-bit signals with respect to SS: save it
in signal context, reset it when delivering a signal, and restore
it in sigreturn.
Unfortunately, doing that (as I tried originally) breaks DOSEMU:
DOSEMU wouldn't reset the signal context's SS when clearing the LDT
and changing the saved CS to 64-bit mode, since it predates the SS
context field existing in the first place.
This patch is a bit more complicated, and it tries to balance a
bunch of goals. It makes most cases of changing ucontext->ss during
signal handling work as expected.
I do this by special-casing the interesting case. On sigreturn,
ucontext->ss will be honored by default, unless the ucontext was
created from scratch by an old program and had a 64-bit CS
(unfortunately, CRIU can do this) or was the result of changing a
32-bit signal context to 64-bit without resetting SS (as DOSEMU
does).
For the benefit of new 64-bit software that uses segmentation (new
versions of DOSEMU might), the new behavior can be detected with a
new ucontext flag UC_SIGCONTEXT_SS.
To avoid compilation issues, __pad0 is left as an alias for ss in
ucontext.
The nitty-gritty details are documented in the header file.
This patch also re-enables the sigreturn_64 and ldt_gdt_64 selftests,
as the kernel change allows both of them to pass.
Tested-by: Stas Sergeev <stsp@list.ru>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/749149cbfc3e75cd7fcdad69a854b399d792cc6f.1455664054.git.luto@kernel.org
[ Small readability edit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17 07:09:03 +08:00
|
|
|
unsigned long uc_flags;
|
2012-02-20 01:41:09 +08:00
|
|
|
|
|
|
|
frame = (struct rt_sigframe_x32 __user *)(regs->sp - 8);
|
|
|
|
|
Remove 'type' argument from access_ok() function
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
of the user address range verification function since we got rid of the
old racy i386-only code to walk page tables by hand.
It existed because the original 80386 would not honor the write protect
bit when in kernel mode, so you had to do COW by hand before doing any
user access. But we haven't supported that in a long time, and these
days the 'type' argument is a purely historical artifact.
A discussion about extending 'user_access_begin()' to do the range
checking resulted this patch, because there is no way we're going to
move the old VERIFY_xyz interface to that model. And it's best done at
the end of the merge window when I've done most of my merges, so let's
just get this done once and for all.
This patch was mostly done with a sed-script, with manual fix-ups for
the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
There were a couple of notable cases:
- csky still had the old "verify_area()" name as an alias.
- the iter_iov code had magical hardcoded knowledge of the actual
values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
really used it)
- microblaze used the type argument for a debug printout
but other than those oddities this should be a total no-op patch.
I tried to fix up all architectures, did fairly extensive grepping for
access_ok() uses, and the changes are trivial, but I may have missed
something. Any missed conversion should be trivially fixable, though.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 10:57:57 +08:00
|
|
|
if (!access_ok(frame, sizeof(*frame)))
|
2012-02-20 01:41:09 +08:00
|
|
|
goto badframe;
|
|
|
|
if (__copy_from_user(&set, &frame->uc.uc_sigmask, sizeof(set)))
|
|
|
|
goto badframe;
|
x86/signal/64: Re-add support for SS in the 64-bit signal context
This is a second attempt to make the improvements from c6f2062935c8
("x86/signal/64: Fix SS handling for signals delivered to 64-bit
programs"), which was reverted by 51adbfbba5c6 ("x86/signal/64: Add
support for SS in the 64-bit signal context").
This adds two new uc_flags flags. UC_SIGCONTEXT_SS will be set for
all 64-bit signals (including x32). It indicates that the saved SS
field is valid and that the kernel supports the new behavior.
The goal is to fix a problems with signal handling in 64-bit tasks:
SS wasn't saved in the 64-bit signal context, making it awkward to
determine what SS was at the time of signal delivery and making it
impossible to return to a non-flat SS (as calling sigreturn clobbers
SS).
This also made it extremely difficult for 64-bit tasks to return to
fully-defined 16-bit contexts, because only the kernel can easily do
espfix64, but sigreturn was unable to set a non-flag SS:ESP.
(DOSEMU has a monstrous hack to partially work around this
limitation.)
If we could go back in time, the correct fix would be to make 64-bit
signals work just like 32-bit signals with respect to SS: save it
in signal context, reset it when delivering a signal, and restore
it in sigreturn.
Unfortunately, doing that (as I tried originally) breaks DOSEMU:
DOSEMU wouldn't reset the signal context's SS when clearing the LDT
and changing the saved CS to 64-bit mode, since it predates the SS
context field existing in the first place.
This patch is a bit more complicated, and it tries to balance a
bunch of goals. It makes most cases of changing ucontext->ss during
signal handling work as expected.
I do this by special-casing the interesting case. On sigreturn,
ucontext->ss will be honored by default, unless the ucontext was
created from scratch by an old program and had a 64-bit CS
(unfortunately, CRIU can do this) or was the result of changing a
32-bit signal context to 64-bit without resetting SS (as DOSEMU
does).
For the benefit of new 64-bit software that uses segmentation (new
versions of DOSEMU might), the new behavior can be detected with a
new ucontext flag UC_SIGCONTEXT_SS.
To avoid compilation issues, __pad0 is left as an alias for ss in
ucontext.
The nitty-gritty details are documented in the header file.
This patch also re-enables the sigreturn_64 and ldt_gdt_64 selftests,
as the kernel change allows both of them to pass.
Tested-by: Stas Sergeev <stsp@list.ru>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/749149cbfc3e75cd7fcdad69a854b399d792cc6f.1455664054.git.luto@kernel.org
[ Small readability edit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17 07:09:03 +08:00
|
|
|
if (__get_user(uc_flags, &frame->uc.uc_flags))
|
|
|
|
goto badframe;
|
2012-02-20 01:41:09 +08:00
|
|
|
|
|
|
|
set_current_blocked(&set);
|
|
|
|
|
x86/signal/64: Re-add support for SS in the 64-bit signal context
This is a second attempt to make the improvements from c6f2062935c8
("x86/signal/64: Fix SS handling for signals delivered to 64-bit
programs"), which was reverted by 51adbfbba5c6 ("x86/signal/64: Add
support for SS in the 64-bit signal context").
This adds two new uc_flags flags. UC_SIGCONTEXT_SS will be set for
all 64-bit signals (including x32). It indicates that the saved SS
field is valid and that the kernel supports the new behavior.
The goal is to fix a problems with signal handling in 64-bit tasks:
SS wasn't saved in the 64-bit signal context, making it awkward to
determine what SS was at the time of signal delivery and making it
impossible to return to a non-flat SS (as calling sigreturn clobbers
SS).
This also made it extremely difficult for 64-bit tasks to return to
fully-defined 16-bit contexts, because only the kernel can easily do
espfix64, but sigreturn was unable to set a non-flag SS:ESP.
(DOSEMU has a monstrous hack to partially work around this
limitation.)
If we could go back in time, the correct fix would be to make 64-bit
signals work just like 32-bit signals with respect to SS: save it
in signal context, reset it when delivering a signal, and restore
it in sigreturn.
Unfortunately, doing that (as I tried originally) breaks DOSEMU:
DOSEMU wouldn't reset the signal context's SS when clearing the LDT
and changing the saved CS to 64-bit mode, since it predates the SS
context field existing in the first place.
This patch is a bit more complicated, and it tries to balance a
bunch of goals. It makes most cases of changing ucontext->ss during
signal handling work as expected.
I do this by special-casing the interesting case. On sigreturn,
ucontext->ss will be honored by default, unless the ucontext was
created from scratch by an old program and had a 64-bit CS
(unfortunately, CRIU can do this) or was the result of changing a
32-bit signal context to 64-bit without resetting SS (as DOSEMU
does).
For the benefit of new 64-bit software that uses segmentation (new
versions of DOSEMU might), the new behavior can be detected with a
new ucontext flag UC_SIGCONTEXT_SS.
To avoid compilation issues, __pad0 is left as an alias for ss in
ucontext.
The nitty-gritty details are documented in the header file.
This patch also re-enables the sigreturn_64 and ldt_gdt_64 selftests,
as the kernel change allows both of them to pass.
Tested-by: Stas Sergeev <stsp@list.ru>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/749149cbfc3e75cd7fcdad69a854b399d792cc6f.1455664054.git.luto@kernel.org
[ Small readability edit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17 07:09:03 +08:00
|
|
|
if (restore_sigcontext(regs, &frame->uc.uc_mcontext, uc_flags))
|
2012-02-20 01:41:09 +08:00
|
|
|
goto badframe;
|
|
|
|
|
2012-12-15 03:47:53 +08:00
|
|
|
if (compat_restore_altstack(&frame->uc.uc_stack))
|
2012-02-20 01:41:09 +08:00
|
|
|
goto badframe;
|
|
|
|
|
2015-04-04 20:58:23 +08:00
|
|
|
return regs->ax;
|
2012-02-20 01:41:09 +08:00
|
|
|
|
|
|
|
badframe:
|
|
|
|
signal_fault(regs, frame, "x32 rt_sigreturn");
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
#endif
|